Hacker News
bick_nyers
12 months ago
on: Apple M3 Ultra
Just to add to this point: you should expect a different set of experts to be activated for every token, so not having all of the weights in fast memory can still be quite slow, since you need to load and unload expert weights on every token.
valine
12 months ago
Probably better to be moving things from fast memory to faster memory than from slow disk to fast memory.
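A rough way to put numbers on both points in this exchange is a toy simulation. This is only a sketch under loud assumptions: routing is uniform at random (real routers are not), fast memory is modeled as an LRU cache of whole experts, and the expert count, expert size, and bandwidth figures are hypothetical values picked for illustration, not measurements of any real model or machine.

```python
import random
from collections import OrderedDict


def simulate_expert_misses(num_experts, experts_per_token, cache_slots, num_tokens, seed=0):
    """Average number of experts per token that must be loaded into fast memory.

    Assumes uniform random routing and an LRU cache of expert weights,
    so treat the result as a rough illustration only.
    """
    rng = random.Random(seed)
    cache = OrderedDict()  # expert id -> None, ordered least to most recently used
    misses = 0
    for _ in range(num_tokens):
        for expert in rng.sample(range(num_experts), experts_per_token):
            if expert in cache:
                cache.move_to_end(expert)  # hit: mark as most recently used
            else:
                misses += 1  # miss: weights must come from the slower tier
                cache[expert] = None
                if len(cache) > cache_slots:
                    cache.popitem(last=False)  # evict the least recently used expert
    return misses / num_tokens


def load_seconds_per_token(misses_per_token, expert_gigabytes, bandwidth_gb_per_s):
    """Time spent per token just moving the missed expert weights."""
    return misses_per_token * expert_gigabytes / bandwidth_gb_per_s


# Hypothetical numbers, chosen only for illustration.
NUM_EXPERTS = 64
EXPERTS_PER_TOKEN = 4
EXPERT_GB = 0.5
NUM_TOKENS = 2000

all_fit = simulate_expert_misses(NUM_EXPERTS, EXPERTS_PER_TOKEN, cache_slots=64, num_tokens=NUM_TOKENS)
half_fit = simulate_expert_misses(NUM_EXPERTS, EXPERTS_PER_TOKEN, cache_slots=32, num_tokens=NUM_TOKENS)

print(f"misses/token, all experts fit in fast memory:  {all_fit:.2f}")
print(f"misses/token, half the experts fit:            {half_fit:.2f}")

# Assumed bandwidths (GB/s) for the two kinds of transfer discussed above.
for tier, bandwidth in [("disk -> memory", 5.0), ("memory -> faster memory", 100.0)]:
    seconds = load_seconds_per_token(half_fit, EXPERT_GB, bandwidth)
    print(f"{tier}: ~{seconds:.3f} s/token spent loading experts")
```

Under these assumptions, once the cache holds every expert the only misses are the initial cold loads, while a cache that holds half of them keeps missing on every token. The cost of each miss then scales inversely with the bandwidth of whichever tier the weights are coming from, which is the difference the reply is pointing at.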