Just to add onto this point, you expect different experts to be activated for ev... | Hacker News

bick_nyers 12 months ago | on: Apple M3 Ultra

Just to add onto this point: you should expect a different set of experts to be activated for every token, so not having all of the weights in fast memory can still be quite slow, because you need to load and unload weights on every token.

valine 12 months ago

Probably better to be moving things from fast memory to faster memory than from slow disk to fast memory.
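
To make the load/unload argument concrete, here is a rough sketch (not anyone's actual inference code) that counts how many expert weight blocks have to be fetched per token when only some of them fit in fast memory. Everything in it is an assumption for illustration: the function names, the LRU eviction policy, and especially the uniform random routing, since a real mixture-of-experts router is learned and usually skewed toward some experts.

```python
import random
from collections import OrderedDict


def simulate_expert_misses(num_experts, experts_per_token, cache_slots,
                           num_tokens, seed=0):
    """Return the average number of expert loads (cache misses) per token.

    Hypothetical model: fast memory holds `cache_slots` experts and evicts
    the least recently used one when it is full.
    """
    rng = random.Random(seed)
    cache = OrderedDict()  # expert id -> None, ordered oldest to newest use
    misses = 0
    for _ in range(num_tokens):
        # Assumption: uniform random routing. Real routers are not uniform.
        active = rng.sample(range(num_experts), experts_per_token)
        for expert in active:
            if expert in cache:
                cache.move_to_end(expert)  # mark as most recently used
            else:
                misses += 1  # expert must be fetched from slower storage
                cache[expert] = None
                if len(cache) > cache_slots:
                    cache.popitem(last=False)  # evict least recently used
    return misses / num_tokens


def load_time_per_token(misses_per_token, expert_bytes, bandwidth_bytes_per_s):
    """Seconds per token spent only on moving expert weights."""
    return misses_per_token * expert_bytes / bandwidth_bytes_per_s


if __name__ == "__main__":
    # All sizes and bandwidths below are placeholders, not measurements.
    misses = simulate_expert_misses(num_experts=64, experts_per_token=8,
                                    cache_slots=16, num_tokens=1000)
    for label, bandwidth in [("slower link", 5e9), ("faster link", 100e9)]:
        seconds = load_time_per_token(misses, expert_bytes=200e6,
                                      bandwidth_bytes_per_s=bandwidth)
        print(f"{label}: {misses:.2f} loads/token, {seconds:.4f} s/token")
```

Under these assumptions the number of loads per token depends only on how many experts fit in fast memory, while the time those loads take scales inversely with the bandwidth of the link they cross, which is the distinction valine is drawing between the disk-to-memory and memory-to-faster-memory cases.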
