
To add to this point: you should expect a different set of experts to be activated for every token, so keeping only part of the weights in fast memory can still be quite slow, because you end up loading and unloading expert weights on every token.
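A rough way to see this is a toy simulation that counts how often an expert has to be pulled into fast memory. Everything in it is an assumption for illustration: routing is uniform random (real routers are learned and usually less uniform), residency follows a simple LRU policy, and the name `count_expert_loads` and the sizes (64 experts, top-2 routing) are made up rather than taken from any particular model.

```python
import random
from collections import OrderedDict


def count_expert_loads(num_tokens, num_experts, top_k, cache_slots, seed=0):
    """Count expert loads into fast memory over a run of tokens.

    Assumptions (not from any real model): uniform random top-k routing
    and an LRU policy deciding which experts stay resident.
    """
    rng = random.Random(seed)
    cache = OrderedDict()  # resident expert ids, least recently used first
    loads = 0
    for _ in range(num_tokens):
        # Each token activates top_k distinct experts.
        for expert in rng.sample(range(num_experts), top_k):
            if expert in cache:
                cache.move_to_end(expert)  # already resident, mark as recent
            else:
                loads += 1  # must be fetched from slower storage
                cache[expert] = None
                if len(cache) > cache_slots:
                    cache.popitem(last=False)  # evict least recently used
    return loads


# Same routing sequence (same seed), different amounts of fast memory.
all_resident = count_expert_loads(1000, 64, 2, cache_slots=64)
quarter_resident = count_expert_loads(1000, 64, 2, cache_slots=16)
print("loads with every expert resident:", all_resident)
print("loads with a quarter resident:", quarter_resident)
```

With room for every expert, each one is loaded at most once. With room for only a quarter of them, most token steps under this uniform-routing assumption trigger at least one load, which is the per-token load/unload cost described above.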


Probably better to be moving things from fast memory to faster memory than from slow disk to fast memory.



