
My burning question: Why not also make a slightly larger model (100B) that could perform even better?

Is there some bottleneck that prevents RL from scaling up performance to larger non-MoE models?




They have a larger model that is in preview and still training.



