
A larger context quite literally degrades attention performance on anything other than needle-in-a-haystack lookups, in almost every model and to varying degrees. So to answer the question: the “waste” is that you make the model dumber unnecessarily in an attempt to make it smarter.
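A toy sketch of one way this can happen, not a claim about how any particular model behaves: in a single softmax attention step, the weight on one relevant token shrinks as more unrelated tokens compete with it. The function name `relevant_weight`, the score of 5.0 for the relevant token, and the score of 0 for every distractor are all made-up assumptions for illustration.

```python
import math


def relevant_weight(n_tokens: int, relevant_score: float = 5.0) -> float:
    """Softmax weight on one relevant token among n_tokens - 1 distractors.

    Assumption (hypothetical setup): every distractor has attention score 0,
    so each contributes exp(0) = 1 to the softmax denominator.
    """
    numerator = math.exp(relevant_score)
    return numerator / (numerator + (n_tokens - 1))


if __name__ == "__main__":
    # Same relevant token, same score, progressively larger context.
    for n in (1_000, 10_000, 100_000):
        print(n, round(relevant_weight(n), 4))
```

The relevant token's score never changes here; only the number of competing tokens does. Real models have many heads and layers and learned scores, so treat this as intuition only.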

