Hacker News

A chef also learns through trial and error, not just by reading how others have cooked in the past and then copying their motions.

Altitude is a good example: it has a meaningful impact on cooking but isn’t discussed in a given recipe.



A text LLM isn't going to learn by trial and error; it hasn't been given that sort of freedom. RLHF would be the LLM version of trial and error, but it's as if the chef were only allowed to do that for a few days after years of chef school, and from then on he has to stick to what he has already learnt.


Why isn't LLM pre-training based on next token prediction considered "trial and error"? It seems to fit that description pretty well to me.


Pre-training is based on a proxy for the desired output, not the actual desired output. It’s not in the form of responses to a prompt, and reproducing copyrighted works 1:1 in production would be bad.

It’s the difference between a painter copying some work and a painter making an original piece and then getting feedback on it. We consider the second trial and error because the full process is being tested, not just technique.


A chef doesn't get feedback on his meal after picking up the spoon. He gets feedback when he or somebody else tastes the meal partway through and at the end.


There is more than one correct answer in reality; LLM pre-training just trains the model to respond the same way the text did.

Imagine if school only marked an answer correct if you used exactly the same words as the book; that is not "trial and error".


I can tell you haven't been in a school in a while. That is actually a pretty accurate description of what schools are like nowadays.


Pretty accurate != always, which is the point.



