A chef also learns through trial and error, not just reading how others have cook...

exe34 · on Dec 1, 2024

A text LLM isn't going to learn by trial and error; it hasn't been given that sort of freedom. RLHF would be the LLM version of trial and error, but it's as if the chef were only allowed to do that for a few days after years of chef school, and from then on had to stick to what he had already learnt.

jebarker · on Dec 1, 2024

Why isn't LLM pre-training based on next token prediction considered "trial and error"? It seems to fit that description pretty well to me.

Retric · on Dec 1, 2024

Pre-training is based on a proxy for the desired output, not the actual desired output. It's not in the form of responses to a prompt, and reproducing copyrighted works 1:1 in production would be bad.

It's the difference between a painter copying an existing work and a painter making an original piece and then getting feedback on it. We consider the second trial and error because the full process is being tested, not just technique.
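To make the "original piece, then feedback" loop concrete, here is a toy sketch of trial and error in the reinforcement-learning sense: the model samples a complete output, and only the finished output gets scored. It uses REINFORCE, a basic policy-gradient method, on a hypothetical three-dish "policy" with made-up reward scores standing in for a taster or reward model. This is an illustration of the idea only; real RLHF pipelines (for example PPO against a learned reward model) are far more involved.

```python
import math
import random

# Hypothetical "taster" scores for complete dishes (a stand-in for a reward model).
REWARD = {"salty soup": 0.2, "balanced soup": 1.0, "burnt soup": 0.0}
DISHES = list(REWARD)


def softmax(logits: list[float]) -> list[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train(steps: int = 500, lr: float = 0.5, seed: int = 0) -> dict[str, float]:
    rng = random.Random(seed)
    logits = [0.0] * len(DISHES)  # start with no preference between dishes
    baseline = 0.0  # running average reward, used to reduce update variance
    for _ in range(steps):
        probs = softmax(logits)
        # Trial: cook a whole dish by sampling from the current policy.
        a = rng.choices(range(len(DISHES)), weights=probs)[0]
        # Error signal: the dish is judged only after it is finished.
        advantage = REWARD[DISHES[a]] - baseline
        # REINFORCE update: d log p(a) / d logit_i = 1[i == a] - p_i
        for i in range(len(logits)):
            indicator = 1.0 if i == a else 0.0
            logits[i] += lr * advantage * (indicator - probs[i])
        baseline = 0.9 * baseline + 0.1 * REWARD[DISHES[a]]
    return dict(zip(DISHES, softmax(logits)))


if __name__ == "__main__":
    for dish, p in train().items():
        print(f"{dish}: {p:.3f}")
```

The contrast with pre-training is that nothing here tells the model which dish to cook; it only learns from how its own sampled attempts were scored.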

exe34 · on Dec 1, 2024

A chef doesn't get feedback on his meal after picking up the spoon. He gets feedback when he or somebody else tastes the meal partway through and at the end.

Jensson · on Dec 2, 2024

There is more than one correct answer in reality; LLM pre-training just trains the model to respond the same way the text did.

Imagine if school only marked an answer correct when you used exactly the same words as the book; that is not "trial and error".
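A toy illustration of the "same words as the book" point, assuming a hypothetical model distribution over the next word: the standard per-token cross-entropy loss used in pre-training is -log p(reference token), so it only rewards probability placed on the exact word that appears in the training text. In this made-up example, a model confident in a perfectly good synonym is penalized exactly as much as one confident in nonsense.

```python
import math


def next_token_loss(probs: dict[str, float], target: str) -> float:
    """Cross-entropy at one position: only p(target) matters."""
    return -math.log(probs[target])


# Hypothetical distributions over the word after "The soup tastes ...";
# the training text (the "book") says "delicious".
reference = "delicious"
models = {
    "confident in reference": {"delicious": 0.9, "tasty": 0.05, "purple": 0.05},
    "confident in synonym": {"delicious": 0.05, "tasty": 0.9, "purple": 0.05},
    "confident in nonsense": {"delicious": 0.05, "tasty": 0.05, "purple": 0.9},
}

if __name__ == "__main__":
    for name, probs in models.items():
        print(f"{name}: {next_token_loss(probs, reference):.3f}")
    # confident in reference: 0.105
    # confident in synonym: 2.996
    # confident in nonsense: 2.996
```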

isaacfrond · on Dec 2, 2024

I can tell you haven't been in a school in a while. That is actually a pretty accurate description of what schools are like nowadays.

Retric · on Dec 2, 2024

Pretty accurate != always, which is the point.