LLMs generate text based on weights in a model, and some of it happens to be correct statements about the world. That doesn't mean the rest is generated incorrectly.
You're describing the absence of a verification error (the system works as designed/built; the equations are correct).
GP is describing a validation error (the system doesn't do what we want/require/expect).
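A minimal sketch of the distinction in Python (the truncate function and its one-line spec are invented for illustration, not taken from anyone's codebase):

    # Spec (invented): "truncate() shortens text to at most `limit` characters."

    def truncate(text: str, limit: int) -> str:
        """Shorten text to at most `limit` characters."""
        return text[:limit]

    # Verification: does the code match its spec? Yes, these checks pass.
    assert len(truncate("hello world", 5)) <= 5
    assert truncate("hello world", 5) == "hello"

    # Validation: does the spec capture what the user actually wanted?
    # Suppose the user expected truncation to never cut a word in half.
    print(truncate("hello world", 7))  # "hello w": verified correct, yet wrong for the user

The code is verified (it does exactly what it was built to do) while still failing validation (it isn't what we wanted), which is the gap being pointed out about LLM output.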