
You're missing the point; it's only a testing exercise for the new model.




No, the point is that you can set up the testing exercise without using an LLM to do a simple find and replace.

It's a test. Like all tests, it's more or less synthetic and focused on specific expected behavior. I'm pretty far from LLMs now, but this seems like a very good test of how genuine this behavior actually is (or repeat it 10x with some scrambling to go deeper).

This thread is about the find-and-replace, not the evaluation. Gambling on whether the first AI replaces the right spells, just so the second one can try finding them, is unnecessary when find-and-replace is faster, easier, and works 100% of the time.
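
As a rough sketch (made-up spell names and file names, assuming the list of spells is known up front), the whole replacement step is a few lines of Python:

    # Deterministic spell replacement: no model involved, so the
    # substitutions are guaranteed to be exactly the intended ones.
    import re

    spell_map = {          # hypothetical spells and replacements
        "Expelliarmus": "Florbinax",
        "Lumos": "Zintherol",
    }

    pattern = re.compile("|".join(re.escape(s) for s in spell_map))

    with open("book.txt") as f:    # hypothetical input file
        scrambled = pattern.sub(lambda m: spell_map[m.group(0)], f.read())

    with open("book_scrambled.txt", "w") as f:
        f.write(scrambled)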

... I'm not sure if you're trolling or if you missed the point again. The point is to test the LLM's contextual ability and correctness when performing actions that are hopefully guaranteed not to be in the training data.

It has nothing to do with the performance of the string replacement.

The initial "Find" is to see how well it performs actually find all the "spells" in this case, then to replace them. They using a separate context maybe, evaluate if the results are the same or are they skewed in favour of training data.



