
The exact questions are almost certainly not in the training data: I add extra words to each puzzle and don't publish them alongside the original words (though there's a slight chance the model providers used my previous API requests for training).

To guard against potential training data contamination, I separately calculate the score using only the newest 100 puzzles. Grok 4 still leads.
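
A minimal sketch of that kind of recency check, not the actual benchmark code. The record type `PuzzleResult`, its fields, and the use of a plain mean as the score are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class PuzzleResult:
    # Hypothetical record: one model's result on one puzzle.
    puzzle_date: date  # assumed: when the puzzle was published
    score: float       # assumed: per-puzzle score, e.g. in [0, 1]


def mean_score(results):
    """Average score over the given results."""
    if not results:
        raise ValueError("no results to score")
    return sum(r.score for r in results) / len(results)


def newest_subset_score(results, n=100):
    """Average score over only the n most recently published puzzles."""
    newest = sorted(results, key=lambda r: r.puzzle_date, reverse=True)[:n]
    return mean_score(newest)
```

If a model's `newest_subset_score` is much lower than its `mean_score` over all puzzles, that gap would hint that older puzzles leaked into training. If the ranking holds on the newest subset, contamination is a less likely explanation.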
