This is not a benchmark, really. It's an official test. | Hacker News

amelius 11 days ago | on: OpenAI claims gold-medal performance at IMO 2025

This is not a benchmark, really. It's an official test.

PokemonNoGo 11 days ago

What is an _official_ test?

andrepd 11 days ago

And what were the methods? How was the evaluation? They could be making it all up for all we know!
