Nope https://x.com/polynoamial/status/1946478249187377206?s=46&t=eTe6EMTslOxUfPD...

AIPedant · 2025-07-19T12:36:28 1752928588

If you don't have a Twitter account then x.com links are useless, use a mirror: https://xcancel.com/polynoamial/status/1946478249187377206

Anyway, that doesn't refute my point, it's just PR from a weaselly and dishonest company. I didn't say it was "IMO-specific" but the output strongly suggests specialized tooling and training, and they said this was an experimental LLM that wouldn't be released. I strongly suspect they basically attached their version of AlphaProof to ChatGPT.

Davidzheng · 2025-07-19T13:00:09 1752930009

We can only go off their word, unfortunately, and they say no formal math, so I assume it's being eval'd by a verifier model instead of a formal system. There are actually some hints of this, because geometry in Lean is not that well developed, so unless they also built their own system it's hard to do it formally (though their P2 proof is by coordinate bash, i.e. computation by algebra instead of geometric construction, so it's hard to tell).

skdixhxbsb · 2025-07-19T13:45:15 1752932715

> We can only go off their word

We’re talking about Sam Altman’s company here. The same company that started out as a nonprofit claiming they wanted to better the world.

Suggesting they should be given the benefit of the doubt is dishonest at this point.

aluminum96 · 2025-07-19T17:12:34 1752945154

“they must be lying because I personally dislike them”

This is why HN threads about AI have become exhausting to read

nosianu · 2025-07-19T18:36:10 1752950170

In general I agree with you, but I see the point of requiring proof for statements made by them instead of accepting them at face value. Given previous experience, and considering that they benefit if these statements are believed, the burden of proof should be on those making the statements, not on those questioning them, no?

Those models seem to be special and not part of their normal product line, as is pointed out in the comments here. I would assume that in that case they indeed had the purpose of passing these tests in mind when creating them. Or were they created for something different, and completely by chance it was discovered that they could be used for the challenge, unintentionally?

otabdeveloper4 · 2025-07-19T18:30:17 1752949817

Yeah, that's how the concept of "reputation" works.

queenkjuul · 2025-07-19T23:48:35 1752968915

No, they are likely lying, because they have huge incentives to lie.

dandanua · 2025-07-19T20:00:03 1752955203

You don't need specialized tooling like Lean if you have enough training data with statements written in natural language, I suppose. But the use of AlphaProof/AlphaGeometry-style learning is almost certain. And I'm sure they have spent a lot of compute to produce solutions; $10k is not a problem for them.

The bigger question is: why should everyone be excited by this if they don't plan to share anything related to this AI model back with humanity?