For trivial stuff, they might manage even now (though those can probably be solved algorithmically anyway). For more complex stuff, I believe they are not even scratching the surface: LLMs can't really sustain long, complex chains of thought and inference, which are essential for coming up with proofs, or even just for solving a sudoku. Which they can't do (and no, writing a program that was likely part of the training set and executing it doesn't count).
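To be concrete about the "solved algorithmically" part: sudoku falls to a dozen lines of plain backtracking, which is exactly why an LLM emitting such a program says nothing about its own reasoning. A minimal sketch (the names and structure here are my own, purely illustrative):

```python
# Classic backtracking sudoku solver. Grid is a 9x9 list of lists;
# 0 marks an empty cell.

def valid(grid, r, c, v):
    """Check whether placing v at (r, c) breaks a sudoku rule."""
    if v in grid[r]:                                  # row conflict
        return False
    if any(grid[i][c] == v for i in range(9)):        # column conflict
        return False
    br, bc = 3 * (r // 3), 3 * (c // 3)               # 3x3 box conflict
    return all(grid[br + i][bc + j] != v
               for i in range(3) for j in range(3))

def solve(grid):
    """Fill the grid in place; return True if a solution exists."""
    for r in range(9):
        for c in range(9):
            if grid[r][c] == 0:
                for v in range(1, 10):
                    if valid(grid, r, c, v):
                        grid[r][c] = v
                        if solve(grid):
                            return True
                        grid[r][c] = 0                # backtrack
                return False                          # dead end
    return True                                       # no empty cells: solved
```

The point being: this brute-force search is mechanical, needs no insight, and runs in milliseconds; the hard part is holding a long chain of constraints in your head without it, which is precisely what the model can't do.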