The Chatbot Arena leaderboard is a good test of the vibes and style of a response, but not much else. On objective benchmarks (coding, etc.), R1 performed very well, granted, but still below the full o1 and o1-pro models.
It's still a very impressive feat, but it wasn't frontier-pushing.