That seems very bad. What's the point of a new model that's worse than 4o? I guess it's cheaper in the API and a bit better at coding - but, this doesn't seem compelling.
After DeepSeek, I heard OpenAI saying the plan was to only ship releases when the models were meaningfully better than the competition. Instead, it seems like what we're getting is scheduled releases that are worse than the current versions.
It's quite a bit better at coding --- they hint that it can tie o1's performance on coding, which already benchmarks higher than 4o. And it's significantly cheaper, and presumably faster. I believe API costs account for the vast majority of COGS at most of today's AI startups, so they would be very motivated to switch to a cheaper model with similar performance.
Right. For large-volume requests that use reasoning this will be quite useful. I have a task that requires the LLM to convert thousands of free-text statements into SQL select statements, and o3-mini-high is able to get many of the more complicated ones that GPT-4o and Sonnet 3.5 failed at. So I will be switching this task to either o3-mini or DeepSeek-R1.
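For anyone curious what that kind of batch job looks like, here's a minimal sketch of building the per-statement request payload for the chat completions endpoint. The schema, model name, and prompt wording are placeholders I made up, not the actual setup described above:

```python
import json

# Placeholder schema; a real job would inline the actual table definitions.
SCHEMA = "CREATE TABLE orders (id INT, customer TEXT, total NUMERIC, placed_at DATE);"

def build_request(statement: str, model: str = "o3-mini") -> dict:
    """Build one chat-completions payload asking the model for a single SELECT."""
    return {
        "model": model,
        "messages": [
            {
                "role": "system",
                "content": (
                    "Translate the user's free-text request into one SQL SELECT "
                    f"statement against this schema:\n{SCHEMA}\n"
                    "Return only the SQL, with no explanation."
                ),
            },
            {"role": "user", "content": statement},
        ],
    }

# Each free-text statement becomes one payload; thousands of these can be
# POSTed (or submitted via a batch API) and the responses collected as SQL.
payload = build_request("total revenue per customer last month")
print(json.dumps(payload, indent=2))
```

The win with a cheaper reasoning model is exactly on the "more complicated" statements: per-request cost drops while accuracy on the hard tail holds, so the whole batch gets cheaper without re-checking the easy cases.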