What is the comparison of this versus DeepSeek in terms of good results and cost...

Synaesthesia · on Jan 31, 2025

Deepseek is the state of the art right now in terms of performance and output. It's really fast. The way it "explains" how it's thinking is remarkable.

fpgaminer · on Jan 31, 2025

DeepSeek is great because: 1) you can run the model locally, 2) the research was openly shared, and 3) the reasoning tokens are open. It is not, in my experience, state of the art. In all of my side by side comparisons thus far in real world applications between DeepSeek V3 and R1 vs 4o and o1, the latter has always performed better. OpenAI's models are also more consistent, glitching out maybe one in 10,000, whereas DeepSeek's models will glitch out 1 in 20. OpenAI models also handle edge cases better and have a better overall grasp of user intentions. I've had DeepSeek's models consistently misinterpret prompts, or confuse data in the prompts with instructions. Those are both very important things that make DeepSeek useless for real world applications. At least without finetuning them, which then requires using those huge 600B parameter models locally.

So it is by no means state of the art. Gemini Flash 2.0 also performs better than DeepSeek V3 in all my comparisons thus far. But Gemini Flash 2.0 isn't robust and reliable either.

But as a piece of research, and a cool toy to play with, I think DeepSeek is great.

Synaesthesia · on Jan 31, 2025

I watched it complete pretty complicated tasks like "write a snake game in Python" and "write Tetris in Python" successfully. And the way it did it, with showing all the internal steps, I've never seen before.

Watch here. https://www.youtube.com/watch?v=by9PUlqtJlM

Aperocky · on Jan 31, 2025

> which then requires using those huge 600B parameter models locally.

Are you running the smaller models locally? Doesn't seems unfair to compare it against 4o and o1 behind OpenAI APIs.

reissbaker · on Jan 31, 2025

Probably a good idea to wait for external benchmarks like Aider, but my guess is it'll be somewhere between DeepSeek V3 and R1 in terms of benchmarks — R1 trades blows with o1-high, and V3 is somewhat lower — but I'd expect o3-mini to be considerably faster. Despite the blog post saying paid users can access o3-mini today, I don't see it as an option yet in their UI... But IIRC when they announced o3-mini in December they claimed it would be similar to 4o in terms of overall latency, and 4o is much faster than V3/R1 currently.