
I am sure they cherry-picked the examples, but still, wow. Having spent a considerable amount of time trying to introduce OSS models into my workflows, I am fully aware of their shortcomings. Even frontier models would struggle with such outputs (unless you lead the way, help break things down, and maybe even use sub-agents).

Very impressed with the progress. Keeps me excited about what’s to come next!



Subjectively, I find Kimi is far "smarter" than the benchmarks imply, maybe because they game them less than US labs do.


I like Kimi too, but they definitely have some benchmark contamination: the blog post shows a substantial comparative drop in SWE-bench Verified vs. open tests. I throw no shade - releasing these open weights is a service to humanity; really amazing.


My impression as well!



