I am sure they cherry-picked the examples but still, wow. Having spent a conside...

nylonstrung · 2025-11-06T18:40:44 1762454444

Subjectively, I find Kimi is far "smarter" than the benchmarks imply, maybe because they game them less than US labs do

vessenes · 2025-11-06T19:43:06 1762458186

I like Kimi too, but they definitely have some benchmark contamination: the blog post shows a substantial comparative drop in SWE-bench Verified vs open tests. I throw no shade - releasing these open weights is a service to humanity; really amazing.

rubymamis · 2025-11-06T19:22:37 1762456957

My impression as well!