
> The current landscape is a battle between loss-leaders. OpenAI is burning through billions of dollars per year and is expected to hit tens of billions in losses per year soon. Your $20 per month subscription to ChatGPT is nowhere near keeping them afloat. Anthropic’s figures are more moderate, but it is still currently lighting money on fire in order to compete and gain or protect market share.

I don't doubt that the leading labs are lighting money on fire. Undoubtedly, it costs crazy amounts of cash to train these models. But hardware development takes time, and it's only been a few years at this point. Even TODAY, you can run Kimi K2.5, a 1T-parameter open-source model, on two Mac Studios at about 24 tokens/sec [1]. Yes, it'll cost you around $20k for the specs needed, but that's hobbyist and small-business territory; we're not talking mainframe costs here. And surely that price will come down, and it's hard to imagine the hardware won't keep getting faster and better.
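
As a rough sanity check on that 24 tokens/sec figure, here's a back-of-envelope sketch. Every number in it is an assumption for illustration, not a measured spec: roughly 32B active parameters per token (Kimi K2.5 is a mixture-of-experts model, so only a fraction of the 1T parameters are read per token), 4-bit weights, ~800 GB/s of memory bandwidth per Mac Studio, and about half of that bandwidth achieved in practice. With the model pipelined across the two machines at batch size 1, only one machine is streaming weights at any given moment.

    # Back-of-envelope decode throughput for a bandwidth-bound MoE model.
    # All figures are assumptions for illustration, not measured specs.
    active_params = 32e9        # assumed active parameters per token (MoE routing)
    bytes_per_param = 0.5       # assumed 4-bit quantized weights
    bandwidth_per_node = 800e9  # assumed usable bytes/sec per Mac Studio
    efficiency = 0.5            # assumed fraction of peak bandwidth actually achieved

    # Each decoded token must stream the active weights from memory once.
    bytes_per_token = active_params * bytes_per_param

    # Pipelined across two machines at batch size 1, the stages run one after
    # the other, so the effective bandwidth is roughly that of a single node.
    tokens_per_sec = (bandwidth_per_node * efficiency) / bytes_per_token

    print(f"~{tokens_per_sec:.0f} tokens/sec")  # ~25, the same ballpark as the reported 24

Under those assumptions the arithmetic lands right around the reported number, which is what you'd expect if memory bandwidth, not compute, is the bottleneck.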

Yes, training the models can really only be done on NVIDIA hardware and costs insane amounts of money. But even if we see only moderate improvement going forward, this is still a monumental shift for coding if you compare where we are now to 2022 (or even 2024).

[1] https://x.com/alexocheema/status/2016487974876164562?s=20



And just to add to this: the reason the Apple Macs are used is that they have the highest memory bandwidth of any easily obtainable consumer device right now. (Yes, the NVIDIA cards with HBM are even higher on memory bandwidth, but they're not easily obtainable.) Memory bandwidth, more than raw compute, is the limiting factor for inference.
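
To make that concrete, here's a minimal roofline-style sketch; the hardware numbers are illustrative assumptions, not vendor specs. At batch size 1, every weight is read from memory once per token and used in a single multiply-accumulate, so decode sits around 1 FLOP per byte, far below the roughly 100 FLOPs per byte a modern accelerator would need to keep its compute units busy.

    # Rough roofline check: why single-stream decode is bandwidth-bound.
    # Hardware numbers below are illustrative assumptions, not exact specs.
    peak_flops = 200e12    # assumed ~200 TFLOPS of usable dense compute (GPU-class)
    peak_bandwidth = 2e12  # assumed ~2 TB/s of memory bandwidth (HBM-class)

    # At batch size 1, each fp16 weight (2 bytes) is read once per token and
    # used in one multiply-accumulate (2 FLOPs): about 1 FLOP per byte.
    arithmetic_intensity = 1.0

    # Intensity at which this machine would balance compute and bandwidth:
    balance_point = peak_flops / peak_bandwidth          # ~100 FLOPs/byte
    compute_utilization = arithmetic_intensity / balance_point

    print(f"balance point: {balance_point:.0f} FLOPs/byte")
    print(f"decode uses ~{compute_utilization:.0%} of peak compute")  # ~1%

So at single-stream decode the compute units sit mostly idle; how fast you can stream weights out of memory is what sets the token rate.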

Memory costs are skyrocketing right now as everyone pivots to HBM paired with moderate processing power, which is the ideal combination for inference. The current memory shortage is obviously temporary: factories will be built and scaled, and memory is not particularly power hungry (there's a reason you don't really need much cooling for it). As training becomes less of a focus and inference more of one, we will at some point move from the highest-end NVIDIA cards to boxes of essentially power-efficient HBM attached to smaller, more efficient compute.

I see a lot of commentary along the lines of "AI companies are so stupid buying up all the memory" around the place at the moment. That memory is exactly what's needed to run inference cheaply. Inference is currently done on NVIDIA cards and Apple M-series chips because those two were the first to put large amounts of high-bandwidth memory right next to the processor, but the raw compute of the NVIDIA cards is really only needed for training; they're being used for inference right now because there isn't much on the market with similar memory bandwidth. That will be changing very soon: everyone in the industry is coming along with their own dedicated inference hardware built around HBM.



