
> They're not paying me to use it.

Of course they are.

> As long as the inference is not done at a loss.

If making money on inference alone were possible, there would be a dozen smaller providers taking the open-weights models and offering them as a service. But it seems that every provider is anchored at $20/month, so you can bet that none of them can go any lower.

> If making money on inference alone was possible, there would be a dozen different smaller providers who'd be taking the open weights models and offering that as service.

There are! Look through the provider list for any open model on https://openrouter.ai . For instance, DeepSeek 3.1 has a dozen providers. It would make no sense for them to offer inference below cost, because they have neither a moat nor branding to gain from subsidizing it.


> If making money on inference alone was possible

Maybe, but arguably a major reason you can't make money on inference right now is that the useful life of a model is too short to amortize its development costs over much time. There is so much investment in the field that everyone is developing new models, which shortens useful life in a competitive market, and everyone is simultaneously driving up the price of the inputs needed for developing models, which increases the costs that have to be amortized over that short life. Perversely, the AI bubble popping and resolving those issues may make profitability much easier for survivors with strong revenue streams.
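To make the amortization point concrete, here's a toy calculation. Every number (training cost, token volume, useful life) is invented for illustration, not a real figure:

```python
# Toy model: amortized training cost per million tokens served.
# All numbers are hypothetical, for illustration only.

def amortized_cost_per_mtok(training_cost_usd, tokens_served_per_month, useful_life_months):
    """Spread the training cost over every token served during the model's useful life."""
    total_tokens = tokens_served_per_month * useful_life_months
    return training_cost_usd / (total_tokens / 1e6)

# Same hypothetical $100M training run, same demand, different useful lives.
short_life = amortized_cost_per_mtok(100e6, 1e12, 6)    # superseded in 6 months
long_life  = amortized_cost_per_mtok(100e6, 1e12, 36)   # superseded in 3 years

print(f"6-month life:  ${short_life:.2f} per 1M tokens")
print(f"36-month life: ${long_life:.2f} per 1M tokens")
```

Cutting the useful life from 36 months to 6 multiplies the amortized cost per token by six, before any inference (GPU) costs are even counted.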


You need a certain level of batch parallelism to make inference efficient, but you also need enough capacity to handle request floods. Being a small provider is not easy.
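A toy illustration of the batching point (the cost and latency numbers are invented): a forward pass costs roughly the same whether it serves one request or a full batch, so per-request cost falls with batch size, but only if you have enough concurrent traffic to fill batches.

```python
# Toy model: GPU cost per request as a function of batch size.
# Hypothetical numbers: a batched forward pass takes a fixed amount of
# GPU time regardless of how many requests share it (up to hardware limits).

GPU_COST_PER_SECOND = 0.001   # assumed $/s for the instance
PASS_LATENCY_S = 0.05         # assumed time for one batched forward pass

def cost_per_request(batch_size):
    # Fixed pass cost shared across every request in the batch.
    return GPU_COST_PER_SECOND * PASS_LATENCY_S / batch_size

solo = cost_per_request(1)
batched = cost_per_request(32)
print(f"batch=1:  ${solo:.6f}/request")
print(f"batch=32: ${batched:.6f}/request")
```

A small provider with thin traffic runs near batch=1 economics while still needing headroom for request floods, which is why the business is hard at low volume.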

The open models suck. AWS hosts them for less than closed models cost, but no one uses them, because they suck.

It's not the open models that suck, it's the infrastructure around them. None of the current "open weights" providers have:

   - good tools for agentic workflows
   - tools for context management
   - infrastructure for input token caching
These are solvable without having to pay anything to OpenAI/Anthropic/Google.

Why would the open weights providers need their own tools for agentic workflows when you can just plug their OpenAI-compatible API URL into existing tools?
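That's the usual route: most existing tools only need a base URL, an API key, and a model name. A minimal sketch of the request such a tool sends to any OpenAI-compatible endpoint (the URL, key, and model name below are placeholders, not real values):

```python
import json
import urllib.request

# Placeholders: swap in any OpenAI-compatible provider's values.
BASE_URL = "https://example-provider.invalid/v1"
API_KEY = "sk-placeholder"

def build_chat_request(model, messages):
    """Build the standard POST /chat/completions request an existing tool would send."""
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",
        data=json.dumps({"model": model, "messages": messages}).encode(),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
    )

req = build_chat_request("deepseek-v3.1", [{"role": "user", "content": "hello"}])
print(req.full_url)
```

Because the wire format is identical across providers, switching backends is a config change, not a tooling change.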

Also, there are many providers of open source models with caching (Moonshot AI, Groq, DeepSeek, FireWorks AI, MiniMax): https://openrouter.ai/docs/guides/best-practices/prompt-cach...
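A rough sketch of why prefix caching matters so much for agentic workloads, where every turn resends the same long context (the prices and the 90% cache discount below are made up; real rates vary per provider):

```python
# Toy model: input-token cost for an agent session, with and without prefix caching.
# Hypothetical pricing: $1.00 per 1M uncached input tokens, 90% discount on cache hits.
PRICE_PER_MTOK = 1.00
CACHE_DISCOUNT = 0.9

def turn_cost(prefix_tokens, new_tokens, cached):
    # The shared prefix is the only part eligible for the cache discount.
    prefix_rate = PRICE_PER_MTOK * (1 - CACHE_DISCOUNT) if cached else PRICE_PER_MTOK
    return (prefix_tokens * prefix_rate + new_tokens * PRICE_PER_MTOK) / 1e6

# 50-turn session: 100k-token shared prefix, 1k fresh tokens per turn.
uncached = sum(turn_cost(100_000, 1_000, cached=False) for _ in range(50))
cached   = sum(turn_cost(100_000, 1_000, cached=True) for _ in range(50))
print(f"without caching: ${uncached:.2f}")
print(f"with caching:    ${cached:.2f}")
```

Under these assumed numbers the session is roughly 9x cheaper with caching, which is why a provider without it can't compete on agentic workloads regardless of model quality.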


> when you can just plug their OpenAI-compatible API URL into existing tools?

Only the self-hosting diehards will bother with that. Those who want to compete with Claude Code, Gemini CLI, Codex et caterva will have to provide the whole package, and do it at a price point that is competitive even at low volumes - which is hard, because the big LLM providers are all subsidizing their offerings.


They do make money on inference.


