I've been prototyping with LLMs for some borderline use cases, and cost isn't really the concern; reliability is. Using anything less than the best frontier model seems irresponsible if it could mean the difference between 99.95% reliability and 99% reliability, because that's the threshold where you should have hired a human to do it: you lost more money on that 0.95% error rate than you saved on salaries. (I don't actually have any use cases where this kind of calculation makes sense, but in principle I think it applies to most uses of LLMs, even if you can't quantify the harm.)
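For illustration, here's a toy break-even sketch with made-up numbers (the task volume, cost per error, and salary saved are all assumptions) showing how that threshold falls out:

```python
# Toy break-even sketch: at what error rate does automating a task with an
# LLM stop paying for itself? All numbers below are illustrative assumptions.
tasks_per_year = 100_000          # volume handled by the system (assumption)
cost_per_error = 500.0            # average loss when a task is handled wrong (assumption)
human_salary_saved = 400_000.0    # annual payroll avoided by automating (assumption)
human_error_rate = 0.0005         # assume 99.95% human reliability as the baseline

def annual_loss(error_rate: float) -> float:
    """Expected yearly cost of mistakes at a given error rate."""
    return tasks_per_year * error_rate * cost_per_error

# Break-even error rate: where the extra mistakes eat the salary savings.
breakeven = human_error_rate + human_salary_saved / (tasks_per_year * cost_per_error)

for rate in (0.0005, 0.01, breakeven):
    extra = annual_loss(rate) - annual_loss(human_error_rate)
    print(f"error rate {rate:.2%}: extra loss vs. human = ${extra:,.0f}")
```

With those assumptions, a 1% error rate costs $475k more than the human baseline, which already exceeds the $400k saved on salary; the break-even sits at 0.85%.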
The problem is that the frontier models are nowhere near 99% reliable on their own. Orchestration and good system design are how you get reliability. Yes, the frontier models are still going to be better by default than open-source models, but the LLM is only one component in a broader system. What actually seems necessary for any high-volume, worthwhile use case is making your model task-specific (via fine-tuning / post-training / RL). I build these systems for enterprises; the frontier models alone are not enough.
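As a rough sketch of what I mean by the LLM being one component: wrap every call in task-specific validation, retries, and an escalation path. `call_model` and the invoice validator below are hypothetical stand-ins, not any real API:

```python
# Minimal orchestration sketch: the LLM is a fallible component wrapped in
# validation, retries, and an escalation path. `call_model` and the
# invoice-extraction validator are hypothetical stand-ins.
import json

def call_model(prompt: str, model: str) -> str:
    raise NotImplementedError("stand-in for whatever inference client you use")

def validate(raw: str) -> dict | None:
    """Task-specific check: here, require well-formed JSON with an 'amount' field."""
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        return None
    return parsed if isinstance(parsed, dict) and "amount" in parsed else None

def extract_invoice(prompt: str, max_attempts: int = 3) -> dict:
    for _ in range(max_attempts):
        raw = call_model(prompt, model="task-tuned-extractor")  # task-specific model
        result = validate(raw)
        if result is not None:
            return result
    # The system, not the model, decides what happens on repeated failure.
    raise RuntimeError("validation failed; escalate to human review queue")
```

The reliability number that matters is the one for the whole pipeline, including the checks and the fallback, not the model's raw accuracy.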
First off, you’re ignoring error bars. On average, frontier models might be 99.95% accurate, but for many work streams there are surely tail cases where a series of questions produces only 99% accuracy (or even less), even with a frontier model.
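One way those tail cases show up: per-call accuracy compounds across a multi-step work stream, so even a 99.95%-per-call model can land at 99% or worse end to end. A quick illustration:

```python
# Per-call accuracy compounds across a chained workflow.
per_call_accuracy = 0.9995          # headline 99.95% per call

for steps in (1, 10, 20, 100):
    end_to_end = per_call_accuracy ** steps
    print(f"{steps:3d} chained calls -> {end_to_end:.2%} end-to-end accuracy")
# 20 chained calls already drop to ~99%; 100 drop to ~95%.
```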
The challenge businesses face is how to integrate these fallible models into reliable, repeatable business processes. That doesn’t sound so different from the software engineering of yesteryear.
I suspect that as the AI hype continues to level off, business leaders will come to their senses and realize that the marginal dollar is better spent on integration practices than on eking out minor gains from frontier models.