
> Conclusion
>
> Building powerful and reliable AI Agents is becoming less about finding a magic prompt or model updates. It is about the engineering of context and providing the right information and tools, in the right format, at the right time. It's a cross-functional challenge that involves understanding your business use case, defining your outputs, and structuring all the necessary information so that an LLM can "accomplish the task."

That's actually also true for humans: the more context (i.e. the right info at the right time) you provide, the better they can solve the task.



I am not a fan of this banal trend of superficially comparing aspects of machine learning to humans. It doesn't provide any insight and is hardly ever accurate.


I've seen a lot of cases where, if you look at the context you're giving the model and imagine handing it to a human (not yourself or your coworker, but someone who doesn't already know what you're trying to achieve; think Mechanical Turk), the human would be unlikely to give the output you want.

Context is often incomplete, unclear, contradictory, or just contains too much distracting information. All of those will cause an LLM to fail, and all of them can be fixed by thinking about how an unrelated human would do the job.


Alternatively, I've gotten exactly what I wanted from an LLM by giving it information that would not be enough for a human to work with, knowing that the LLM is just going to fill in the gaps anyway.

It's easy to forget that the conversation itself is what the LLM is helping to create. Humans will ignore or deprioritize extra information, but they also need that extra information to get a loose sense of what you're looking for. The LLM is much more easily influenced by any extra wording you include, and loose guiding is likely to become strict guiding.


Yeah, it's definitely not a human! But it is often the case in my experience that problems in your context are quite obvious once looked at through a human lens.

Maybe not very often in a chat context; my experience is in trying to build agents.


Totally agree. We've found that a lot of "agent failures" trace back to assumptions, bad agent decisions, or bloat buried in the context: stuff that makes perfect sense to the dev who built it when following the happy path, but can so easily fall apart in real-world scenarios.

We've been working on a way to test this more systematically by simulating full conversations with agents and surfacing the exact point where things go off the rails. Kind of like unit tests, but for context, behavior, and other AI jank.

Full disclosure, I work at the company building this, but the core library is open source, free to use, etc. https://github.com/langwatch/scenario
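
To make the idea concrete, here's a rough, self-contained sketch of that kind of simulated-conversation test. The names (run_conversation, toy_agent, per_turn_checks) are hypothetical illustrations of the pattern, not the actual scenario library's API:

    from typing import Callable

    Message = dict[str, str]

    def run_conversation(agent: Callable[[list[Message]], str],
                         user_turns: list[str],
                         per_turn_checks: list[Callable[[str], bool]]) -> list[Message]:
        # Drive an agent through a scripted conversation, asserting on every reply
        # so a failure pinpoints the exact turn where things went off the rails.
        history: list[Message] = []
        for i, user_msg in enumerate(user_turns):
            history.append({"role": "user", "content": user_msg})
            reply = agent(history)
            history.append({"role": "assistant", "content": reply})
            for check in per_turn_checks:
                assert check(reply), f"check failed at turn {i}: {reply!r}"
        return history

    # Toy agent standing in for a real LLM-backed one.
    def toy_agent(history: list[Message]) -> str:
        return "Sure, I can help with that refund."

    history = run_conversation(
        toy_agent,
        user_turns=["I want a refund for order 1234.", "It arrived broken."],
        per_turn_checks=[lambda reply: "escalate" not in reply.lower()],
    )
    print(history[-1]["content"])

In practice the simulated user would itself be an LLM rather than a fixed script, so the conversation can branch the way real users do.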


I don't see the usefulness of drawing a comparison to a human. "Context" in this sense is a technical term with a clear meaning. The anthropomorphization doesn't enlighten our understanding of the LLM in any way.

Of course, that comment was just one trivial example; this trope is present in every thread about LLMs. Inevitably, someone trots out a line like "well, humans do the same thing" or "humans work the same way" or "humans can't do that either". It's a reflexive platitude, most often deployed as a thought-terminating cliché.


I agree with you completely about the trend, which has been going on for years. And it's usually used to trivialize the vast expanse between humans and LLMs.

In this case, though, it's a pretty weird and hard job to create a context dynamically for a task, cobbling together prompts, tool outputs, and other LLM outputs. It's hard enough and weird enough that you can often end up producing text that not even a human could make sense of well enough to produce the desired output. And there is practical value in taking a context the LLM failed at and checking whether you'd expect a human to succeed with it.
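
One cheap way to act on that is to dump the exact context of every failed run somewhere a human can read it end to end and ask "could I have produced the right output from this?". A minimal sketch, assuming your context is a list of chat-style message dicts (log_failed_context is a hypothetical helper, not from any particular framework):

    import datetime
    import json
    import pathlib

    def log_failed_context(messages: list[dict], outcome: str,
                           log_dir: str = "failed_contexts") -> None:
        # Write the full context plus what went wrong to a timestamped file
        # so a human reviewer can read it exactly as the model saw it.
        path = pathlib.Path(log_dir)
        path.mkdir(exist_ok=True)
        stamp = datetime.datetime.now().strftime("%Y%m%dT%H%M%S%f")
        (path / f"{stamp}.json").write_text(
            json.dumps({"outcome": outcome, "messages": messages}, indent=2)
        )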


There are all these philosophers popping up everywhere. This is also another one of those topics that featured in people's favorite sci-fi hyperfixation, so all discussions inevitably get ruined with sci-fi fanfic (see also: room-temperature superconductivity).


I agree; however, I do appreciate comparisons to other human-made systems. For example, "providing the right information and tools, in the right format, at the right time" sounds a lot like a bureaucracy, particularly because "right" is decided for you, it's left undefined, and may change at any time with no warning or recourse.


Without my note I wouldn’t have seen this comment, which is very insightful to me at least.

https://news.ycombinator.com/item?id=44429880


The difference is that humans can actively seek to acquire the necessary context by themselves. They don't have to passively sit there and wait for someone else to do the tedious work of feeding them all necessary context upfront. And we value humans who are able to proactively do that seeking by themselves, until they are satisfied that they can do a good job.


> The difference is that humans can actively seek to acquire the necessary context by themselves

These days, so can LLM systems. The tool calling pattern got really good in the last six months, and one of the most common uses of that is to let LLMs search for information they need to add to their context.

o3 and o4-mini and Claude 4 all do this with web search in their user-facing apps and it's extremely effective.

The same pattern is increasingly showing up in coding agents, giving them the ability to search for relevant files or even pull in official documentation for libraries.
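
As a concrete example, here's a minimal sketch of that tool-calling loop using the OpenAI chat completions API; search_docs is a hypothetical stand-in for whatever search backend you expose to the model:

    import json
    from openai import OpenAI  # assumes the openai Python SDK is installed

    client = OpenAI()

    # The model is told it can call this search tool to pull information into its context.
    tools = [{
        "type": "function",
        "function": {
            "name": "search_docs",
            "description": "Search project documentation and return relevant snippets.",
            "parameters": {
                "type": "object",
                "properties": {"query": {"type": "string"}},
                "required": ["query"],
            },
        },
    }]

    def search_docs(query: str) -> str:
        # Stand-in for a real backend: web search, vector store, ripgrep over a repo, etc.
        return f"(top snippets for: {query})"

    messages = [{"role": "user", "content": "How do I configure retries in our HTTP client?"}]

    # Simple loop: let the model request searches until it can answer directly.
    while True:
        response = client.chat.completions.create(model="gpt-4o", messages=messages, tools=tools)
        msg = response.choices[0].message
        if not msg.tool_calls:
            print(msg.content)
            break
        messages.append(msg)  # keep the assistant's tool-call turn in the context
        for call in msg.tool_calls:
            args = json.loads(call.function.arguments)
            result = search_docs(args["query"]) if call.function.name == "search_docs" else ""
            messages.append({
                "role": "tool",
                "tool_call_id": call.id,
                "content": result,  # the tool output becomes new context for the next turn
            })

The point is that the context no longer has to be assembled entirely upfront; the model decides what it's missing and goes and fetches it.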


Basically, finding the right buttons to push within the constraints of the environment. Not so different from what (SW) engineering is, only non-deterministic in the outcomes.


Yeah... I'm always asking my UX and product folks for mocks, requirements, acceptance criteria, sample inputs and outputs, why we care about this feature, etc.

Until we can scan your brain and figure out what you really want, it's going to be necessary to actually describe what you want built, and not just rely on vibes.


Ya reminds me of social engineering. Like we’re seeing “How to Win Programming and Influence LLMs”.


This. I was about to make a similar point: this conclusion reads like a job description for a technical lead role, where they manage and define work for a team of human devs who execute the implementation.


Right info at the right time is not "more", and with humans it's pretty easy to overwhelm, which does the opposite: it turns "more" into "wrong".


Not "more" context. "Better" context.

(X-Y problem, for example.)


I think too much context is harmful.



