Even if you add hidden tokens that cannot be created from user input (filtering ...

tialaramex · 2026-04-09T15:31:51 1775748711

> we want there to be a threshold, because we don't want people (or AI) to ignore obvious emergencies

There's an SF short I can't find right now which begins with somebody failing to return their copy of "Kidnapped" by Robert Louis Stevenson. This gets handed over to some authority which could presumably fine you for overdue books, and somehow a machine ends up concluding they've kidnapped someone named "Robert Louis Stevenson" who, it discovers, is in fact dead; therefore it's no longer kidnap, it's murder, and that's a capital offence.

The library member is executed before humans get around to solving the problem, and ironically that's probably the most unrealistic part of the story, because the US is famously awful at speedy anything when it comes to justice. Ten years rotting in solitary confinement for a non-existent crime is very believable today, whereas "Executed in a month" sounds like a fantasy of efficiency.

jcalx · 2026-04-09T18:24:32 1775759072

Computers Don't Argue [0] by Gordon R. Dickson! A horrifying read about how a simple misunderstanding can spiral out of control.

[0] https://nob.cs.ucdavis.edu/classes/ecs153-2019-04/readings/c...

tialaramex · 2026-04-10T16:41:53 1775839313

That's the one. Looks like I had some details muddled (it's a book club, not a library, and so the fee is for the book, which was in fact returned but perhaps lost in the post), but the outline and relevance here are exactly correct. Thanks!

Terr_ · 2026-04-10T07:39:26 1775806766

> in between rows full of numbers, the text suddenly changes

To tweak the analogy slightly, the person would also need to be on mind-altering drugs, if we want them to be derailed the same way an LLM can be.

A healthy human would still be aware of the simultaneous different ways of interpreting the data, and the importance of picking the right one. If they choose to interpret it as a cry for help, they're aware it's an interruption and mode-switch from what was happening before.

In contrast, with LLMs we haven't built thinking machines as much as dreaming ones. Your dream-self recovered the poster that was stuck on the elephant's tusk, oh look that's a pirate recruitment poster, now you're on a ship but can't raise the anchor because...

TeMPOraL · 2026-04-10T09:32:51 1775813571

> A healthy human would still be aware of the simultaneous different ways of interpreting the data, and the importance of picking the right one. If they choose to interpret it as a cry for help, they're aware it's an interruption and mode-switch from what was happening before.

So would an LLM, as far as you can tell (in both cases, you'd have to ask, and both human and LLM would give you a similar justification). But even if not, the problem we're discussing applies to what you described as "healthy human" behavior.

You can't introduce a hard boundary between "system" and "user" inputs in LLMs any more than you could do with a human, for roughly the same reasons.

qsera · 2026-04-09T14:10:03 1775743803

> If you were there, what would you do?

Show it to my boss and let them decide.

kbelder · 2026-04-09T14:59:00 1775746740

HE'S THE ONE WHO TRAPPED ME HERE. MOVE FAST OR YOU'LL BE NEXT.

qsera · 2026-04-10T01:10:15 1775783415

Obviously, a real intelligent entity would consider risk/benefit analysis and act accordingly.

TeMPOraL · 2026-04-10T14:13:16 1775830396

Which is why "prompt injection" is just a flip side of intelligence in this sense. We want LLMs to be able to do risk/benefit analysis and act on it; we cry "security vulnerability" when they make a different choice from the one we'd like. But you can't have the former without the possibility of the latter.