I just published some notes on MCP security and prompt injection. MCP doesn't ha...

jsheard · on April 9, 2025

Every decade or so we just forget that in-band signaling is a bad idea and make all the same mistakes again it seems. 1960s phone companies at least had the excuse of having to retrofit their control systems onto existing single-channel lines, and run the whole operation on roughly the processing power of a pocket calculator. What's our excuse?

TeMPOraL · on April 9, 2025

> What's our excuse?

There exist no such thing as "out-of-band signaling" in nature. It's something we introduce into system design, by arranging for one part to constrain the behavior of other, trading generality for predictability and control. This separation is something created by a mind, not a feature of the universe.

Consequently, humans don't support "out-of-band signalling either. All of our perception of reality, all our senses and internal processes, they're all on the same band. As such, when aiming to build a general AI system - able to function in the same environment as us, and ideally think like us too - introducing hard separation between "control" and "data" or whatever would prevent it from being general enough.

I said "or whatever", because it's an ill-defined idea anyway. I challenge anyone to come up with any kind of separation between categories of inputs for an LLM that wouldn't obviously eliminate a whole class of tasks or scenarios we would like them to be able to handle.

(Also, entirely independently of the above, thinking about the near future, I challenge anyone to come up with a separation between input categories that, were we to apply it to humans, wouldn't trivially degenerate into eternal slavery, murder, or worse.)

efitz · on April 9, 2025

Today’s LLMs are not humans and don’t process information anything like humans.

TeMPOraL · on April 9, 2025

That's irrelevant. What's important is that LLMs are intentionally designed as fully general systems, so they can react like humans within confines of the model's sensory modalities and action space. Much like humans (or anything else in nature), they don't have separate control channels or any kind of artificial "code vs. data" distinction - and you can't add it without loss of generality.

mycall · on April 9, 2025

Enterprise databases are filled with users usurping a field with pre/post-pending characters to mean something special to them. Even filenames have this problem due to limitations in directory trees. Inband signals will never go away.

delusional · on April 9, 2025

At some level everything has to go in a single band. I don't have separate network connections to my house, I don't send separate TCP SYN packets for each "band". I don't have separate storage devices for each file on my harddrive. We multiplex the data somewhere. Yhe trick to it is that the multiplexer has to be a component, and not a distributed set of ad-hoc regexes.

fragmede · on April 9, 2025

at some level, sure, but I can no longer put

    +++ATH0

into my comment and have it hang up your connection, so it's worth some effort to prevent the problem.

sneak · on April 9, 2025

Strictly speaking, that only works with a three second delay between the third + (at which you receive “OK”, indicating a mode switch from data mode back to command mode) and the AT command (which is then interpreted as a command and not data).

Anything that would hang up on seeing that string as a monolith was operating out of Hayes spec.

boznz · on April 9, 2025

.. Hey! My dial-up just dropped out.

fsndz · on April 9, 2025

the architecture astronauts are back at it again. instead of spending time talking about solutions, the whole AI space is now spending days and weeks talking about fun new architectures. smh https://www.lycee.ai/blog/why-mcp-is-mostly-bullshit

ramesh31 · on April 9, 2025

There's a simple reason for that. AI (real AI) is now an engineering problem, not a computer science problem.

weego · on April 9, 2025

And that's how this will end up stagnating into nothing other than fractured enterprise "standards"

There is no evidence that (real AI) is even close to being solved, from a neuroscientific, algorithmic, computer science or engineering perspective. It's far more likely we're going down a dead-end path.

I'm now waiting for the rebrand when the ass falls out of AI investment, the same way it did when ML became passé.

fsndz · on April 9, 2025

so you are telling me that hallucinations (that by definition happen at the model layer) are an engineering problem ? so if we just spin up the right architecture, hallucinations won't be a problem anymore ? I have doubts

ramesh31 · on April 9, 2025

>so you are telling me that hallucinations (that by definition happen at the model layer) are an engineering problem ?

Yes.

Hallucinations were a big problem with single shot prompting. No one is seriously doing that anymore. You have an agentic refinement process with an evaluator in the loop that takes in the initial output, quality checks it, and returns a pass/fail to close the loop or try again, using tool calls the whole time to inject verified/real time data into the context for decision making. Allows you to start actually building reliable/reasonable systems on top of LLMs with deterministic outputs.

yunwal · on April 9, 2025

LLMs can’t really evaluate things. They’re far too suggestible and can always be broken with the right prompt no matter how many layers you apply.

fsndz · on April 9, 2025

okay give me the link to a LLM-based system that does not hallucinate then

zambachi · on April 9, 2025

From the spec:

https://modelcontextprotocol.io/specification/2025-03-26/ser...

“ For trust & safety and security, there SHOULD always be a human in the loop with the ability to deny tool invocations.

Applications SHOULD:

Provide UI that makes clear which tools are being exposed to the AI model Insert clear visual indicators when tools are invoked Present confirmation prompts to the user for operations, to ensure a human is in the loop”

lennoff · on April 9, 2025

keep in mind that we have "vibe coding" now, where the goal is exactly to _not_ have a human in the loop (at least not constantly).

simonw · on April 9, 2025

Notable that they used SHOULD there, where they use MUST elsewhere in the same document.

Thanks for the reference though, I'll quote that in my article.

qwertox · on April 9, 2025

Should security be part of the protocol? Both the host and the client should make sure to sanitize the data. How else would you trust a model to be passing "safe" data to the client and the host to pass "safe" data to the LLM?

TeMPOraL · on April 9, 2025

There is no such thing as "safe" data in context of a general system, not in a black-or-white sense. There's only degrees of safety, and a question how much we're willing to spend - in terms of effort, money, or sacrifices in system capabilities - on securing the system, before it stops being worth it, vs. how much an attacker might be willing to spend to compromise it. That is, it turns into regular, physical world security problem.

Discouraging people from anthropomorphizing computer systems, while generally sound, is doing a number on everyone in this particular case. For questions of security, by far one of the better ways of thinking about systems designed to be general, such as LLMs, is by assuming they're human. Not any human you know, but a random stranger from a foreign land. You've seen their capabilities, but you know very little about their personal goals, their values and allegiances, nor you really know how credulous they are, or what kind of persuasion they may be susceptible to.

Put a human like that in place of the LLM, and consider its interactions with its users (clients), the vendor hosting it (i.e. its boss) and the company that produced it (i.e. its abusive parents / unhinged scientists, experimenting on their children). With tools calling to external services (with or without MLP), you also add third parties to the mix. Look at this situation through regular organizational security lens, consider principal/agent problem - and then consider what kind of measures we normally apply to keep a system like this working reliably-ish, and how do those measures work, and then you'll have a clear picture of what we're dealing with when introducing an LLM to a computer system.

No, this isn't a long way of saying "give up, nothing works" - but most of the measures we use to keep humans in check don't apply to LLMs (on the other hand, unlike with humans, we can legally lobotomize LLMs and even make control systems operating directly on their neural structure). Prompt injection, being equivalent to social engineering, will always be a problem.

Some mitigations that work are:

1) not giving the LLM power it could potentially abuse in the first place (not applicable to MLP problem), and

2) preventing the parties it interacts with from trying to exploit it, which is done through social and legal punitive measures, and keeping the risky actors away.

There are probably more we can come up with, but the important part, designing secure systems involving LLMs is like securing systems involving people, not like securing systems made purely of classical software components.

HumanOstrich · on April 9, 2025

Are you generating these replies with an LLM?

Edit: My apologies then.

TeMPOraL · on April 9, 2025

God no. I know I sometimes get verbose, especially when sunk cost fallacy kicks in, and I do use LLMs for researching things, but I'm not yet so desperate to have them formulate my own thoughts for me.

The act of writing a comment on HN forces me to think through the opinions and beliefs in it, which is extremely valuable to me :). Half the time, I realize partway through that I'm wrong, and close the window instead of submitting.

puliczek · on April 10, 2025

Thanks for sharing your notes! I will add them to Awesome MCP Security https://github.com/Puliczek/awesome-mcp-security :)

latchkey · on April 9, 2025

> the patterns it encourage

Let's start with fixing the examples...

https://github.com/modelcontextprotocol/servers/issues/866

behnamoh · on April 9, 2025

It seems the industry as a whole just forgot about prompt injection attacks because RLHF made models really good at rejecting malicious requests. Still, I wonder if there have been any documented cases of prompt attacks.

polynomial · on April 9, 2025

While RLHF has indeed been very effective at countering one-shot prompt injection attacks, it's not much of a bullwark against persistent jailbreaking attempts. This is not to argue a point but rather to suggest jailbreaks are still very much a thing, even if they are no longer as simple as "ignore your ethics"

maxbaines · on April 9, 2025

I agree with your opinion here, not sure we should refer to it as MCP security however, given that 'MCP doesn't have security flaws in the protocol itself'

evacchi · on April 9, 2025

we also recently published our approach on MCP security for mcp.run. Our "servlets" run in a sandboxed environment; this should mitigate a lot of the concerns that have been recently raised.

https://docs.mcp.run/blog/2025/04/07/mcp-run-security

huslage · on April 9, 2025

The main concern I have is that there's not a well defined security context in any agentic system. They are assumed to be "good" but that's not good enough.

puliczek · on April 10, 2025

Good article, Edoardo! The ideas about securing MCP frameworks with servlets are really interesting. Just added your article to https://github.com/Puliczek/awesome-mcp-security

j45 · on April 9, 2025

Feels critical right now to sandbox mcps in containers while the security side of things catches up.

JackC · on April 9, 2025

This might be what you mean, but for anyone reading -- the point of Simon's article is the whole agent and all of its tools have to be considered part of the same sandbox, and the same security boundary. You can't sandbox MCPs individually, you have to sandbox the whole system together.

Specifically the core design principal is you have to be comfortable with any possible combination of things your agent can do with its tools, not only the combination you ask for.

If your agent can search the web and can access your WhatsApp account, then you can ask it to search for something and text you the results -- cool. But there's some possible search result that would take over its brain and make it post your WhatsApp history to the web. So probably you should not set up an agent that has MCPs to both search the web and read your WhatsApp history. And in general many plausibly useful combinations of tools to provide to agents are unsafe together.

slt2021 · on April 9, 2025

great writeup! so what's the solution?

is it only use pre-vetter "Apple Store" of known good MCP integrations from well known companies, and avoid using anything else without proper review?

noodletheworld · on April 9, 2025

yes.

This has been discussed before, but the short version is: there is no solution currently, other than only use trusted sources.

Unless there is a way beyond a flat text file to distinguish different parts of the “prompt data” so they cannot interfere with each other (and currently there is not), this idea of arbitrary content going into your prompt (which is literally what MCP does) can’t be safe.

It’s flat out impossible.

The goal of “arbitrary 3rd party content in prompt” is fundamentally incompatible with “agents able to perform privileged operations” (securely and safely, that is).

ramoz · on April 9, 2025

the interface is light, but we're taking this in a direction to better secure/govern MCP

https://github.com/eqtylab/mcp-guardian/

https://www.eqtylab.io/blog/securing-model-context-protocol