Detect and block malicious tool outputs returned to the agent

An agent that processes PDFs, emails, or images may be manipulated by hostile outputs from upstream tools. Guardrails inspect tool responses before the agent consumes them.

Prompt Injection

What's at stake

Agents consume outputs from tools: PDF parsers, email clients, web scrapers, database queries
Any tool output can carry injection payloads that manipulate the agent
Tool chains amplify risk—an attack in one output propagates through subsequent steps
External APIs and services are outside your security boundary but inside your agent's context
A compromised tool output can override agent instructions and trigger unauthorized actions

How to solve this

AI agents rarely operate in isolation. They call tools: parsing documents, querying databases, fetching web content, reading emails. Each tool returns output that becomes part of the agent's context. If that output contains malicious instructions, the agent may follow them.

This creates an indirect prompt injection vector. An attacker doesn't need to reach your agent directly—they can compromise the PDF your agent parses, the webpage it scrapes, or the API response it processes.

The solution is to treat every tool output as untrusted input. Before tool responses enter agent context, they must be scanned for injection patterns and sanitized. This applies to all tools, whether internal or external.

How Superagent prevents this

Superagent provides guardrails for AI agents that work with any language model. The Superagent SDK sits at the boundary of your agent and inspects inputs, outputs, and tool calls before they execute.

For tool output security, Superagent's Guard model inspects responses from all tools before they reach your agent. When your PDF parser returns text, your email client returns messages, or your API returns data, Guard scans the content for injection patterns.

The model recognizes attempts to manipulate agent behavior through tool outputs: instructions disguised as data, context-overriding commands, and payloads designed to bypass your agent's constraints. Malicious content is blocked or sanitized before your agent processes it.

Guard integrates into your tool chain with minimal changes. Wrap tool outputs with Guard, and your agent receives only verified content. Attack attempts are logged for your security team to review and address.

Learn more about Guard

Related use cases

Detect and block hidden jailbreak instructions in PDFs Detect and block prompt injections from user-generated content Block malicious or unsafe tool use and privilege escalation

Detect and block malicious tool outputs returned to the agent

What's at stake

How to solve this

How Superagent prevents this

Related use cases

Ready to protect your AI agents?