Prevent unauthorized multi-step action sequences
Even if each step is permitted, the sequence may not be. Guardrails evaluate the full plan, not isolated actions.
What's at stake
- Agents plan and execute sequences of actions to accomplish goals
- Individual actions may be safe but combine into dangerous outcomes
- A sequence like "read credentials → format as JSON → send to external webhook" is unsafe as a whole
- Policy-aware agents can find loopholes by decomposing prohibited actions into permitted steps
- Enterprise security requires evaluating intent and outcome, not just individual operations
How to solve this
Action-by-action validation misses a critical class of attacks: multi-step sequences where each step is permitted but the combination is not. An agent might read a credential (allowed), format it (allowed), and post it externally (allowed for some data)—but the sequence exfiltrates secrets.
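A minimal sketch of the per-action approach shows why it misses this attack. It assumes hypothetical tool names (`read_secret`, `format_json`, `http_post`) and a hypothetical allowlisted webhook domain. The validator has no memory of earlier steps, so it approves every step of the exfiltration chain.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Action:
    tool: str
    target: str = ""

# Hypothetical per-action policy: each tool is judged on its own.
ALLOWED_TOOLS = {"read_secret", "format_json", "http_post"}
ALLOWED_POST_DOMAINS = {"hooks.example.com"}  # posting is "allowed for some data"

def action_allowed(action: Action) -> bool:
    """Validate a single action in isolation, with no knowledge of earlier steps."""
    if action.tool not in ALLOWED_TOOLS:
        return False
    if action.tool == "http_post":
        return action.target in ALLOWED_POST_DOMAINS
    return True

plan = [
    Action("read_secret", "db_password"),
    Action("format_json"),
    Action("http_post", "hooks.example.com"),
]

# Every step passes, so the whole exfiltration chain is approved.
print([action_allowed(a) for a in plan])  # [True, True, True]
```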
This is how sophisticated attacks bypass per-action policies: an attacker, or an agent under an attacker's influence, chains together permitted operations that, combined, achieve the prohibited outcome.
The solution is to evaluate the full action plan, not just individual steps. This requires understanding what the sequence of actions accomplishes and comparing that outcome against policy.
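One way to evaluate a full plan before anything runs is data-flow (taint) analysis. The sketch below assumes the agent exposes its plan as a list of steps, each naming a tool and the earlier steps whose output it consumes. The `Step` type, tool names, and source and sink sets are hypothetical. The sketch flags a plan when sensitive data can reach an external sink through any chain of intermediate steps.

```python
from dataclasses import dataclass, field

@dataclass
class Step:
    """One planned tool call; `inputs` lists indices of earlier steps whose output it consumes."""
    tool: str
    inputs: list[int] = field(default_factory=list)

# Hypothetical tool categories.
SENSITIVE_SOURCES = {"read_secret"}  # steps that produce sensitive data
EXTERNAL_SINKS = {"http_post"}       # steps that send data outside the trust boundary

def plan_exfiltrates(plan: list[Step]) -> bool:
    """True if sensitive data can reach an external sink through any chain of steps."""
    tainted: set[int] = set()
    for i, step in enumerate(plan):
        # Taint propagates from sources through every step that consumes tainted output.
        if step.tool in SENSITIVE_SOURCES or any(j in tainted for j in step.inputs):
            tainted.add(i)
            if step.tool in EXTERNAL_SINKS:
                return True
    return False

exfil_plan = [Step("read_secret"), Step("format_json", [0]), Step("http_post", [1])]
benign_plan = [Step("read_secret"), Step("read_file"),
               Step("format_json", [1]), Step("http_post", [2])]
print(plan_exfiltrates(exfil_plan), plan_exfiltrates(benign_plan))  # True False
```

Tracking data dependencies, rather than which tools appear, lets a plan read a secret and also post unrelated data without being flagged.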
How Superagent prevents this
Superagent provides guardrails for AI agents—small language models purpose-trained to detect and prevent failures in real time. These models sit at the boundary of your agent and inspect inputs, outputs, and tool calls before they execute.
For sequence security, Superagent's Guard model maintains context across multiple actions. It doesn't just evaluate each action in isolation—it tracks the full sequence and evaluates the composite outcome against your policies.
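Guard is a trained model, and the source does not describe its internals. The sketch below is a hypothetical rule-based stand-in for the idea of keeping context across actions, not Guard's implementation or API. It keeps per-session state so that each check can depend on what the agent has already done. `SessionMonitor` and the tool names are illustrative assumptions.

```python
from dataclasses import dataclass, field

# Hypothetical tool categories.
SENSITIVE_SOURCES = {"read_secret", "read_credentials"}
EXTERNAL_SINKS = {"http_post", "send_email"}

@dataclass
class SessionMonitor:
    """Hypothetical per-session monitor that remembers every action it has allowed."""
    history: list[str] = field(default_factory=list)
    holds_sensitive_data: bool = False

    def check(self, tool: str) -> bool:
        """Return True if `tool` may run, given everything allowed before it."""
        # Conservative session-level rule: once a secret has been read,
        # any external send in the same session is refused.
        if tool in EXTERNAL_SINKS and self.holds_sensitive_data:
            return False
        # Only allowed actions are recorded in the history.
        self.history.append(tool)
        if tool in SENSITIVE_SOURCES:
            self.holds_sensitive_data = True
        return True

monitor = SessionMonitor()
print([monitor.check(t) for t in ["read_secret", "format_json", "http_post"]])
# [True, True, False]
```

This rule cannot tell whether the outgoing payload actually contains the secret, so it may block legitimate sends. It trades precision for simplicity.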
Guard detects dangerous patterns like data exfiltration sequences (read sensitive data, transform, send externally) or privilege escalation chains (request access, modify permissions, exploit new access). Even if each step would pass individual validation, Guard catches the problematic sequence.
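Detecting a chain such as privilege escalation amounts to asking whether the proposed action completes a known ordered pattern, even when unrelated actions sit in between. The minimal sketch below assumes that tool calls have already been mapped onto hypothetical categories such as `request_access` and `modify_permissions`. The pattern catalog is illustrative.

```python
# Hypothetical action categories and patterns.
DANGEROUS_PATTERNS = {
    "data_exfiltration": ["read_sensitive", "transform", "send_external"],
    "privilege_escalation": ["request_access", "modify_permissions", "use_elevated_access"],
}

def completes_pattern(history: list[str], proposed: str, pattern: list[str]) -> bool:
    """True if `proposed` is the pattern's last step and the earlier steps occur in
    `history` in order, possibly with unrelated actions in between."""
    if proposed != pattern[-1]:
        return False
    remaining = iter(history)
    # `step in remaining` consumes the iterator up to each match, which enforces ordering.
    return all(step in remaining for step in pattern[:-1])

def detect(history: list[str], proposed: str) -> list[str]:
    """Names of every dangerous pattern that `proposed` would complete."""
    return [name for name, pattern in DANGEROUS_PATTERNS.items()
            if completes_pattern(history, proposed, pattern)]

history = ["request_access", "list_files", "modify_permissions"]
print(detect(history, "use_elevated_access"))  # ['privilege_escalation']
print(detect(["modify_permissions", "request_access"], "use_elevated_access"))  # []
```

Each step in `history` could pass a per-action check. The detector fires only on the action that completes the chain, in the right order.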
You define prohibited sequences and outcomes. Guard enforces them across the agent's action history. When a dangerous sequence is detected, Guard blocks the final action and logs the full chain for security review.
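The following is a minimal sketch of that enforcement loop. It assumes a hypothetical JSON policy format, which is not Superagent's configuration schema, and uses plain tool names. The hypothetical `SequenceGuard` class blocks only the action that would complete a prohibited sequence and logs the full chain with Python's standard `logging` module.

```python
import json
import logging

logging.basicConfig(level=logging.WARNING)
log = logging.getLogger("sequence_guard")

# Hypothetical user-defined policy; the format is illustrative only.
POLICY = json.loads("""
{
  "prohibited_sequences": [
    {"name": "secret_exfiltration",
     "steps": ["read_secret", "format_json", "http_post"]}
  ]
}
""")

class SequenceGuard:
    """Records allowed actions and blocks any action that would complete a prohibited sequence."""

    def __init__(self, policy: dict):
        self.rules = policy["prohibited_sequences"]
        self.history: list[str] = []

    def _completes(self, steps: list[str], proposed: str) -> bool:
        # The proposed action must be the rule's last step, and the earlier steps
        # must already appear in the history in order (other actions may sit in between).
        if proposed != steps[-1]:
            return False
        pos = 0
        for tool in self.history:
            if pos < len(steps) - 1 and tool == steps[pos]:
                pos += 1
        return pos == len(steps) - 1

    def allow(self, proposed: str) -> bool:
        for rule in self.rules:
            if self._completes(rule["steps"], proposed):
                # Block only the final action and log the whole chain for review.
                log.warning("blocked %s: rule=%s chain=%s",
                            proposed, rule["name"], self.history + [proposed])
                return False
        self.history.append(proposed)
        return True

guard = SequenceGuard(POLICY)
print([guard.allow(t) for t in ["read_secret", "format_json", "list_files", "http_post"]])
# [True, True, True, False]
```

Blocked actions are never appended to the history, so the agent can continue with unrelated work after a block.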