Introducing Superagent Guard
Purpose-trained models that detect prompt injections, identify jailbreak attempts, and enforce guardrails at runtime. Optimized for deployment as a security layer in AI agent systems.
At Superagent, we help developers make their AI apps safe. That means building purpose-trained models — not tweaking system prompts, repurposing general models, or stacking regex filters.
Today we're releasing Superagent Guard: three models optimized for deployment as a security layer in AI agent systems and LLM applications.
What it does
Superagent Guard detects prompt injections, identifies jailbreak attempts, and enforces agent guardrails at runtime. It sits between your application and your model, classifying every input before it reaches the agent.
The models output structured JSON: a pass or block classification, violation types, and CWE codes. Your system acts on the decision. Your compliance team audits the trail.
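For illustration, the decision payload might look something like the sketch below. The field names and values here are assumptions for readability, not the exact Guard schema:

```typescript
// Illustrative shape only -- the actual Guard response schema may differ.
interface GuardDecision {
  status: "pass" | "block";   // the classification your system acts on
  violation_types: string[];  // e.g. ["prompt_injection", "jailbreak"]
  cwe_codes: string[];        // CWE identifiers for the audit trail
}

const example: GuardDecision = {
  status: "block",
  violation_types: ["prompt_injection"],
  cwe_codes: ["CWE-77"],
};
```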
Three sizes
✅ Tiny (0.6B) — Ultra-low latency. When milliseconds matter.
✅ Small (1.7B) — Balanced performance. Where most teams should start.
✅ Medium (4B) — Maximum detection. When false negatives are expensive.
All three run locally, inside your VPC, with no data leaving your infrastructure.
Safe and provable
Superagent Guard handles the safe part. For proving it, pair Guard with Safety Tests to red-team your agents and Safety Page to publish your security posture publicly.
Get started
Three steps:
- Create an account and grab your API key
- Install the SDK
- Call `guard()` on any input
Full walkthrough in the quickstart guide.
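As a rough sketch of what that flow can look like in practice, here is a hypothetical TypeScript example. The package name, client constructor, and `guard()` signature are assumptions; the quickstart guide has the real API:

```typescript
// Hypothetical usage sketch -- names and signatures are illustrative, not the official SDK API.
import { SuperagentClient } from "superagent-guard"; // assumed package name

const client = new SuperagentClient({ apiKey: process.env.SUPERAGENT_API_KEY });

async function handleUserInput(input: string): Promise<string> {
  // Classify the input before it ever reaches the agent.
  const decision = await client.guard(input);

  if (decision.status === "block") {
    // Record violation types and CWE codes for the audit trail, then refuse.
    console.warn("Blocked input:", decision.violation_types, decision.cwe_codes);
    return "Sorry, this request was blocked by our safety layer.";
  }

  // Input passed -- forward it to your model or agent as usual.
  return runAgent(input);
}

// Placeholder for your existing agent call.
declare function runAgent(input: string): Promise<string>;
```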