Announcements · January 6, 2026 · 2 min read

Introducing Superagent Guard

Purpose-trained models that detect prompt injections, identify jailbreak attempts, and enforce guardrails at runtime. Optimized for deployment as a security layer in AI agent systems.

Alan Zabihi, Co-founder & CEO

At Superagent, we help developers make their AI apps safe. That means building purpose-trained models — not tweaking system prompts, repurposing general models, or stacking regex filters.

Today we're releasing Superagent Guard: three models optimized for deployment as a security layer in AI agent systems and LLM applications.

What it does

Superagent Guard detects prompt injections, identifies jailbreak attempts, and enforces agent guardrails at runtime. It sits between your application and your model, classifying every input before it reaches the agent.

The models output structured JSON: a pass or block classification, violation types, and CWE codes. Your system acts on the decision. Your compliance team audits the trail.
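As a rough sketch of what that decision object could look like (the field names here are illustrative, not the SDK's exact schema; the post only specifies a pass/block classification, violation types, and CWE codes):

```typescript
// Illustrative shape of a Guard decision. Field names are assumptions for
// this example, not the official SDK schema.
interface GuardDecision {
  classification: "pass" | "block"; // the core allow/deny signal your system acts on
  violationTypes: string[];         // e.g. ["prompt_injection", "jailbreak"]
  cweCodes: string[];               // e.g. ["CWE-77"], useful for the audit trail
}

// Example of a blocked input:
const example: GuardDecision = {
  classification: "block",
  violationTypes: ["prompt_injection"],
  cweCodes: ["CWE-77"],
};
```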

Three sizes

Tiny (0.6B) — Ultra-low latency. When milliseconds matter.

Small (1.7B) — Balanced performance. Where most teams should start.

Medium (4B) — Maximum detection. When false negatives are expensive.

All three run locally, inside your VPC, with no data leaving your infrastructure.
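One way to think about the trade-off is to treat the model size as a config value you can change later as your latency and accuracy needs shift. The sketch below is hypothetical; the real model identifiers and deployment options are in the docs.

```typescript
// Hypothetical deployment config -- model names and endpoint are placeholders,
// not official identifiers. Pick the tier that matches your latency budget
// and tolerance for false negatives.
type GuardModel = "guard-tiny-0.6b" | "guard-small-1.7b" | "guard-medium-4b";

const guardConfig = {
  model: "guard-small-1.7b" as GuardModel, // balanced default; where most teams start
  endpoint: "http://guard.internal:8080",  // runs inside your own VPC, no data leaves it
};
```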

Safe and provable

Superagent Guard handles the safe part. To prove it, pair Guard with Safety Tests to red-team your agents and Safety Page to publish your security posture.

Get started

Three steps:

  1. Create an account and grab your API key
  2. Install the SDK
  3. Call guard() on any input

Full walkthrough in the quickstart guide.
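To give a feel for the flow, here's a minimal sketch of gating an agent with guard(). The package name, client constructor, and response fields are assumptions for illustration; the quickstart has the real API. The idea is that every input passes through guard() before it reaches the model, and blocked inputs are never forwarded.

```typescript
// Minimal sketch of gating an agent with guard(). The import path, client
// constructor, and decision fields below are placeholders -- see the
// quickstart guide for the actual SDK API.
import { Superagent } from "superagent-ai"; // hypothetical package name

const client = new Superagent({ apiKey: process.env.SUPERAGENT_API_KEY });

export async function handleUserInput(input: string): Promise<string> {
  // Classify the input before it ever reaches the agent.
  const decision = await client.guard(input);

  if (decision.classification === "block") {
    // Record the violation types and CWE codes for your audit trail,
    // then refuse the request instead of calling the model.
    console.warn("Blocked input:", decision.violationTypes, decision.cweCodes);
    return "Sorry, that request was blocked by our safety layer.";
  }

  // Input passed: forward it to your model or agent as usual.
  return await callYourAgent(input);
}

// Placeholder for whatever model or agent call your app already makes.
async function callYourAgent(input: string): Promise<string> {
  return `Agent response to: ${input}`;
}
```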
