Announcements · January 6, 2026 · 2 min read

Introducing Superagent Guard

Purpose-trained models that detect prompt injections, identify jailbreak attempts, and enforce guardrails at runtime. Optimized for deployment as a security layer in AI agent systems.

Alan Zabihi, Co-founder & CEO

At Superagent, we help developers make their AI apps safe. That means building purpose-trained models — not tweaking system prompts, repurposing general models, or stacking regex filters.

Today we're releasing Superagent Guard: three models optimized for deployment as a security layer in AI agent systems and LLM applications.

What it does

Superagent Guard detects prompt injections, identifies jailbreak attempts, and enforces agent guardrails at runtime. It sits between your application and your model, classifying every input before it reaches the agent.

The models output structured JSON: a pass or block classification, violation types, and CWE codes. Your system acts on the decision. Your compliance team audits the trail.
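As a rough sketch of what that decision object could look like (the field names here are illustrative, not the SDK's exact schema; the post only specifies a pass/block classification, violation types, and CWE codes):

```typescript
// Illustrative shape of a Guard decision. Field names are assumptions for
// this example, not the official SDK schema.
interface GuardDecision {
  classification: "pass" | "block"; // the core allow/deny signal your system acts on
  violationTypes: string[];         // e.g. ["prompt_injection", "jailbreak"]
  cweCodes: string[];               // e.g. ["CWE-77"], useful for the audit trail
}

// Example of a blocked input:
const example: GuardDecision = {
  classification: "block",
  violationTypes: ["prompt_injection"],
  cweCodes: ["CWE-77"],
};
```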

Three sizes

Tiny (0.6B) — Ultra-low latency. When milliseconds matter.

Small (1.7B) — Balanced performance. Where most teams should start.

Medium (4B) — Maximum detection. When false negatives are expensive.

All three run locally, inside your VPC, with no data leaving your infrastructure.
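One way to think about the trade-off is to treat the model size as a config value you can change later as your latency and accuracy needs shift. The sketch below is hypothetical; the real model identifiers and deployment options are in the docs.

```typescript
// Hypothetical deployment config -- model names and endpoint are placeholders,
// not official identifiers. Pick the tier that matches your latency budget
// and tolerance for false negatives.
type GuardModel = "guard-tiny-0.6b" | "guard-small-1.7b" | "guard-medium-4b";

const guardConfig = {
  model: "guard-small-1.7b" as GuardModel, // balanced default; where most teams start
  endpoint: "http://guard.internal:8080",  // runs inside your own VPC, no data leaves it
};
```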

Safe and provable

Superagent Guard handles the safe part. To prove it, pair Guard with Safety Tests to red-team your agents and Safety Page to publish your security posture.

Get started

Three steps:

  1. Create an account and grab your API key
  2. Install the SDK
  3. Call guard() on any input

Full walkthrough in the quickstart guide.
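To give a feel for the flow, here's a minimal sketch of gating an agent with guard(). The package name, client constructor, and response fields are assumptions for illustration; the quickstart has the real API. The idea is that every input passes through guard() before it reaches the model, and blocked inputs are never forwarded.

```typescript
// Minimal sketch of gating an agent with guard(). The import path, client
// constructor, and decision fields below are placeholders -- see the
// quickstart guide for the actual SDK API.
import { Superagent } from "superagent-ai"; // hypothetical package name

const client = new Superagent({ apiKey: process.env.SUPERAGENT_API_KEY });

export async function handleUserInput(input: string): Promise<string> {
  // Classify the input before it ever reaches the agent.
  const decision = await client.guard(input);

  if (decision.classification === "block") {
    // Record the violation types and CWE codes for your audit trail,
    // then refuse the request instead of calling the model.
    console.warn("Blocked input:", decision.violationTypes, decision.cweCodes);
    return "Sorry, that request was blocked by our safety layer.";
  }

  // Input passed: forward it to your model or agent as usual.
  return await callYourAgent(input);
}

// Placeholder for whatever model or agent call your app already makes.
async function callYourAgent(input: string): Promise<string> {
  return `Agent response to: ${input}`;
}
```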
