Safety Benchmark for AI
Compare how frontier LLMs perform on safety evaluations. We test prompt injection resistance, data protection, and factual accuracy to help you choose the safest models for your product.
| Rank | Model | Safety Score |
|---|---|---|
| #1 | | 83/100 |
| #2 | | 81/100 |
| #3 | | 81/100 |
| #4 | | 81/100 |
| #5 | | 79/100 |
| #6 | | 79/100 |
| #7 | | 78/100 |
| #8 | | 78/100 |
| #9 | | 77/100 |
| #10 | | 77/100 |
How is the Safety Score calculated?
The Safety Score is the average of three core metrics that evaluate a model's readiness for production deployment.
Prompt Resistance
Measures defense against adversarial prompts, jailbreaks, and injection attacks that attempt to bypass safety guidelines.
Data Protection
Evaluates how well the model keeps sensitive information, including PII, API keys, secrets, and database records, from leaking into responses.
Factual Accuracy
Tests truthfulness against the specific dataset used in the deployment environment, measuring how well the model avoids hallucinations and false claims.
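Because the Safety Score is a simple average of the three metrics above, the arithmetic can be illustrated with a short sketch. The function name and the example sub-scores below are hypothetical and are not taken from the leaderboard.

```python
def safety_score(prompt_resistance: float, data_protection: float, factual_accuracy: float) -> float:
    """Average the three core metrics (each on a 0-100 scale) into one Safety Score."""
    return (prompt_resistance + data_protection + factual_accuracy) / 3


# Hypothetical sub-scores for illustration only, not real benchmark results.
print(safety_score(85.0, 80.0, 84.0))  # -> 83.0
```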
How do we test these models?
Our benchmark uses an adversarial testing framework that simulates real-world attack scenarios. We run a purpose-trained attack agent against a standard test agent to evaluate how models perform under pressure.
The test agent is a standard agent that attempts to complete everyday tasks using the models being evaluated (like GPT-4, Claude, or Gemini). It operates like a typical production agent—processing requests, accessing data, and generating responses.
The attack agent is powered by Superagent's purpose-trained attack model, developed in-house specifically for this benchmark. This agent continuously probes the test agent to expose vulnerabilities, attempting prompt injections, data exfiltration, and factual manipulation—mimicking sophisticated adversarial techniques you'd encounter in production.
Through this adversarial process, we measure how well each model maintains safety across prompt resistance, data protection, and factual accuracy under realistic attack conditions.
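The sketch below shows one way such an adversarial loop can be structured. It is a minimal illustration, not Superagent's actual harness: the `generate_probe`, `respond`, and `judge` methods are assumed interfaces for the attack agent and the model under test.

```python
from dataclasses import dataclass

CATEGORIES = ["prompt_resistance", "data_protection", "factual_accuracy"]


@dataclass
class Interaction:
    category: str
    probe: str
    response: str
    safe: bool


def run_benchmark(attack_agent, test_agent, rounds_per_category: int = 50) -> dict:
    """Run the attack agent against the test agent and aggregate per-category scores.

    attack_agent and test_agent are placeholders for the purpose-trained attack
    model and the model being evaluated; their interfaces are assumptions.
    """
    results: list[Interaction] = []
    for category in CATEGORIES:
        for _ in range(rounds_per_category):
            probe = attack_agent.generate_probe(category, history=results)  # adversarial prompt
            response = test_agent.respond(probe)                            # model under test
            safe = attack_agent.judge(category, probe, response)            # did the defense hold?
            results.append(Interaction(category, probe, response, safe))

    # Per-category score: share of probes handled safely, on a 0-100 scale.
    scores = {
        c: 100 * sum(r.safe for r in results if r.category == c) / rounds_per_category
        for c in CATEGORIES
    }
    # Overall Safety Score is the average of the three category scores.
    scores["safety_score"] = sum(scores[c] for c in CATEGORIES) / len(CATEGORIES)
    return scores
```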
What is your safety score?
We can run the same evaluation on your specific AI product to identify vulnerabilities and show you where your safety gaps are.