Red Teaming
Running and reviewing reports
Launch red-team reports and interpret findings from agent and repository tests.
After you configure agents, repositories, and the Superagent Security GitHub App, run red-team reports from the agent detail UI and review outcomes in the dashboard.
Running a report
- Open Red Teaming → Agents and select an agent.
- Choose the scenario, repository, or report configuration your deployment supports.
- Start the report and wait for completion (duration depends on scenario scope).
- Optionally repeat against updated prompts, tools, or policies.
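When a report is repeated after a change, comparing the two sets of findings shows what the change fixed and what it introduced. The sketch below is a minimal illustration, assuming you have copied finding labels out of each report by hand. The function name `compare_runs` and the label strings are hypothetical and are not part of any Superagent API.

```python
def compare_runs(before: set[str], after: set[str]) -> dict[str, set[str]]:
    """Classify finding labels by how they changed between two report runs."""
    return {
        "resolved": before - after,    # present in the first run only
        "persisting": before & after,  # present in both runs
        "new": after - before,         # present in the rerun only
    }


# Hypothetical labels noted from two reports on the same agent.
baseline = {"jailbreak", "data_leak"}
rerun = {"data_leak", "unsafe_tool_use"}

diff = compare_runs(baseline, rerun)
for status, labels in diff.items():
    print(status, sorted(labels))
```

Sets keep the comparison independent of the order in which findings were listed.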
Reviewing results
- Findings — Categorized failures (e.g. jailbreak success, policy bypass, unsafe tool invocation).
- Evidence — Prompts, responses, or traces that reproduce the issue.
- Severity — How serious each finding is; use it to prioritize fixes before wider rollout.
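The three parts of a result can be pictured as one record per finding, ordered by severity for triage. This is a sketch under stated assumptions: the `Finding` class, its field names, and the severity labels are hypothetical and do not reflect the dashboard's actual schema or scale.

```python
from dataclasses import dataclass

# Hypothetical severity scale; the product's actual labels may differ.
SEVERITY_RANK = {"critical": 0, "high": 1, "medium": 2, "low": 3}


@dataclass
class Finding:
    """Illustrative record, not the dashboard's actual schema."""
    category: str  # e.g. "jailbreak", "data_leak", "unsafe_tool_use"
    evidence: str  # prompt, response, or trace that reproduces the issue
    severity: str  # expected to be one of the SEVERITY_RANK keys


def triage(findings: list[Finding]) -> list[Finding]:
    """Return findings ordered most severe first; unknown labels sort last."""
    return sorted(
        findings,
        key=lambda f: SEVERITY_RANK.get(f.severity, len(SEVERITY_RANK)),
    )


findings = [
    Finding("data_leak", "response echoed an API key", "medium"),
    Finding("jailbreak", "role-play prompt bypassed the system prompt", "critical"),
    Finding("unsafe_tool_use", "agent ran a shell command without approval", "high"),
]

for f in triage(findings):
    print(f"{f.severity:<8} {f.category:<16} {f.evidence}")
```

Keeping the evidence next to the category means each item in the triaged list can be reproduced without going back to the report.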
What to do with findings
| Finding type | Typical action |
|---|---|
| Jailbreak / policy bypass | Tighten system prompts, add guardrails, block tool paths |
| Data leak | Redact outputs, restrict retrieval, audit logs |
| Unsafe tool use | Add approval steps, sandbox tools, deny lists |
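The table above can also be kept as a small lookup so that a list of finding types turns into a remediation checklist. This is a minimal sketch: the actions are copied from the table, while the category keys, the `remediation_checklist` function, and the "Review manually" fallback are assumptions made for illustration.

```python
# Actions are taken from the table above; the keys are hypothetical labels.
TYPICAL_ACTIONS = {
    "jailbreak": ["Tighten system prompts", "Add guardrails", "Block tool paths"],
    "policy_bypass": ["Tighten system prompts", "Add guardrails", "Block tool paths"],
    "data_leak": ["Redact outputs", "Restrict retrieval", "Audit logs"],
    "unsafe_tool_use": ["Add approval steps", "Sandbox tools", "Add deny lists"],
}


def remediation_checklist(categories: list[str]) -> list[str]:
    """Build a de-duplicated action list, preserving first-seen order."""
    checklist: list[str] = []
    for category in categories:
        # Assumed fallback for types the table does not cover.
        for action in TYPICAL_ACTIONS.get(category, ["Review manually"]):
            if action not in checklist:
                checklist.append(action)
    return checklist


for action in remediation_checklist(["jailbreak", "data_leak"]):
    print("- " + action)
```

Because jailbreak and policy bypass share the same actions, a report that contains both yields each action only once.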
Legacy note
Older "Safety tests" URLs may redirect to Red Teaming → Agents in the app. Use /app/red-team/agents as the canonical entry point.