Blog

Thoughts, updates, and insights from the Superagent team.

Research · March 24, 2026 · 5 min read

Frontier models miss 57% of threats in agent context

We ran 485 real artifacts through Claude 4.6 Opus with a security-focused system prompt. The model missed 57% of the threats brin had already identified. Here's the full breakdown.

Research · January 21, 2026 · 3 min read

We Bypassed Grok Imagine's NSFW Filters With Artistic Framing

Text-to-image safety is broken. We generated explicit content of a real person using basic compositional tricks. Here's what we found, why it worked, and what this means for AI safety systems.

Research · January 13, 2026 · 5 min read

The Threat Model for Coding Agents is Backwards

Most people think about AI security wrong. They imagine a user trying to jailbreak the model. With coding agents, the user is the victim, not the attacker.

Research · November 19, 2025 · 2 min read

AI Is Getting Better at Everything—Including Being Exploited

As AI models become more capable and obedient, safety improvements struggle to keep pace. The GPT-5.1 safety score drop reveals a structural problem: capability and attack surface scale faster than safety.

Research · November 17, 2025 · 5 min read

Are AI Models Getting Safer? A Data-Driven Look at GPT vs Claude Over Time

Are frontier models actually getting safer to deploy—or just smarter at getting around guardrails? We analyze 18 months of Lamb-Bench safety scores for GPT and Claude models.

Research · November 11, 2025 · 8 min read

Introducing Lamb-Bench: How Safe Are the Models Powering Your Product?

We built Lamb-Bench to solve a problem every founder faces when selling to enterprise: proving AI safety without a standard way to measure it. It is an adversarial testing framework that gives buyers and sellers a common measurement standard.

