Blog
Thoughts, updates, and insights from the Superagent team.
The March of Nines
The gap between a working demo and a reliable product is vast. Andrej Karpathy calls this the 'march of nines': each additional nine of reliability takes as much work as all the nines that came before it combined. This is the hidden engineering challenge behind every production AI system.
The case for small language models
Most agents today rely on large, general-purpose models built to do everything. If your agent has a single, well-defined job, it should also have a model designed for that job. This is the case for small language models: models that handle one task, run locally, and can be retrained as your data evolves.
Three years later: AI can (now) defend AI
In 2022, Simon Willison argued that 'adding more AI' was the wrong fix for prompt injection and related failures. He was mostly right at the time: the defenses people tried then were brittle, either over-blocking legitimate requests or proving easy to trick. This post explains what has changed since, what has not, and why builders can now use AI to meaningfully defend their agents in production.
Vibex: Rebuilding OpenAI Codex with VibeKit
Vibex is our open-source attempt to understand and rebuild OpenAI Codex using modern developer tools. It's a real coding agent that takes plain-language tasks, runs them in secure E2B containers via VibeKit, and produces working GitHub pull requests. No demo shell or fake eval—just structured coding workflows that install packages, write code, run tests, and push changes.
ReAG: Reasoning-Augmented Generation
Until now, systems that combine language models with external knowledge have relied on a two-step process: first, retrieve relevant documents using semantic...
Agents that write their own tools
For an agent to be really useful, it needs tools — specialized pieces of code that help complete specific tasks, like browsing the web. Today, these tools are...