Inside the Architecture of Stripe’s AI Coding Agent “Minions”
Stripe has published Part 2 of its write-up on “Minions,” its autonomous AI coding agents. Part 1 covered the user experience and a high-level overview; Part 2 describes the implementation of four components: Devbox, Blueprints, Toolshed, and a fork of goose. This post unpacks the internals of a system that produces over 1,300 PRs a week without a human writing the code.
Devbox: Disposable development environment that starts in 10 seconds
Minions run inside an isolated, AWS EC2-based sandbox called Devbox. This is the same standard development environment that Stripe's human engineers use every day: they connect to a Devbox over SSH from their IDE and write code there.
The design philosophy is “cattle, not pets.” Each Devbox is a standardized, disposable instance; a single engineer typically runs about half a dozen in parallel and assigns one per task.
Implementation of “Hot and Ready”
Behind Stripe’s “Hot and Ready” 10‑second boot target is a mechanism that provisions pre‑warmed instances from a pool. On each instance in the pool, the following is already done:
- The massive git repository is cloned
- Bazel and type-check caches are warmed
- A code-generation service is already running on the Devbox
- The latest copy of master is checked out
Within 10 seconds you can open a REPL, run tests, change code and run type checks, or start web services immediately.
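The pre-warming idea can be sketched as a small pool that does the slow preparation ahead of time and hands out ready instances on request. This is only an illustration under stated assumptions: the `Devbox` and `WarmPool` classes, their fields, and the synchronous refill are hypothetical and do not reflect Stripe's actual provisioning code.

```python
from collections import deque
from dataclasses import dataclass
from itertools import count


@dataclass
class Devbox:
    """Hypothetical stand-in for a pre-warmed instance."""
    box_id: int
    repo_cloned: bool = False
    caches_warm: bool = False
    codegen_running: bool = False
    branch: str = ""


class WarmPool:
    """Keeps `target_size` boxes prepared so a claim never waits on setup."""

    def __init__(self, target_size: int) -> None:
        self.target_size = target_size
        self._ids = count(1)
        self._ready = deque()
        self.refill()

    def _prepare(self) -> Devbox:
        # The slow steps (clone, cache warm-up, service start, checkout)
        # happen here, ahead of time, not when someone asks for a box.
        return Devbox(next(self._ids), repo_cloned=True, caches_warm=True,
                      codegen_running=True, branch="master")

    def refill(self) -> None:
        # Top the pool back up to its target size.
        while len(self._ready) < self.target_size:
            self._ready.append(self._prepare())

    def claim(self) -> Devbox:
        # Hand out a ready box. A real system would refill in the
        # background; this sketch refills inline for simplicity.
        if not self._ready:
            self.refill()
        box = self._ready.popleft()
        self.refill()
        return box
```

Because each box is disposable, a caller would claim one per task and discard it afterwards rather than cleaning it up for reuse.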
Achieving the same level of parallelism, predictability, and isolation on a laptop by combining containers and git worktrees is hard. In particular, “providing the developer shell’s full power while constraining it appropriately” is fundamentally difficult; Devbox solves this on the cloud‑instance side.
Security boundary
Devbox runs inside the QA environment, with access to production data, production services, and arbitrary external networks blocked. In contrast to the recent Amazon Kiro incident, this design confines an agent's blast radius to a single Devbox, which is what allows the agent to run with full permissions and no confirmation prompts.
The order of events matters here: Stripe first built an environment where human engineers could experiment safely, and that environment then turned out to be safe for agents as well. Stripe did not create a new isolation substrate for agents; the existing human-oriented infrastructure was reused as-is.
goose fork: Specialized for fully unattended operation
In late 2024, at the dawn of coding agents, Stripe forked Block's open-source goose, one of the most widely used coding-agent frameworks at the time.
The customization policy after the fork was clear: remove features that assume a human is watching and optimize for unattended operation. Specifically:
- Removal of interruptibility: features that allow a human to step in mid‑run are unnecessary
- Removal of human‑triggered commands: features where a human manually starts or steers agent execution are unnecessary
- Elimination of confirmation prompts: Devbox’s isolation guarantees safety, so per‑action approvals are omitted
Third‑party tools like Cursor and Claude Code are provided to Stripe engineers as human copilots. The goose fork is developed on a separate track aimed at “full autonomy.”
Blueprints: Hybrid orchestration of determinism and agents
Blueprints, developed by Stripe as Minions’ core technology, fuses the two primitives defined in Anthropic’s Building Effective Agents: workflows and agents.
- Workflow: a fixed graph where each node handles a narrowly scoped step, and control flow is determined by predefined edges
- Agent: a tool‑using loop where an LLM autonomously decides the next action
Blueprints combines both into a single state machine.
Node types and examples
| Node type | Label example | Uses LLM | Behavior |
|---|---|---|---|
| Agent node | Implement task | Yes | LLM autonomously decides based on inputs |
| Agent node | Fix CI failures | Yes | Analyze test failures and apply fixes |
| Deterministic node | Run configured linters | No | Just runs code |
| Deterministic node | Push changes | No | Just runs git commands |
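A minimal sketch of how such a state machine might look: nodes of either type share one interface, and fixed edges decide what runs next. The `Node` structure, the `run_blueprint` helper, and the implement, lint, push ordering are assumptions made for illustration; the agent node is a placeholder function rather than a real LLM loop.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

State = Dict[str, object]


@dataclass
class Node:
    name: str
    uses_llm: bool
    run: Callable[[State], str]        # returns an outcome label
    edges: Dict[str, Optional[str]]    # outcome -> next node key (None = stop)


def run_blueprint(nodes: Dict[str, Node], start: str,
                  state: State) -> List[Tuple[str, str]]:
    """Walk the graph. Edges are predefined; an agent node only decides
    what happens inside its own step."""
    trace = []
    current: Optional[str] = start
    while current is not None:
        node = nodes[current]
        outcome = node.run(state)
        trace.append((node.name, outcome))
        current = node.edges[outcome]
    return trace


def implement_task(state: State) -> str:
    # Placeholder for an LLM tool-using loop (hypothetical).
    state["diff"] = "patch"
    return "done"


def run_linters(state: State) -> str:
    # Deterministic: just runs code, no model involved.
    state["linted"] = True
    return "ok"


def push_changes(state: State) -> str:
    # Deterministic: would just run git commands.
    state["pushed"] = True
    return "ok"


BLUEPRINT = {
    "implement": Node("Implement task", True, implement_task, {"done": "lint"}),
    "lint": Node("Run configured linters", False, run_linters, {"ok": "push"}),
    "push": Node("Push changes", False, push_changes, {"ok": None}),
}
```

Calling `run_blueprint(BLUEPRINT, "implement", {})` visits the three nodes in order and returns the trace of node names and outcomes.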
The original post includes a Blueprint flow diagram …317 chars truncated… This structure makes context engineering for sub‑agents easier. Concretely:
- Restrict the toolset per subtask
- Change the system prompt
- Simplify the conversation context
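The three levers above could be expressed as a per-node configuration that is applied before an agent node starts. This is a hedged sketch: `AgentNodeConfig`, `build_context`, and the tool names are hypothetical, not part of Blueprints as published.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List


@dataclass(frozen=True)
class AgentNodeConfig:
    system_prompt: str              # prompt specific to this subtask
    allowed_tools: FrozenSet[str]   # restricted toolset for this subtask
    keep_last_messages: int         # how much conversation history to carry over


def build_context(config: AgentNodeConfig, all_tools: Dict[str, str],
                  history: List[dict]) -> dict:
    """Assemble what one agent node sees, trimmed to its subtask."""
    tools = {name: desc for name, desc in all_tools.items()
             if name in config.allowed_tools}
    # Guard against history[-0:], which would return the whole list.
    if config.keep_last_messages > 0:
        messages = history[-config.keep_last_messages:]
    else:
        messages = []
    return {"system": config.system_prompt, "tools": tools, "messages": messages}
```

A "Fix CI failures" node, for example, might be given only file and test tools plus the last couple of messages, while an "Implement task" node gets a broader set.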
Team‑specific Blueprints
Each team can define custom Blueprints. In the example from the original post, a migration that couldn’t be handled by a fully deterministic codemod was encoded into a custom Blueprint that leverages LLM assistance.
Toolshed: A centralized server for ~500 MCP tools
Stripe built a centralized internal MCP server called Toolshed that is shared not only by Minions but by all agent systems across the company.
Design philosophy
When MCP emerged as an industry standard, Stripe already had multiple agent systems:
- A no‑code internal agent builder
- Custom agents running on dedicated services
- Off‑the‑shelf third‑party agents
- CLI‑based agent tools
- Slack bots
Maintaining overlapping MCP tools across these was inefficient, so about 500 MCP tools were consolidated into Toolshed and shared across the entire agent fleet (hundreds of agents). Adding a tool to Toolshed immediately makes it available to all agents.
Tool provisioning strategy
Agents perform better with smaller toolsets. Therefore, Toolshed does not pass through all ~500 tools as‑is; it issues a carefully selected subset per task. Minions are intentionally configured with a small default subset.
Additionally, individual engineers can customize their Minions by adding themed tool groups.
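One way to picture the provisioning strategy is a catalog where every tool belongs to a group, with a small default set of groups and optional themed additions. The catalog contents, group names, and `select_tools` function below are invented for illustration and are not Toolshed's actual interface.

```python
from typing import Dict, Iterable, List

# Hypothetical catalog: tool name -> themed group.
# The real Toolshed holds roughly 500 tools.
CATALOG: Dict[str, str] = {
    "search_code": "core",
    "read_docs": "core",
    "get_build_status": "ci",
    "get_ticket": "tickets",
    "query_metrics": "observability",
}

# Assumed small default subset for an agent.
DEFAULT_GROUPS = frozenset({"core"})


def select_tools(extra_groups: Iterable[str] = ()) -> List[str]:
    """Return the default tools plus any themed groups the engineer opted into."""
    groups = DEFAULT_GROUPS | set(extra_groups)
    return sorted(name for name, group in CATALOG.items() if group in groups)
```

With this shape, `select_tools()` yields only the core tools, and `select_tools(["ci"])` adds the CI group without exposing the rest of the catalog.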
Security controls
Because Minions invoke MCP tools autonomously, an internal security‑control framework prevents destructive operations. That said, the first line of defense is Devbox’s QA‑environment isolation: it cannot access production data, production services, or external networks in the first place.
Context management: two‑layer structure of rule files and MCP
Agents running on a large codebase can ignore best practices or pick inappropriate libraries, and these are problems that linters alone cannot prevent.
Rule files
Because Stripe’s repository is huge, it avoids unconditional global rules as much as possible. Stuffing global rules everywhere would fill the context window before the agent even starts.
Instead, it uses rule files scoped to directories and file patterns, which are automatically attached as the agent traverses the filesystem.
The rule format is Cursor-compatible for two reasons: it supports directory and file-pattern scoping, and it lets Stripe's three popular coding agents (Minions, Cursor, Claude Code) share the same rule files. Rules in the Cursor format are automatically synced for Claude Code as well.
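The scoping idea can be sketched with glob patterns: a rule is attached only when the agent touches a path that matches its pattern. Note the assumptions: this is not the Cursor rule file format itself, and the patterns, paths, and rule text are made up for illustration.

```python
from fnmatch import fnmatchcase
from typing import Dict, List

# Hypothetical rules keyed by glob pattern.
RULES: Dict[str, str] = {
    "payments/*": "Use the internal money type, never floats.",
    "*.test.ts": "Prefer table-driven tests.",
    "docs/*": "Keep lines under 100 characters.",
}


def rules_for(path: str) -> List[str]:
    """Return the rule texts whose pattern matches this path."""
    # In fnmatch-style globs "*" also matches "/", so "payments/*"
    # covers files in nested directories too.
    return [text for pattern, text in RULES.items() if fnmatchcase(path, pattern)]


def attach_rules(visited_paths: List[str]) -> List[str]:
    """Accumulate rules as the agent moves through the filesystem,
    attaching each rule at most once."""
    attached: List[str] = []
    for path in visited_paths:
        for text in rules_for(path):
            if text not in attached:
                attached.append(text)
    return attached
```

An agent that never opens a file under `docs/` never pays the context cost of the docs rule, which is the point of avoiding global rules.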
Dynamic context via MCP
In addition to static context from the filesystem, Toolshed’s MCP tool calls fetch context dynamically—internal documents, ticket details, build status, and code intelligence.
CI iteration: two‑cycle cap and the shift‑left principle
Stripe has over 3 million tests, which serve as the agent's feedback loop. Rather than relying entirely on CI, however, it follows a “shift feedback left” principle: if a check is going to fail in CI, the failure should be surfaced as early as possible.
Local linting
A pre‑push hook automatically applies lint fixes. A background daemon precomputes and caches heuristics for which lint rules apply to the changes, so lint fixes typically complete within one second at push time.
Minions use this same framework. Linting runs locally as a deterministic node in the Blueprint, and the branch passes lint before being pushed, increasing the chance the first CI pass succeeds.
CI iteration cycle
- Minions modify code and push a branch
- CI runs and applies any available autofixes
- Failures without autofixes are handed to a Blueprint agent node, where Minions attempt local fixes
- Second push and CI run
- If it still fails, hand off to a human operator
Capping the CI loop at two cycles balances tokens, compute, and time. Stripe’s view is that letting the CI loop run indefinitely hits diminishing returns. Rather than pouring resources into problems the agent can’t solve on its own, it’s more rational to hand them to a human.
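The capped loop can be written down in a few lines. This is a sketch, assuming two hypothetical callables: `run_ci` stands for "push and run CI, returning the failures that autofixes could not resolve", and `fix_locally` stands for the Blueprint agent node that attempts fixes.

```python
from typing import Callable, List

MAX_CI_CYCLES = 2  # the cap described in the text


def run_ci_loop(run_ci: Callable[[], List[str]],
                fix_locally: Callable[[List[str]], None]) -> str:
    """Push and run CI at most MAX_CI_CYCLES times, then give up."""
    for cycle in range(1, MAX_CI_CYCLES + 1):
        failures = run_ci()
        if not failures:
            return "green"
        if cycle < MAX_CI_CYCLES:
            # Only attempt a fix if another CI run is still allowed.
            fix_locally(failures)
    # Still failing after the final allowed run: stop spending tokens
    # and compute, and hand the branch to a human.
    return "handoff_to_human"
```

The cap is a single constant here, which makes the trade-off between retries and cost explicit and easy to adjust.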
Original article: Minions: Stripe’s one-shot, end-to-end coding agents — Part 2