Inside the Architecture of Stripe’s AI Coding Agent “Minions”

Stripe has published the second part of its article on the autonomous AI coding agent “Minions.” While Part 1 covered the user experience and gave an overview, Part 2 describes the implementation of four components: Devbox, Blueprints, Toolshed, and a fork of goose. This post unpacks the internals of a system that produces over 1,300 PRs a week, written end to end by agents running unattended.

Devbox: Disposable development environment that starts in 10 seconds

Minions run inside an isolated, AWS EC2–based sandbox called Devbox. This is the same standard development environment that human Stripe engineers use every day: engineers connect to a Devbox over SSH from their IDE and write code there.

The design philosophy is “cattle, not pets.” Each Devbox is a standardized, disposable instance; a single engineer typically runs about half a dozen in parallel and assigns one per task.

Implementation of “Hot and Ready”

Behind Stripe’s “Hot and Ready” 10‑second boot target is a mechanism that provisions pre‑warmed instances from a pool. On each instance in the pool, the following is already done:

The massive git repository is already cloned
Bazel and type‑check caches are warmed
A code‑generation service is already running on the Devbox
The latest copy of master is checked out

Within 10 seconds you can open a REPL, run tests, change code and run type checks, or start web services immediately.
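The pre-warmed pool described above can be illustrated with a minimal sketch. It is not Stripe's implementation: the `Devbox` and `WarmPool` classes, their fields, and the pool size are all hypothetical, and provisioning is simulated as an instant call where a real system would spend minutes cloning and warming caches.

```python
from collections import deque
from dataclasses import dataclass
from itertools import count


@dataclass
class Devbox:
    """Hypothetical stand-in for one pre-warmed instance."""
    box_id: int
    repo_cloned: bool = True       # git repository already cloned
    caches_warm: bool = True       # Bazel and type-check caches warmed
    codegen_running: bool = True   # code-generation service already up
    branch: str = "master"         # latest master checked out


class WarmPool:
    """Keeps `target_size` ready instances so a claim never waits on setup."""

    def __init__(self, target_size: int) -> None:
        self._ids = count(1)
        self._target_size = target_size
        self._ready = deque()
        self.refill()

    def _provision(self) -> Devbox:
        # Slow path in a real system; simulated as instant here.
        return Devbox(box_id=next(self._ids))

    def refill(self) -> None:
        """Top the pool back up; in practice this would run in the background."""
        while len(self._ready) < self._target_size:
            self._ready.append(self._provision())

    def claim(self) -> Devbox:
        """Hand out a ready instance (fast path), then replace it."""
        box = self._ready.popleft() if self._ready else self._provision()
        self.refill()
        return box

    def ready_count(self) -> int:
        return len(self._ready)


pool = WarmPool(target_size=3)
first = pool.claim()
second = pool.claim()
```

The point of the design is that the expensive work happens before anyone asks for an instance, so the time a user or agent waits is only the time to hand over an existing box.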

Achieving the same level of parallelism, predictability, and isolation on a laptop by combining containers and git worktrees is hard. In particular, “providing the developer shell’s full power while constraining it appropriately” is fundamentally difficult; Devbox solves this on the cloud‑instance side.

Security boundary

Devbox runs inside the QA environment, where access to production data, production services, and arbitrary external networks is blocked. In contrast to the recent Amazon Kiro incident, this design confines an agent’s blast radius to a single Devbox, which is what lets the agent run with full permissions and no confirmation prompts.

The order of events matters here: Stripe first built an environment where human engineers could experiment safely, and that environment turned out to be safe for agents as well. Stripe did not create a new isolation substrate for agents; the existing human‑oriented infrastructure could be reused as‑is.

goose fork: Specialized for fully unattended operation

In late 2024, at the dawn of coding agents, Stripe forked Block’s open‑source goose. goose was one of the most widely used coding‑agent frameworks at the time.

The customization policy after the fork was clear: remove features that assume a human is watching and optimize for unattended operation. Specifically:

Removal of interruptibility: features that allow a human to step in mid‑run are unnecessary
Removal of human‑triggered commands: features where a human manually starts or steers agent execution are unnecessary
Elimination of confirmation prompts: Devbox’s isolation guarantees safety, so per‑action approvals are omitted

Third‑party tools like Cursor and Claude Code are provided to Stripe engineers as human copilots. The goose fork is developed on a separate track aimed at “full autonomy.”

Blueprints: Hybrid orchestration of determinism and agents

Blueprints, developed by Stripe as Minions’ core technology, fuses the two primitives defined in Anthropic’s Building Effective Agents: workflows and agents.

Workflow: a fixed graph where each node handles a narrowly scoped step, and control flow is determined by predefined edges
Agent: a tool‑using loop where an LLM autonomously decides the next action

Blueprints combines both into a single state machine.

Node types and examples

Node type	Label example	Uses LLM	Behavior
Agent node	Implement task	Yes	LLM autonomously decides based on inputs
Agent node	Fix CI failures	Yes	Analyze test failures and apply fixes
Deterministic node	Run configured linters	No	Just runs code
Deterministic node	Push changes	No	Just runs git commands
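To make the mix of node types concrete, here is a minimal sketch of a state machine that runs agent nodes and deterministic nodes in one graph. The `Node` class, the `run_blueprint` function, and the edge labels are assumptions made for illustration, and the agent node is a stub: a real one would run an LLM tool-use loop instead of setting a field.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

State = Dict[str, object]


@dataclass
class Node:
    name: str
    uses_llm: bool
    run: Callable[[State], str]  # returns the label of the outgoing edge
    edges: Dict[str, Optional[str]] = field(default_factory=dict)


def run_blueprint(nodes: Dict[str, Node], start: str, state: State) -> List[str]:
    """Walk the graph until a node's outcome maps to None (the end)."""
    visited: List[str] = []
    current: Optional[str] = start
    while current is not None:
        node = nodes[current]
        visited.append(node.name)
        outcome = node.run(state)
        current = node.edges[outcome]
    return visited


def implement_task(state: State) -> str:
    # Stub for an agent node; a real one would let an LLM choose actions.
    state["diff"] = "edited files"
    return "done"


def run_linters(state: State) -> str:
    # Deterministic node: just runs code, no model involved.
    state["lint_ok"] = True
    return "ok"


def push_changes(state: State) -> str:
    # Deterministic node: would run git commands in a real system.
    state["pushed"] = bool(state.get("lint_ok", False))
    return "done"


BLUEPRINT = {
    "implement_task": Node("implement_task", True, implement_task, {"done": "run_linters"}),
    "run_linters": Node("run_linters", False, run_linters, {"ok": "push_changes"}),
    "push_changes": Node("push_changes", False, push_changes, {"done": None}),
}

state: State = {}
path = run_blueprint(BLUEPRINT, "implement_task", state)
```

Because the edges are fixed in advance, the LLM only decides what happens inside an agent node, never which step comes next.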

The original post includes a Blueprint flow diagram …317 chars truncated… This structure makes context engineering for sub‑agents easier. Concretely:

Restrict the toolset per subtask
Change the system prompt
Simplify the conversation context
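The three adjustments listed above can be sketched as a per-node context object. Everything here is hypothetical: the `AgentNodeContext` class, the tool names, and the request shape are chosen for illustration and do not reflect how Blueprints actually configures sub-agents.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class AgentNodeContext:
    system_prompt: str               # changed per subtask
    allowed_tools: Tuple[str, ...]   # restricted toolset
    max_history_messages: int        # simplified conversation context


def build_request(ctx: AgentNodeContext, all_tools: Dict[str, str],
                  history: List[str]) -> dict:
    """Assemble what one agent node sees, dropping everything else."""
    tools = {name: all_tools[name] for name in ctx.allowed_tools if name in all_tools}
    # Guard against n == 0, because history[-0:] would return the whole list.
    n = ctx.max_history_messages
    recent = history[-n:] if n > 0 else []
    return {"system": ctx.system_prompt, "tools": tools, "messages": recent}


# Hypothetical tool catalog and node contexts.
ALL_TOOLS = {
    "read_file": "Read a file",
    "edit_file": "Edit a file",
    "search_code": "Search the codebase",
    "read_ci_log": "Fetch a CI log",
    "query_docs": "Search internal docs",
}

CONTEXTS = {
    "implement_task": AgentNodeContext(
        system_prompt="Implement the requested change.",
        allowed_tools=("read_file", "edit_file", "search_code"),
        max_history_messages=20,
    ),
    "fix_ci_failures": AgentNodeContext(
        system_prompt="Fix only the failing checks listed below.",
        allowed_tools=("read_ci_log", "edit_file"),
        max_history_messages=2,
    ),
}

history = ["task description", "plan", "diff summary", "CI failure: test_refund"]
request = build_request(CONTEXTS["fix_ci_failures"], ALL_TOOLS, history)
```

In this sketch the CI-fixing node sees two tools and the last two messages, rather than the full catalog and the whole conversation.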

Team‑specific Blueprints

Each team can define custom Blueprints. In the example from the original post, a migration that couldn’t be handled by a fully deterministic codemod was encoded into a custom Blueprint that leverages LLM assistance.

Toolshed: A centralized server for ~500 MCP tools

Stripe built a centralized internal MCP server called Toolshed that is shared not only by Minions but by all agent systems across the company.

Design philosophy

When MCP emerged as an industry standard, Stripe already had multiple agent systems:

A no‑code internal agent builder
Custom agents running on dedicated services
Off‑the‑shelf third‑party agents
CLI‑based agent tools
Slack bots

Maintaining overlapping MCP tools across these was inefficient, so about 500 MCP tools were consolidated into Toolshed and shared across the entire agent fleet (hundreds of agents). Adding a tool to Toolshed immediately makes it available to all agents.

Tool provisioning strategy

Agents perform better with smaller toolsets. Therefore, Toolshed does not pass through all ~500 tools as‑is; it issues a carefully selected subset per task. Minions are intentionally configured with a small default subset.

Additionally, individual engineers can customize their Minions by adding themed tool groups.
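A small sketch shows how a default subset plus opt-in themed groups might be resolved. The group names, tool names, and `tools_for` function are invented for illustration; the source does not describe how Toolshed actually stores or selects tools.

```python
from typing import Dict, FrozenSet, Iterable, List

# Hypothetical themed groups; a real catalog would hold ~500 tools.
TOOL_GROUPS: Dict[str, FrozenSet[str]] = {
    "core": frozenset({"read_file", "edit_file", "search_code"}),
    "ci": frozenset({"read_ci_log", "get_build_status"}),
    "tickets": frozenset({"get_ticket", "search_tickets"}),
}

# Small default subset given to every run.
DEFAULT_GROUPS = ("core",)


def tools_for(extra_groups: Iterable[str] = ()) -> List[str]:
    """Return the default tools plus any themed groups an engineer opted into."""
    selected = set()
    for group in (*DEFAULT_GROUPS, *extra_groups):
        selected |= TOOL_GROUPS[group]  # unknown group names raise KeyError
    return sorted(selected)


default_tools = tools_for()
customized_tools = tools_for(["tickets"])
```

Keeping the default small and making additions explicit means the agent's tool list grows only when someone decides a task needs it.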

Security controls

Because Minions invoke MCP tools autonomously, an internal security‑control framework prevents destructive operations. That said, the first line of defense is Devbox’s QA‑environment isolation: it cannot access production data, production services, or external networks in the first place.

Context management: two‑layer structure of rule files and MCP

Running agents on a large codebase introduces problems that linters alone can’t prevent, such as agents ignoring best practices or choosing inappropriate libraries.

Rule files

Because Stripe’s repository is huge, Stripe avoids unconditional global rules as much as possible. Loading every rule globally would fill the context window before the agent even starts.

Instead, it uses rule files scoped to directories and file patterns, which are automatically attached as the agent traverses the filesystem.
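The scoping idea can be sketched with glob matching from Python's standard library. This is not the Cursor rule format itself: the `Rule` class, the rule names, and the patterns are hypothetical, and `fnmatchcase` is used as a simple matcher in which `*` also matches path separators.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Rule:
    name: str
    globs: Tuple[str, ...]  # directory or file patterns this rule is scoped to
    text: str


# Hypothetical rules for illustration.
RULES = [
    Rule("payments-conventions", ("payments/*",), "Use the shared money type."),
    Rule("ruby-style", ("*.rb",), "Prefer keyword arguments."),
    Rule("frontend-style", ("web/*.ts", "web/*.tsx"), "Avoid default exports."),
]


def rules_for(path: str, rules: Sequence[Rule] = RULES) -> List[str]:
    """Names of the rules whose patterns match this path."""
    return [r.name for r in rules if any(fnmatchcase(path, g) for g in r.globs)]


class RuleContext:
    """Accumulates rules as the agent touches files, without duplicates."""

    def __init__(self) -> None:
        self.attached: List[str] = []

    def visit(self, path: str) -> None:
        for name in rules_for(path):
            if name not in self.attached:
                self.attached.append(name)


ctx = RuleContext()
ctx.visit("payments/api/charge.rb")
ctx.visit("payments/api/refund.rb")
```

An agent that never opens a file under `web/` never pays the context cost of the frontend rule.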

The rule format is Cursor‑compatible, for two reasons: the format supports directory and file‑pattern scoping, and it lets Stripe’s three popular coding agents (Minions, Cursor, Claude Code) share the same rule files. Rules written in the Cursor format are automatically synced for Claude Code as well.

Dynamic context via MCP

In addition to static context from the filesystem, Toolshed’s MCP tool calls fetch context dynamically—internal documents, ticket details, build status, and code intelligence.

CI iteration: two‑cycle cap and the shift‑left principle

Stripe has over 3 million tests, which serve as the agent’s feedback loop. Rather than relying entirely on CI, though, Stripe follows a “shift feedback left” principle: if a check would fail in CI, surface that failure as early as possible, ideally before the push.

Local linting

A pre‑push hook automatically applies lint fixes. A background daemon precomputes and caches heuristics for which lint rules apply to the changes, so lint fixes typically complete within one second at push time.

Minions use this same framework. Linting runs locally as a deterministic node in the Blueprint, and the branch passes lint before being pushed, increasing the chance the first CI pass succeeds.
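The precompute-then-look-up pattern behind the fast pre-push lint can be sketched as follows. The source does not say what the daemon's heuristics are, so this sketch assumes the simplest possible one (file extension), and the `LintPlanCache` class and linter names are hypothetical.

```python
import os
from typing import Dict, Iterable, List, Tuple

# Hypothetical mapping from file extension to applicable linters.
LINTERS_BY_SUFFIX: Dict[str, Tuple[str, ...]] = {
    ".rb": ("ruby_lint",),
    ".ts": ("ts_lint", "import_order"),
    ".py": ("python_lint",),
}


class LintPlanCache:
    """Caches which linters apply to each changed file."""

    def __init__(self) -> None:
        self._plan: Dict[str, Tuple[str, ...]] = {}
        self.misses = 0

    def on_file_changed(self, path: str) -> None:
        # A background watcher would call this while the file is being edited.
        suffix = os.path.splitext(path)[1]
        self._plan[path] = LINTERS_BY_SUFFIX.get(suffix, ())

    def linters_for_push(self, changed_paths: Iterable[str]) -> List[str]:
        """At push time, mostly a dictionary lookup."""
        needed = set()
        for path in changed_paths:
            if path not in self._plan:
                self.misses += 1          # slow path: compute on demand
                self.on_file_changed(path)
            needed.update(self._plan[path])
        return sorted(needed)


cache = LintPlanCache()
cache.on_file_changed("payments/charge.rb")
cache.on_file_changed("web/app.ts")
plan = cache.linters_for_push(["payments/charge.rb", "web/app.ts"])
```

Moving the decision about which rules apply off the push path is what keeps the hook fast; the push itself only runs the linters already selected.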

CI iteration cycle

Minions modify code and push a branch
CI runs and applies any available autofixes
Failures without autofixes are handed to a Blueprint agent node, where Minions attempt local fixes
Second push and CI run
If it still fails, hand off to a human operator

Capping the CI loop at two cycles balances tokens, compute, and time. Stripe’s view is that letting the CI loop run indefinitely hits diminishing returns. Rather than pouring resources into problems the agent can’t solve on its own, it’s more rational to hand them to a human.
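The capped loop can be written down as a short sketch. The `ci_loop` function, its callback signatures, and the outcome strings are assumptions made for illustration; CI is simulated with scripted results rather than a real pipeline.

```python
from typing import Callable, List

MAX_CI_CYCLES = 2  # the cap described above


def ci_loop(run_ci: Callable[[], List[str]],
            fix_locally: Callable[[List[str]], None]) -> str:
    """Push and run CI at most MAX_CI_CYCLES times, then hand off."""
    for cycle in range(1, MAX_CI_CYCLES + 1):
        failures = run_ci()  # push + CI run; autofixes assumed already applied
        if not failures:
            return "green"
        if cycle < MAX_CI_CYCLES:
            fix_locally(failures)  # agent node attempts a local fix
    return "handoff_to_human"


def make_scripted_ci(results: List[List[str]]) -> Callable[[], List[str]]:
    """Simulated CI that returns one pre-recorded result per run."""
    queue = list(results)

    def run_ci() -> List[str]:
        return queue.pop(0)

    return run_ci


fix_attempts: List[List[str]] = []

# First run fails, the local fix works, second run is green.
outcome_fixed = ci_loop(make_scripted_ci([["test_refund"], []]), fix_attempts.append)

# Both runs fail, so the work goes to a human instead of looping again.
outcome_stuck = ci_loop(make_scripted_ci([["test_refund"], ["test_refund"]]),
                        fix_attempts.append)
```

Note that no fix is attempted after the final cycle: once the cap is reached, further agent effort is skipped in favor of the handoff.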

Original article: Minions: Stripe’s one-shot, end-to-end coding agents — Part 2