Backlog as a state machine for AI coding agents: git worktrees, patch merge

While handling issues on Backlog for another project, I noticed that mapping Backlog’s statuses directly onto agent states makes the workflow easy to build.
Triage, local checks, orchestration of In progress issues, parallel implementation, and patch integration now all run through scripts and agents. A human steps in at only three points: moving an issue to In progress, giving a local OK on the merged diff, and shipping to production.
This post is an operational note on where I hand work to scripts and agents, and where I keep human gates.

This wasn’t designed as a finished system from the start.
It began as a small automation — handing the issue body to Codex CLI and having it write a confirmation comment — and grew by adding prohibitions and output formats every time I caught a misreading.

I happen to build this on Codex CLI, but none of it is Codex-specific.
What it does is read the issue body, comments, attachments, the local-check memo, and candidate files together, then make a judgment, so any agent with a large context window can take its place. Claude Code should be able to play the same role if swapped in.
Read every later mention of “Codex” in that sense; the design isn’t tied to a particular CLI.

Use Backlog as a state machine, not just a ticket bin

I treat Backlog’s statuses (Open, Spec/log review, In progress) as the states of the pipeline itself.
Transitions split into ones AI triggers and ones a human triggers.
Most transitions are left to scripts and agents; the only ones a human triggers are the start approval into In progress, the local OK on a merged diff, and shipping to production — three in total.

flowchart TD
  A[Open] -->|AI triage| B[Spec/log review]
  B -->|AI local-check note| C{Human moves to In progress}
  C --> D[In progress]
  D -->|Orchestrator running| E[Parallel impl in detached worktrees]
  E -->|Submit patch| F[Lead Codex merges and verifies serially]
  F --> G{Human OKs locally}
  G --> H{Human ships to prod}

The only things a human touches are the three diamonds in the diagram: the start approval to move to In progress, the local OK on the merged diff, and shipping to production. Every other transition is driven by scripts and agents.
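One way to keep the “three human gates” rule checkable is to write the diagram down as data. The sketch below is a hypothetical encoding, not the actual scripts: Open, Spec/log review, and In progress are the Backlog statuses from the text, while “Implementing”, “Merged”, “Locally OK”, and “Shipped” are made-up labels for the later stages, which aren’t Backlog statuses.

```python
# Hypothetical encoding of the pipeline diagram: (from, to) -> who triggers it.
# The first three names are Backlog statuses; the rest are illustrative labels.
EDGES = {
    ("Open", "Spec/log review"): "ai",            # AI triage
    ("Spec/log review", "In progress"): "human",  # start approval
    ("In progress", "Implementing"): "ai",        # orchestrator hands out worktrees
    ("Implementing", "Merged"): "ai",             # lead Codex merges and verifies
    ("Merged", "Locally OK"): "human",            # local OK on the merged diff
    ("Locally OK", "Shipped"): "human",           # production ship
}


def advance(state: str, target: str, actor: str) -> str:
    """Return the new state, refusing transitions this actor may not trigger."""
    expected = EDGES.get((state, target))
    if expected is None:
        raise ValueError(f"no transition {state!r} -> {target!r}")
    if expected != actor:
        raise PermissionError(f"{state!r} -> {target!r} is triggered by {expected}, not {actor}")
    return target


def human_gates() -> list[tuple[str, str]]:
    """List the transitions only a human may trigger."""
    return [edge for edge, actor in EDGES.items() if actor == "human"]
```

Keeping the edges in one table means adding or relaxing a gate is a one-line change rather than an edit scattered across scripts.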
Before walking through the stages, a word on why this runs on Backlog rather than GitHub Issues.

Why Backlog instead of GitHub Issues

This repository has a slightly unusual setup.
Instead of running a separate clone for each site, a single repository holds all of them: each site gets its own branch, and that site’s UI lives on it. A branch is a site.
So I can’t casually cut a working branch. Cutting a branch effectively means adding another site.

GitHub Issues fit naturally into a flow where you cut a branch from an issue and open a PR.
But in this repository a branch is a site, so that flow doesn’t map onto it. You can’t cleanly associate issues with branches, and it just gets confusing.

There’s also the matter of visibility.
Because it’s a private repository, only people with repo access can see or file issues. Not just anyone can add one.

Backlog, by contrast, lets stakeholders join per project.
Anyone can file an issue, and anyone on the project can see what’s been filed, so as an intake point it’s much more open.
Those are the reasons I’m trying Backlog. If it stops working out, I can move the issues somewhere else, so this is simply the setup for now.

Below, I go through each stage — what’s automated and what stays at the three gates.

Triage Open issues and send them to Spec/log review

This part is already implemented.
A script fetches Open issues via the Backlog API and hands the issue body, existing comments, attachment info, and image attachments to Codex CLI, which generates the JSON for a confirmation comment to post to Backlog.
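For reference, the fetch step maps onto Backlog API v2’s issue list endpoint (GET /api/v2/issues), which filters by projectId[] and statusId[], plus the per-issue comments endpoint. The sketch below only builds the request URLs and decodes JSON. The space host and project ID are placeholders, and treating Open as status ID 1 (Backlog’s default) is an assumption to check against your own space.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

OPEN_STATUS_ID = 1  # assumption: Backlog's default "Open" status; verify in your space


def issue_list_url(space_host: str, api_key: str, project_id: int,
                   status_ids: list[int], count: int = 100) -> str:
    """Build a GET /api/v2/issues URL filtered by project and status."""
    params = [("apiKey", api_key), ("projectId[]", project_id)]
    params += [("statusId[]", s) for s in status_ids]
    params.append(("count", count))  # the API caps this at 100
    return f"https://{space_host}/api/v2/issues?{urlencode(params)}"


def comments_url(space_host: str, api_key: str, issue_key: str) -> str:
    """Build a GET /api/v2/issues/{issueIdOrKey}/comments URL."""
    query = urlencode({"apiKey": api_key, "count": 100})
    return f"https://{space_host}/api/v2/issues/{issue_key}/comments?{query}"


def get_json(url: str):
    """Fetch a URL and decode the JSON body (performs a network call)."""
    with urlopen(url, timeout=30) as resp:
        return json.load(resp)


# Example (not executed here; host, key, and project ID are placeholders):
# issues = get_json(issue_list_url("example.backlog.com", KEY, 12345, [OPEN_STATUS_ID]))
```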

The generated comment has a fixed leading marker so it’s distinguishable from human comments.

Note: this is an automated AI assessment. Please point out any misreadings.

Once that comment is written, the script moves the issue to Spec/log review.
For a freshly received issue, the AI reads it first, leaves a “here’s how I understood it,” and advances the status by one — that’s the first-response automation.

Issues that already have an AI triage comment are excluded, so the same assessment isn’t written twice.
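The comment-then-advance step and the de-duplication rule fit in a short loop. In this sketch the tracker is an injected object behind a hypothetical minimal interface (so the logic can be exercised without touching Backlog), and `assess` stands in for the Codex CLI call that produces the comment text.

```python
from typing import Callable, Protocol

TRIAGE_MARKER = "Note: this is an automated AI assessment. Please point out any misreadings."


class Tracker(Protocol):
    """Hypothetical minimal interface over the Backlog API."""
    def comments(self, issue_key: str) -> list[str]: ...
    def post_comment(self, issue_key: str, body: str) -> None: ...
    def set_status(self, issue_key: str, status: str) -> None: ...


def already_triaged(comments: list[str]) -> bool:
    """An issue counts as triaged once any comment starts with the marker."""
    return any(c.startswith(TRIAGE_MARKER) for c in comments)


def triage(issue_keys: list[str], tracker: Tracker,
           assess: Callable[[str, list[str]], str]) -> list[str]:
    """Comment on untriaged issues, then move them to Spec/log review."""
    handled = []
    for key in issue_keys:
        comments = tracker.comments(key)
        if already_triaged(comments):
            continue  # never write the same assessment twice
        body = assess(key, comments)
        tracker.post_comment(key, f"{TRIAGE_MARKER}\n\n{body}")
        # Advance only after the comment is posted; if posting raises,
        # the loop stops and the issue stays in Open.
        tracker.set_status(key, "Spec/log review")
        handled.append(key)
    return handled
```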

The scheduled run is handled by a macOS LaunchAgent.
It runs every 15 minutes from 07:00 to 21:00, so it simply doesn’t fire during the late-night hours when no issues move.
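launchd has no cron-style range syntax, but StartCalendarInterval accepts an array of Hour/Minute dictionaries, one per firing time. A small generator keeps that list from being typed by hand. The label, the script path, and reading “07:00 to 21:00” as ending with a run at exactly 21:00 are all assumptions.

```python
import plistlib


def launch_agent_plist(label: str, program_args: list[str],
                       first_hour: int = 7, last_hour: int = 21,
                       every_minutes: int = 15) -> bytes:
    """Build a LaunchAgent plist that fires every N minutes within a daily window."""
    slots = []
    for hour in range(first_hour, last_hour + 1):
        for minute in range(0, 60, every_minutes):
            if hour == last_hour and minute > 0:
                break  # assumption: the window closes with a run at last_hour:00
            slots.append({"Hour": hour, "Minute": minute})
    return plistlib.dumps({
        "Label": label,
        "ProgramArguments": program_args,
        "StartCalendarInterval": slots,
    })


# Hypothetical label and paths; save as ~/Library/LaunchAgents/<label>.plist.
plist = launch_agent_plist("com.example.backlog-triage",
                           ["/usr/bin/python3", "/path/to/triage.py"])
```

After writing the file, it still has to be loaded with launchctl.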

Stop at Spec/log review and do read-only local inspection

Also already implemented.
For issues in Spec/log review, I added a stage that does only read-only inspection of the local repository.
The goal here is to prepare for implementation, not to implement, so the prohibitions are spelled out explicitly.

No file edits
No running tests
No builds
No deploys
No access to production
No DB operations
No network access

Some paths are excluded from reading, too.
Log directories and the archive where past work history is accumulated aren’t read.
Those are large, and mixing stale context into the current judgment blurs the reading.
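Both the prohibitions and the excluded paths can be applied when the prompt is assembled. This sketch assumes the excluded directories are named logs and archive (placeholders) and drops any candidate path under them. The prompt wording is illustrative, not the actual prompt.

```python
from pathlib import PurePosixPath

PROHIBITIONS = [
    "No file edits", "No running tests", "No builds", "No deploys",
    "No access to production", "No DB operations", "No network access",
]
EXCLUDED_DIRS = {"logs", "archive"}  # hypothetical directory names


def readable(path: str) -> bool:
    """False for anything under an excluded directory, at any depth."""
    return not any(part in EXCLUDED_DIRS for part in PurePosixPath(path).parts)


def build_local_check_prompt(issue_text: str, candidate_files: list[str]) -> str:
    """Assemble a read-only inspection prompt with the rules stated up front."""
    files = [f for f in candidate_files if readable(f)]
    lines = ["Read-only inspection of the local repository. Do not implement anything."]
    lines += [f"- {rule}" for rule in PROHIBITIONS]
    lines += ["", "Issue:", issue_text, "", "Files you may read:"]
    lines += [f"- {f}" for f in files]
    return "\n".join(lines)
```

A prompt only states the rules. If the agent CLI offers a read-only sandbox mode, running this stage under it is the stronger guarantee that nothing gets edited.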

The local-check comment gets its own marker.

Note: AI local-check memo. Implementation has not started.

Even on an issue where a local check was already written, a human comment can come in afterward.
Only when a human comment appears after the latest AI local-check comment does it add a re-check comment, taking that into account.
This gets a separate marker too, so the first check and the re-check aren’t mixed up.

Note: AI local re-check memo. Implementation has not started.

While only AI comments are piling up, it doesn’t add a re-check. Repeating the same reading with no new human input is pointless.
And this stage doesn’t move the status. The local check is purely to raise reading accuracy; it isn’t a signal to start.
For now, the moment a human moves the issue to In progress on Backlog is kept as the gate to start implementation.
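The re-check rule reduces to finding the latest AI check and looking for human comments after it. This sketch assumes comments arrive oldest first as (is_ai, body) pairs, and it treats a re-check memo as the new “latest check”, so one human reply triggers exactly one re-check.

```python
from typing import Optional

CHECK_MARKER = "Note: AI local-check memo."
RECHECK_MARKER = "Note: AI local re-check memo."


def next_local_action(comments: list[tuple[bool, str]]) -> Optional[str]:
    """Return "check", "recheck", or None for an issue in Spec/log review.

    comments: (is_ai, body) pairs, oldest first.
    """
    last_check = None
    for i, (is_ai, body) in enumerate(comments):
        if is_ai and body.startswith((CHECK_MARKER, RECHECK_MARKER)):
            last_check = i
    if last_check is None:
        return "check"  # no local check yet
    if any(not is_ai for is_ai, _ in comments[last_check + 1:]):
        return "recheck"  # a human spoke after the latest check
    return None  # only AI comments since then: nothing new to read
```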

Use the local-check comment as a digest of the issue

The local-check output isn’t free text; it’s structured.
At the top it has comment, confidence, and local_check, and local_check is broken down further.

comment      … body text posted to Backlog
confidence   … confidence in the reading
local_check
  candidate_files        … files likely to be touched
  implementation_scope   … estimated scope of the change
  estimate               … size (small / medium / large)
  blockers               … open questions to resolve before starting
  can_start_from_local   … whether local info alone is enough to start
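Parsing that output strictly, rather than trusting it, is what makes any later mechanical rule safe. A minimal sketch with dataclasses follows. The field names come from the structure above; representing confidence as a number between 0 and 1 is an assumption.

```python
import json
from dataclasses import dataclass

ESTIMATES = {"small", "medium", "large"}


@dataclass
class LocalCheck:
    candidate_files: list[str]
    implementation_scope: str
    estimate: str
    blockers: list[str]
    can_start_from_local: bool


@dataclass
class LocalCheckResult:
    comment: str
    confidence: float  # assumption: 0.0 to 1.0
    local_check: LocalCheck


def parse_local_check(raw: str) -> LocalCheckResult:
    """Decode and validate the agent's JSON; missing keys raise KeyError."""
    data = json.loads(raw)
    lc = data["local_check"]
    if lc["estimate"] not in ESTIMATES:
        raise ValueError(f"unexpected estimate: {lc['estimate']!r}")
    if not isinstance(lc["can_start_from_local"], bool):
        raise ValueError("can_start_from_local must be a boolean")
    confidence = float(data["confidence"])
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence out of range: {confidence}")
    return LocalCheckResult(
        comment=data["comment"],
        confidence=confidence,
        local_check=LocalCheck(
            candidate_files=list(lc["candidate_files"]),
            implementation_scope=lc["implementation_scope"],
            estimate=lc["estimate"],
            blockers=list(lc["blockers"]),
            can_start_from_local=lc["can_start_from_local"],
        ),
    )
```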

This way, the comment left on Backlog becomes a digest of the issue.
A human reads it after the AI has already organized it: “this issue touches this set of files, the scope is about this big, the unresolved part is here.”
Instead of articulating a vague instruction from scratch, the human only has to fix the mistakes in the AI’s reading.
Because candidate files and open questions are laid out up front, the cost of giving instructions drops.

Conditions under which it could auto-advance to In progress

This is the “there’s room, but not enabled yet” part.
Once the output is structured, the decision to advance to In progress can be written as mechanical rules. The conditions look like this.

can_start_from_local is true
blockers is empty
estimate is small or medium
No DB changes
No production data operations
No overlap with an existing work-start lock and its candidate files
No pending spec decisions in the issue body or comments

If all of these hold, there’s room to move it to In progress automatically.
For now, though, this auto-transition is off.
The moment something hits In progress, the orchestrator described below starts running the implementation, so I keep a human pulling that one trigger. The transition to In progress is left as the “OK to start” signal.
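Written as a predicate, the conditions stay readable and the off switch stays explicit. In this sketch the DB, production-data, and pending-spec flags are plain inputs (how they’d be derived isn’t specified here), and a module-level flag keeps auto-advance disabled.

```python
from dataclasses import dataclass, field

AUTO_ADVANCE_ENABLED = False  # off: a human still moves issues to In progress


@dataclass
class Candidate:
    can_start_from_local: bool
    blockers: list[str]
    estimate: str
    touches_db: bool
    touches_production_data: bool
    has_pending_spec_decision: bool
    candidate_files: set[str] = field(default_factory=set)


def could_auto_advance(c: Candidate, locked_files: set[str]) -> bool:
    """True when every listed condition holds."""
    return (
        c.can_start_from_local
        and not c.blockers
        and c.estimate in {"small", "medium"}
        and not c.touches_db
        and not c.touches_production_data
        and not c.has_pending_spec_decision
        and not (c.candidate_files & locked_files)  # no overlap with work-start locks
    )


def should_auto_advance(c: Candidate, locked_files: set[str]) -> bool:
    """The switch sits outside the predicate, so enabling it is a one-line change."""
    return AUTO_ADVANCE_ENABLED and could_auto_advance(c, locked_files)
```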

Orchestration after In progress

This part is running too.
Only issues that have moved to In progress become targets of the implementation orchestrator.
The orchestrator runs roughly once an hour: it reads Backlog, checks the AI local-check memo, candidate files, open questions, and existing work-start locks, then assigns work.
On top of that, it decides whether work can proceed in parallel via conflict detection, and only hands non-conflicting issues to agents.
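The assignment step can be as simple as a greedy pass: walk the In progress issues in priority order and take each one that conflicts with nothing already locked or already picked. This is one possible strategy, not necessarily what the orchestrator does; `conflicts` is whatever the conflict rules described in this post define.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def pick_parallel_batch(issues: Sequence[T], locked: Sequence[T],
                        conflicts: Callable[[T, T], bool]) -> list[T]:
    """Greedily pick issues that conflict with nothing running or already picked."""
    batch: list[T] = []
    for issue in issues:  # assumed to be in priority order
        if any(conflicts(issue, other) for other in [*locked, *batch]):
            continue  # leave it for a later run
        batch.append(issue)
    return batch
```

Greedy selection won’t always find the largest possible batch, but skipped issues simply wait for the next hourly run.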

Work-start locks and the done signal

So that multiple agents don’t pick up the same issue twice, the assigned agent posts a work-start lock comment to Backlog before starting.

Note: AI work-start memo.
issue: PROJ-123
agent: codex-1
base: <current branch> @ <short HEAD hash>
workdir: detached worktree or a separate clone path
files: path/to/fileA, path/to/fileB
integrator: main-codex

When done, it leaves the result and the next action.

Note: AI work-done memo.
issue: PROJ-123
result: patch submitted
remaining: waiting for main-codex to apply, verify, and commit

It uses Backlog’s comment thread as a bulletin board for locking and handoff between agents.
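Because the memos are plain key: value lines under a fixed header, the bulletin board can be read back mechanically. This sketch formats and parses them and treats a start memo as an active lock until a done memo for the same issue appears. The parsing rules are assumptions based on the memo layout shown earlier.

```python
from typing import Optional

START_HEADER = "Note: AI work-start memo."
DONE_HEADER = "Note: AI work-done memo."


def format_memo(header: str, fields: dict[str, str]) -> str:
    """Render a memo as a header line followed by key: value lines."""
    return "\n".join([header, *(f"{k}: {v}" for k, v in fields.items())])


def parse_memo(body: str) -> Optional[tuple[str, dict[str, str]]]:
    """Parse a memo back into (header, fields); None for any other comment."""
    lines = body.strip().splitlines()
    if not lines or lines[0].strip() not in (START_HEADER, DONE_HEADER):
        return None
    fields = {}
    for line in lines[1:]:
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return lines[0].strip(), fields


def active_locks(comments: list[str]) -> dict[str, dict[str, str]]:
    """Map issue -> start-memo fields for each start memo with no later done memo."""
    locks: dict[str, dict[str, str]] = {}
    for body in comments:  # oldest first
        parsed = parse_memo(body)
        if parsed is None:
            continue
        header, fields = parsed
        issue = fields.get("issue")
        if issue is None:
            continue
        if header == START_HEADER:
            locks[issue] = fields
        else:
            locks.pop(issue, None)
    return locks
```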

Conditions for parallelizing and what counts as a conflict

To parallelize, the premise is that multiple agents never touch the same working tree directly.
On top of that, conflict detection narrows down which issues can run in parallel.

A conflict isn’t only the case where filenames match.
Anything hitting one of these counts as a conflict.

The same screen, the same admin view, the same layout
The same API’s controller / service / repository / model
DB migrations, schema, seeds, config tables
Shared config files, dependency definitions, lockfiles
Shared components, shared CSS, shared utilities
Cross-cutting domains like auth, registration, points, payments, notifications

Parallelization is allowed only when it’s clearly a different screen, a different API, no DB changes, and no shared files.
If it touches a cross-cutting domain at all, it’s run serially instead of in parallel.
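A conservative conflict check following those rules: anything touching a serial-only area conflicts with everything, and otherwise two issues conflict if they share a file, a screen, or an API. The path patterns are hypothetical placeholders, and the screen/API/domain labels are assumed to come from the local-check memo.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase

# Hypothetical patterns for serial-only areas: DB, shared config, lockfiles, shared UI.
SERIAL_ONLY_PATTERNS = [
    "db/*", "config/*", "*.lock", "package.json", "package-lock.json",
    "composer.json", "shared/*", "assets/css/common/*",
]
CROSS_CUTTING_DOMAINS = {"auth", "registration", "points", "payments", "notifications"}


@dataclass(frozen=True)
class WorkItem:
    key: str
    files: frozenset[str]
    screens: frozenset[str] = frozenset()
    apis: frozenset[str] = frozenset()
    domains: frozenset[str] = frozenset()


def serial_only(item: WorkItem) -> bool:
    """True if the item touches a cross-cutting domain or any shared/DB path."""
    if item.domains & CROSS_CUTTING_DOMAINS:
        return True
    return any(fnmatchcase(f, pat) for f in item.files for pat in SERIAL_ONLY_PATTERNS)


def conflicts(a: WorkItem, b: WorkItem) -> bool:
    """Conservative: serial-only items conflict with everything."""
    if serial_only(a) or serial_only(b):
        return True
    return bool(a.files & b.files or a.screens & b.screens or a.apis & b.apis)
```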

Split work with detached worktrees instead of branches

As noted earlier, in this repository a branch is a site, so I can’t cut a working branch.
To implement in parallel, the working areas have to be separated by something other than branches.

Instead, I make a detached worktree — or a separate clone — from a pinned HEAD of the current branch.
Each agent works there and returns a patch / diff rather than a commit.
The output isn’t placed on a branch; it’s handed over as a lump of diff.
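In git terms this can be git worktree add --detach at the pinned commit, with a staged diff as the hand-off. The sketch builds the command lines (paths are placeholders) and shows one way to collect the patch. Staging with add -A before diffing is there so newly created files end up in the patch too.

```python
import subprocess


def worktree_commands(repo: str, workdir: str, base_commit: str) -> dict[str, list[str]]:
    """git commands for one agent's detached worktree, pinned at base_commit."""
    return {
        # New worktree with a detached HEAD: no branch is created.
        "create": ["git", "-C", repo, "worktree", "add", "--detach", workdir, base_commit],
        # Stage everything, including new files, in the worktree's own index.
        "stage": ["git", "-C", workdir, "add", "-A"],
        # The hand-off: staged changes relative to the pinned base, binary-safe.
        "diff": ["git", "-C", workdir, "diff", "--cached", "--binary", base_commit],
        "remove": ["git", "-C", repo, "worktree", "remove", "--force", workdir],
    }


def collect_patch(repo: str, workdir: str, base_commit: str) -> str:
    """Stage the agent's work and return it as a patch string (runs git)."""
    cmds = worktree_commands(repo, workdir, base_commit)
    subprocess.run(cmds["stage"], check=True)
    result = subprocess.run(cmds["diff"], check=True, capture_output=True, text=True)
    return result.stdout
```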

A main Codex merges patches serially

The collected patches are applied to the current branch only by the main Codex.
It applies them one at a time, first checking that each one applies cleanly (the equivalent of git apply --check), then verifies the result locally; that is where the main Codex’s job ends.
Folding the diffs implemented in parallel into one coherent tree is the job of a single agent.
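The serial merge loop, sketched with subprocess: check, apply, verify, and stop at the first patch that doesn’t go in cleanly, reverting it if verification fails. `verify` is a placeholder for whatever local checks the main Codex runs. Patch paths should be absolute, because git -C changes directory before reading them.

```python
import subprocess
from typing import Callable, Optional


def apply_serially(repo: str, patches: list[str], verify: Callable[[str], bool],
                   run: Callable[..., subprocess.CompletedProcess] = subprocess.run
                   ) -> tuple[list[str], Optional[str], str]:
    """Apply patches one by one; return (applied, failed_patch, reason)."""
    applied: list[str] = []
    for patch in patches:
        # Dry run first: does this patch still apply on top of what's merged so far?
        check = run(["git", "-C", repo, "apply", "--check", patch],
                    capture_output=True, text=True)
        if check.returncode != 0:
            return applied, patch, check.stderr.strip()
        run(["git", "-C", repo, "apply", patch], check=True)
        if not verify(repo):
            # Back the failing patch out so the tree holds only verified patches.
            run(["git", "-C", repo, "apply", "-R", patch], check=True)
            return applied, patch, "verification failed"
        applied.append(patch)
    return applied, None, ""
```

Stopping instead of skipping is a deliberate choice in this sketch: the remaining patches were written against the same base, so whether they still make sense is a judgment call rather than something to retry blindly.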

Here one human gate comes in. A human looks at the merged, verified local diff, and only after giving an OK does it proceed to commit / push.
Even for an issue thrown over carelessly as “just do this,” by the time it reaches here the diff and the local-check memo are both in hand, so it’s clear what to look at.

Production and DB stay behind a separate gate

However far the automation expands, the following operations stay behind a separate gate.
They are handled apart from local reading and implementation prep, behind a gate distinct from code-change automation.

Production deploys
DB changes
Operations that add, delete, or update existing data
Destructive git operations
Bulk overwrites of production directories
rsync or sync-with-delete
Bulk operations on specific directories like public assets or storage
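A coarse command deny-list is one way to keep an agent from drifting across that line. This is a sketch under assumptions: the DB client names and the “any argument containing migrate” rule are illustrative. It is a guard rail rather than a security boundary; the separate gate itself still has to live outside the agent.

```python
import shlex

DB_CLIENTS = {"mysql", "psql", "sqlite3", "mongosh"}  # illustrative


def needs_separate_gate(command: str) -> bool:
    """Coarse deny-list: True if a command belongs behind the production/DB gate."""
    argv = shlex.split(command)
    if not argv:
        return False
    prog, args = argv[0], argv[1:]
    if prog == "rsync":
        return any(a.startswith("--delete") for a in args)  # sync-with-delete
    if prog in DB_CLIENTS:
        return True  # any direct DB session
    if prog == "rm":
        return any(a == "--recursive" or (a.startswith("-") and not a.startswith("--")
                                          and ("r" in a or "R" in a)) for a in args)
    if prog == "git" and args:
        sub, rest = args[0], args[1:]
        if sub == "push":
            return any(a in ("-f", "--force") or a.startswith(("--force-with-lease", "+"))
                       for a in rest)
        if sub == "reset":
            return "--hard" in rest
        if sub == "clean":
            return True
        if sub == "branch":
            return "-D" in rest
    # Illustrative: framework migration commands such as "artisan migrate" or "db:migrate".
    return any("migrate" in a for a in args)
```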

Put the other way around, everything from code changes up to commit / push — with a human local OK in between — can sit on the automated side.
Beyond that, only the main Codex does the production deploy, and only after explicit permission, production-deploy rules, and a non-destructive pre-check are all in place.
A diff applying and passing verification locally, and it being OK to ship to production, are separate judgments.
I’m building it on the premise of keeping that line all the way through: automate up to local, but a human holds production deploys and existing-data operations.