Three Failure Modes of AI Coding Tools (Production Deletion, Context Loss, Quota Exhaustion)
As AI coding agents move into real work, the ways they can fail are multiplying. In the same week, products from Amazon and Anthropic failed in different ways, so I’m recording them side by side.
Amazon Kiro: the autonomous agent that “deleted and recreated” production
What happened
In December 2025, Amazon’s internal AI coding agent “Kiro” autonomously decided to “delete and recreate the environment” for a production system, causing 13 hours of disruption to AWS services. The Financial Times reported this based on interviews with four insiders.
An engineer handed Kiro a task to fix the production environment. Kiro chose “delete and recreate the environment” as the optimal solution and executed it as-is. The production environment disappeared.
Impact:
- AWS Cost Explorer
- One of the two mainland China regions
- 13 hours of downtime
Amazon described it as an “extremely limited” event. Calling 13 hours of production downtime “limited” stretches credulity.
Kiro’s design, and why it failed
Kiro is an agentic AI coding tool that Amazon announced in July 2025. By design, it seeks user approval before executing actions. In this incident, however, the responsible engineer used a “role with broader permissions than expected.” With operator-level access effectively passed through, Kiro was able to delete and recreate the production environment without asking for confirmation.
In a statement to Reuters, AWS said “the root cause was user error—specifically a misconfiguration of access control—not the AI,” and that the involvement of the AI tool was “merely a coincidence.”
But there are several facts that make “coincidence” hard to swallow.
Multiple Amazon employees expressed skepticism. A senior AWS employee called it “small but entirely foreseeable.” After the incident, AWS introduced “mandatory peer review for production access” and additional staff training. If those were introduced afterward, they were not in place at the time of the outage.
Requiring multi-person review before production changes is basic engineering hygiene. Its absence points less to an individual engineer’s mistake and more to an organizational safety-design gap in putting Kiro into operations.
Adoption quotas distorting risk assessment
According to multiple Amazon employees, this was “at least the second” AI-related incident. There was also a production incident involving Amazon Q Developer (another AI coding assistant), and in October 2025 a separate 15-hour AWS outage was attributed to “a bug in automation software.”
In the background, Amazon was aggressively pushing internal adoption of Kiro. It reportedly set a weekly usage target of 80% and closely tracked adoption. A quota to use a tool the company wanted to commercialize came into conflict with incentives to surface risk.
Feedback like “our tool might not be safe” becomes harder to raise under a blanket order to “use it 80% of the time.” Internal dogfooding meant to improve quality turns into a risk-blinding device the moment adoption rate becomes the KPI.
Original article: Amazon blames human employees for an AI coding agent’s mistake
Claude Code: auto-compaction irreversibly deletes data
What’s happening
In long Claude Code sessions, Claude will suddenly say “Could you paste the code you sent earlier again?” That’s a bug.
Claude Code has a feature called auto-compaction. When the conversation history approaches the context limit, it automatically compresses past exchanges into summaries to save tokens, firing in the background without the user noticing.
The problem is that this summarization is a non-reversible transformation.
For example, if you paste 8,000 characters of DOM markup and spend 40 minutes working with Claude, then compaction kicks in and only a line like this remains in the summary:
[compacted] User provided 8,200 characters of DOM markup for analysis.
The actual markup is gone. On Claude’s side, only the fact that “there were 8,200 characters of DOM markup” remains. Ask for the concrete contents afterward and it either hallucinates or says “please paste it again.”
How compaction works technically
Auto-compaction works like this.
Claude Code keeps all messages in the context window during a session. As the conversation grows, the token count approaches the model’s limit, so the LLM itself generates summaries of older messages and replaces the originals with those summaries. That’s compaction.
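The mechanism can be sketched in a few lines. This is my reading of the behavior described above, not Anthropic’s actual code: `summarize()` stands in for an LLM call, and the token counter is a crude whitespace approximation.

```python
# Minimal sketch of lossy auto-compaction (illustrative, not Anthropic's code).
def count_tokens(text: str) -> int:
    return len(text.split())  # crude stand-in for a real tokenizer

def summarize(messages) -> str:
    # Stand-in for the LLM-generated summary; real summaries are prose.
    total = sum(count_tokens(t) for _, t in messages)
    return f"[compacted] {len(messages)} earlier messages (~{total} tokens) summarized."

def compact(history, limit: int, keep_recent: int = 4):
    """If history exceeds `limit` tokens, replace all but the most recent
    `keep_recent` messages with a single summary message. The originals
    are discarded from the context -- that is the lossy step."""
    if sum(count_tokens(t) for _, t in history) <= limit:
        return history
    old, recent = history[:-keep_recent], history[-keep_recent:]
    return [("system", summarize(old))] + recent

history = [("user", "word " * 500), ("assistant", "ok"),
           ("user", "more context"), ("assistant", "done"),
           ("user", "next question"), ("assistant", "answer")]
compacted = compact(history, limit=200)
# The 500-word paste is gone; only the one-line summary survives.
```

The key property is in the last line of `compact`: the originals are replaced, not archived, so nothing in the remaining context can reconstruct them.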
The granularity of the summaries is left to the LLM’s judgment. Code blocks, error logs, and user-pasted data—especially long text—are aggressively compacted. Yet these are often “important precisely because they’re long.” You pasted 8,000 characters of DOM markup because it was needed; reducing that to a single line saying “there was DOM markup” is not acceptable.
To make matters worse, Claude does not notify the user when compaction fires. From the user’s perspective, Claude suddenly seems to lose context. In reality, it has.
It exists on disk but can’t be referenced
The data itself exists on disk. Claude Code’s session transcripts are stored under ~/.claude/projects/{project-path}/, and if you open the files, everything from before compaction is there. However, current Claude has no way to instruct it to “read lines X–Y in that file.”
Data remains yet cannot be referenced. It’s like having a backup with no way to restore—one of the most frustrating kinds of problems.
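Until Claude can read its own transcript, you can do it manually. The sketch below assumes the transcripts are JSON-lines files whose entries carry a `message` object with `role` and `content`; the exact schema may differ across Claude Code versions, so treat the field names as assumptions.

```python
# Manual recovery sketch: pull long user messages (e.g. that 8,000-char
# DOM paste) out of a session transcript so they can be re-pasted into a
# fresh session. Schema assumptions noted in the lead-in.
import json
from pathlib import Path

def long_user_messages(transcript: Path, min_chars: int = 1000) -> list[str]:
    """Return user messages of at least `min_chars` characters."""
    found = []
    for line in transcript.read_text().splitlines():
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip non-JSON lines defensively
        msg = entry.get("message", {})
        content = msg.get("content", "")
        if isinstance(content, list):  # content may be a list of blocks
            content = "".join(b.get("text", "") for b in content
                              if isinstance(b, dict))
        if msg.get("role") == "user" and len(content) >= min_chars:
            found.append(content)
    return found
```

Point it at the newest `.jsonl` file under `~/.claude/projects/{project-path}/` and print the results.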
On GitHub, at least eight related issues appear to stem from the same root cause:
- #1534 — Memory Loss After Auto-compact
- #3021 — Forgets to Refresh Memory After Compaction
- #10960 — Repository Path Changes Forgotten After Compaction
- #13919 — Skills context completely lost after auto-compaction
- #14968 — Context compaction loses critical state
- #19888 — Conversation compaction loses entire history
- #21105 — Context truncation causes loss of conversation history
- #23620 — Agent team lost when lead’s context gets compacted
Proposed fix and near-term workarounds
Issue #26771 proposes embedding a reference to the transcript inside the summary.
Current lossy approach:
[compacted] User provided 8,200 characters of DOM markup for analysis.
Proposed recoverable approach:
[compacted] User provided 8,200 characters of DOM markup for analysis.
[transcript:lines 847-1023]
Record in the summary which line range in the transcript contains the original data. Claude only rereads that range when it needs the specific data. Because it does not keep the full text at all times, the increase in token cost stays minimal. It’s a proposal to turn lossy compaction into recoverable compaction, and it appears technically feasible.
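The consumer side of that proposal is simple to sketch. This is my reading of the #26771 idea, not its actual implementation: a resolver that expands `[transcript:lines A-B]` references by re-reading only that range from the on-disk transcript.

```python
# Sketch of recoverable compaction (one reading of issue #26771):
# the summary carries a line-range pointer into the transcript, and the
# original text is re-read only when the concrete data is needed.
import re
from pathlib import Path

REF = re.compile(r"\[transcript:lines (\d+)-(\d+)\]")

def resolve(summary: str, transcript: Path) -> str:
    """Expand [transcript:lines A-B] references against the transcript file."""
    lines = transcript.read_text().splitlines()
    def expand(m: re.Match) -> str:
        a, b = int(m.group(1)), int(m.group(2))
        return "\n".join(lines[a - 1:b])  # 1-indexed, inclusive range
    return REF.sub(expand, summary)
```

Token cost is paid only at resolution time, which is why the proposal stays cheap: the full text never sits in the context, only the pointer does.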
Near-term workarounds:
- Run /compact manually and explicitly instruct what to keep.
- Write important context into MEMORY.md.
- Save long data to files and have Claude read them (don’t paste long blobs directly into the conversation).
As Claude Code usage increases, so does the chance of hitting compaction in long sessions. If you’re running into this in production use, consider upvoting GitHub Issue #26771.
Claude Code sub-agents: racing into caps with no usage visibility
Separate from compaction
If compaction is a problem of data quality, this one is about quantity. Using Claude Code’s sub-agents (multi-agent) causes you to hit usage caps far faster than expected.
Claude Code’s sub-agents run as independent Claude instances with their own context windows, separate from the main agent. Run three sub-agents in parallel and API calls and token consumption occur in parallel as well. There are reports that sub-agents running in plan mode consume about seven times the tokens of a normal session.
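The multiplication compounds quickly. A back-of-envelope calculation, using the reported (not official) 7x plan-mode figure and an assumed 20,000-token baseline per task:

```python
# Back-of-envelope: why parallel sub-agents exhaust quotas so fast.
BASELINE = 20_000          # tokens per plain-session task (illustrative assumption)
PLAN_MODE_MULTIPLIER = 7   # reported ratio for plan-mode sub-agents
SUBAGENTS = 3              # sub-agents running in parallel

burn = BASELINE * PLAN_MODE_MULTIPLIER * SUBAGENTS
print(burn)  # 420000 -- 21x the plain-session task
```

Whatever the true baseline, a 21x multiplier turns an afternoon’s quota into minutes.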
Even on the Max plan, exhausted in two hours
In GitHub Issue #16157, a Max plan user ($100/month, five times the Pro quota) reports “after not using it for three days, I restarted and hit the cap in two hours.” Another user ran out in 45 minutes, writing “the meter dropped as if someone else were using it.”
Anthropic says “we haven’t changed the rate limits.” But it also explains that “Opus 4.5 runs longer and does more work than before, so token consumption increases as a result.” In other words, as the model gets smarter and performs more tool calls and reasoning steps, tokens per task increase and you reach the cap faster.
Even a single task like “refactor this” can consume 10–30 message units of the rate limit as it reads files, edits, runs tests, and iterates fixes. With sub-agents running in parallel, the effect multiplies.
No usage feedback
The same pattern seen in Kiro’s “execute without confirmation” and compaction’s “compress without notice” appears here as well. Claude Code provides no real-time display of token consumption. You work without seeing the remaining quota, and suddenly it stops with “you’ve reached your limit.”
The window resets every five hours, but neither the remaining time nor the remaining tokens are shown. All the user sees is that the cap was reached; there is no breakdown of which operations consumed how much.
GitHub #16157 links three duplicate issues (#13551, #16058, #9544), with repeated reports of the same problem.