
Codex's SQLite logs wrote 37TB to SSD in 21 days: cause and the June 22 fix

Ikesan

The claim that Codex quietly wears down your SSD by writing logs in the background wasn’t an exaggeration.
According to GitHub issue openai/codex #28224, ~/.codex/logs_2.sqlite and its WAL file became the main source of continuous writes, putting about 37TB onto the main SSD over roughly 21 days of uptime.

By simple extrapolation that’s around 640TB a year.
Some consumer 1TB-class NVMe SSDs are rated at around 600TBW, so for near-always-on usage it isn’t a number you can ignore.

TBW (Total Bytes Written) is the upper limit of cumulative writes a vendor guarantees.
An SSD’s NAND flash has a finite number of program/erase cycles, and TBW expresses that lifespan as a user-facing byte count; together with the warranty period it draws the line of “we guarantee up to this many bytes.”
640TB a year is like rewriting a full 1TB drive 640 times; a 600TBW drive would burn through its warranted write endurance in under a year.
Going past TBW doesn’t mean instant failure, but you fall out of warranty, and lifespan gets hard to predict from there on.
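The arithmetic behind those figures is easy to redo yourself. Here is a minimal sketch that assumes decimal terabytes and a constant write rate over the whole period; the 600TBW rating is only the example figure used above, not a property of any specific drive.

```python
# Back-of-the-envelope check of the figures quoted above.
# Assumptions: decimal units (1 TB = 1e12 bytes) and a constant write rate.
TB_WRITTEN = 37.0      # TB written, as reported in issue #28224
DAYS_OBSERVED = 21.0   # days of uptime, as reported in issue #28224
RATED_TBW = 600.0      # example endurance rating, not a specific product


def yearly_writes_tb(tb_written: float, days: float) -> float:
    """Linearly extrapolate an observed write volume to 365 days."""
    return tb_written / days * 365.0


def days_to_reach_tbw(tb_written: float, days: float, rated_tbw: float) -> float:
    """Days until the rated TBW is reached at the observed average rate."""
    return rated_tbw / (tb_written / days)


def average_rate_mb_per_s(tb_written: float, days: float) -> float:
    """Average sustained write rate in MB/s implied by the observation."""
    return tb_written * 1e12 / (days * 86_400) / 1e6


per_year = yearly_writes_tb(TB_WRITTEN, DAYS_OBSERVED)
days_left = days_to_reach_tbw(TB_WRITTEN, DAYS_OBSERVED, RATED_TBW)
avg_rate = average_rate_mb_per_s(TB_WRITTEN, DAYS_OBSERVED)

print(f"extrapolated writes per year: {per_year:.0f} TB")
print(f"days to reach {RATED_TBW:.0f} TBW: {days_left:.0f}")
print(f"implied average write rate: {avg_rate:.1f} MB/s")
```

Swap in your own drive's rated TBW and the write volume from its SMART data to see where you stand.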

My earlier Codex piece, “Hit ‘Selected model is at capacity’? Keep going first,” was about stalling on the server-side processing quota.
This time it’s persistent logs on the local machine. Even when the on-screen responses look normal, writes to SQLite keep going underneath.

logs_2.sqlite and the WAL keep growing

The issue names these three files.

~/.codex/logs_2.sqlite
~/.codex/logs_2.sqlite-wal
~/.codex/logs_2.sqlite-shm

SQLite’s WAL stands for Write-Ahead Logging: it appends each update to a separate file first, then reflects it into the main database.
It keeps the main database consistent and lets other processes keep reading even during writes.
The three files each play a different role.

| File | Role |
| --- | --- |
| logs_2.sqlite | Main database. Holds committed pages |
| logs_2.sqlite-wal | WAL file. Buffers committed changes by appending before they reach the main DB |
| logs_2.sqlite-shm | Shared-memory index. Lets multiple processes track which WAL frame is newest |

Writes are appended to the WAL first, and under certain conditions a checkpoint runs and copies the WAL contents back into the main DB.
There are several checkpoint triggers, but by default SQLite attempts an automatic checkpoint once the WAL has accumulated about 1,000 pages (roughly 4MiB at the default 4KiB page size).
So when you write at high frequency like a log, the cycle “append to WAL → checkpoint around 4MiB → write back to the main DB → rewind and reuse the WAL” keeps spinning.
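You can watch this cycle on a throwaway database with Python's built-in sqlite3 module. This is a minimal sketch, not Codex's code: the file name demo.sqlite and the table demo_logs are made up for the demo, and the page size and auto-checkpoint threshold it prints depend on how your SQLite library was built.

```python
import os
import sqlite3
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Throwaway database for the demo; this is NOT ~/.codex/logs_2.sqlite.
    db_path = os.path.join(tmp, "demo.sqlite")
    conn = sqlite3.connect(db_path)

    # Switch to WAL mode and read the settings that drive auto-checkpoints.
    journal_mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
    page_size = conn.execute("PRAGMA page_size").fetchone()[0]
    autocheckpoint_pages = conn.execute("PRAGMA wal_autocheckpoint").fetchone()[0]
    threshold_bytes = page_size * autocheckpoint_pages

    # One committed write is enough to create the -wal and -shm files.
    conn.execute("CREATE TABLE demo_logs (id INTEGER PRIMARY KEY, body TEXT)")
    conn.execute("INSERT INTO demo_logs (body) VALUES (?)", ("x" * 200,))
    conn.commit()
    wal_exists = os.path.exists(db_path + "-wal")
    shm_exists = os.path.exists(db_path + "-shm")
    wal_size_before = os.path.getsize(db_path + "-wal")

    # Force a checkpoint: WAL frames are copied into the main DB file,
    # and TRUNCATE mode then shrinks the WAL file back to zero bytes.
    busy, _, _ = conn.execute("PRAGMA wal_checkpoint(TRUNCATE)").fetchone()
    wal_size_after = os.path.getsize(db_path + "-wal")
    conn.close()

print(f"journal mode: {journal_mode}")
print(f"auto-checkpoint threshold: {autocheckpoint_pages} pages "
      f"x {page_size} bytes = {threshold_bytes} bytes")
print(f"WAL size before/after checkpoint: {wal_size_before} / {wal_size_after}")
```

The explicit checkpoint is only there to make the write-back step visible. In normal operation SQLite triggers it automatically once the page threshold is crossed.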

What inflates the physical writes here is write amplification.
On the SQLite side, even a single-row INSERT rewrites the table’s pages and the index pages, writes those as WAL frames, and writes them into the main DB again at checkpoint time. One logically tiny row turns into several physical writes.
Amplification also happens on the SSD side. An SSD writes in pages (a few to a dozen-plus KiB), not in bytes, and can only erase in much larger blocks. Even a small update makes the controller rewrite the original data into another block behind the scenes, for block reclamation (garbage collection) and wear leveling. More data ends up written to NAND flash than the app wrote, and that ratio is called the WAF (Write Amplification Factor).
Because the amplification at the app layer (SQLite’s WAL and checkpoint) and the device layer (the SSD controller) stack up, the DB size you see with du and the amount the SSD actually wrote diverge.

In #28224, the retained rows numbered about 500,000 (506,149 in one snapshot, 681,774 in another), yet the AUTOINCREMENT id was over 5.5 billion (5,543,677,486).
The gap between retained rows and ids assigned so far is roughly 10,000x.
Even if the leftover log volume looks like about 1GB, the rows inserted and then deleted in the past are orders of magnitude more.
So “it’s only 1GB right now, so it’s fine” doesn’t hold.
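To put a number on that gap, divide the highest assigned id by the retained row count. This sketch uses only the figures from #28224 and assumes that ids started near 1 and were assigned without large gaps, so that max(id) approximates the cumulative number of inserts.

```python
# Figures reported in issue #28224.
MAX_ID = 5_543_677_486
RETAINED_SNAPSHOTS = (506_149, 681_774)


def churn_ratio(max_id: int, retained_rows: int) -> float:
    """How many rows were ever inserted per row still retained.

    Assumes max(id) is close to the cumulative insert count.
    """
    return max_id / retained_rows


ratios = [churn_ratio(MAX_ID, rows) for rows in RETAINED_SNAPSHOTS]

for rows, ratio in zip(RETAINED_SNAPSHOTS, ratios):
    print(f"{rows:,} retained rows -> about {ratio:,.0f} inserts per retained row")
```

Both snapshots land in the same order of magnitude, which is where the "roughly 10,000x" comes from.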

An earlier issue, #17320, reports writing about 5MiB/s to ~/.codex/logs_2.sqlite-wal during streaming, peaking at around 16MiB/s in observations.
It also points out that even though the process environment had RUST_LOG=warn set, TRACE-level rows were still landing in SQLite.

TRACE logs made up most of it

In the #28224 breakdown, TRACE accounted for 70.7% of the estimated size of the retained log bodies.
At the top were codex_api::endpoint::responses_websocket, codex_otel.log_only, codex_otel.trace_safe, log, and codex_client::transport.
Low-level WebSocket and SSE events, OpenTelemetry mirror events, and logs from dependency libraries were flowing into SQLite.

Even if you cap how many logs are retained, the writes themselves don’t drop.
Insert a new row. Delete an old row. Append to the WAL. Checkpoint when the conditions line up.
This insert-and-delete loop wears the SSD down more than the apparent DB size suggests.

Codex’s log table has a fixed retention cap, and once it’s exceeded the oldest rows are dropped.
So even with a steady row count, the assigned id, which equals the cumulative number of rows inserted so far, keeps climbing underneath.
In a 15-second sample in #28224, the retained rows held at 681,774 while max(id) rose by about 36,211. That works out to inserting roughly 2,400 rows per second and deleting the same number.
Even with DB size flat, the writes to the SSD underneath never stop.
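The pattern is easy to reproduce in miniature. The sketch below is not Codex's schema or pruning logic: the table demo_logs, the cap of 1,000 rows, and the workload of 25,000 inserts are all hypothetical values picked for the demo. It inserts rows, prunes the oldest on every insert, and then redoes the rows-per-second arithmetic from the #28224 sample.

```python
import sqlite3

RETENTION_CAP = 1_000    # hypothetical cap for the demo, not Codex's setting
ROWS_TO_INSERT = 25_000  # hypothetical workload size

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE demo_logs ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, body TEXT NOT NULL)"
)

for n in range(ROWS_TO_INSERT):
    conn.execute("INSERT INTO demo_logs (body) VALUES (?)", (f"event {n}",))
    # Keep only the newest RETENTION_CAP rows; everything older is deleted.
    conn.execute(
        "DELETE FROM demo_logs "
        "WHERE id <= (SELECT max(id) FROM demo_logs) - ?",
        (RETENTION_CAP,),
    )
conn.commit()

retained_rows, max_id = conn.execute(
    "SELECT count(*), max(id) FROM demo_logs"
).fetchone()
conn.close()

# Rate implied by the 15-second sample in issue #28224.
ID_DELTA = 36_211
SAMPLE_SECONDS = 15
rows_per_second = ID_DELTA / SAMPLE_SECONDS

print(f"retained rows: {retained_rows}, max(id): {max_id}")
print(f"sampled insert rate: about {rows_per_second:.0f} rows/s")
```

The row count stays pinned at the cap while max(id) tracks the total number of inserts, which is the same signature the issue describes at a much larger scale.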

Here’s the diagram. One row turns into many stages of physical writes.

flowchart TD
    A[WebSocket event fires] --> B[Generate TRACE log row]
    B --> C[INSERT into logs table]
    C --> D[Append frame to WAL]
    C --> E[Update index]
    E --> D
    F[Retention limit exceeded] --> G[DELETE old rows]
    G --> D
    D --> H{WAL near 4MiB?}
    H -->|Yes| I[Checkpoint writes back to main DB]
    H -->|No| J[Keep appending]
    I --> K[SSD controller writes physically]
    J --> K
    K --> L[NAND write volume amplified further]

Logically it’s one small row, but INSERT, index update, WAL append, DELETE, checkpoint, and SSD-internal relocation pile physical writes several layers deep.
That’s why DB size and actual write volume diverge.

The same signature shows up on Windows too.
openai/codex #29463 reports that on the Codex Desktop 26.616 series, TRACE WebSocket logs keep landing in logs_2.sqlite and don’t stop even with [analytics].enabled = false and [otel].exporter = "none".
In a 28-second sample, max(id) advanced by 573, and among the most recent 1,000 ids, TRACE log occupied 839 rows.

Two fix PRs landed on June 22

#28224 was closed on June 22, 2026.
The reporter closed it after two PRs were merged the same day, noting that on their Codex the logging looked about 85% avoidable.

The first, #29432, stops emitting three kinds of local log for every successful Responses WebSocket event.
The PR explains that each WebSocket event created a full TRACE payload, an OpenTelemetry log event, and an OpenTelemetry trace event, and that on busy threads a 1,000-row partition filled in seconds, driving rapid repeated inserts and deletes in SQLite.
What it stops is only the “per-successful-WebSocket-event payload logs and mirror events”; it keeps the WebSocket event counters, latency metrics, response timing, parsing, and error handling. The aggregate values useful for diagnosis aren’t thrown away; only the raw per-event logs are dropped.

The second, #29457, removes noisy targets from what goes into the persistent log.
It excludes dependency-library logs bridged through target=log, plus codex_otel.log_only and codex_otel.trace_safe, from the SQLite sink.
Here, target is a label, similar to a module name, that Rust’s tracing crate attaches to each log row. target=log covers logs that flow in through the log crate from dependencies (tokio-tungstenite, hyper_util, and so on), while codex_otel.log_only and codex_otel.trace_safe are mirror events duplicated for OpenTelemetry.
The PR drops only these three targets from the SQLite sink and keeps TRACE persistence for other targets. It also notes that the app-server and the TUI share the same filter.
One caveat: what’s removed is only the local SQLite persistence; the remote OpenTelemetry export and metrics still run. It isn’t that telemetry was turned off entirely; more precisely, the duplicated logs written verbatim to local storage were stopped.
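As a rough illustration of what "dropping targets from one sink" means, here is a sketch in Python. It is not the Rust code from #29457: the LogEvent type, the should_persist function, and the exact-match rule are all assumptions made for the example. Only the three target names come from the PR description.

```python
from dataclasses import dataclass

# The three targets the PR is described as excluding from the SQLite sink.
EXCLUDED_TARGETS = frozenset(
    {"log", "codex_otel.log_only", "codex_otel.trace_safe"}
)


@dataclass(frozen=True)
class LogEvent:
    """Hypothetical stand-in for a tracing event."""
    target: str
    level: str
    message: str


def should_persist(event: LogEvent) -> bool:
    """Decide whether an event goes to the local persistent sink.

    Assumption: exact match on the target name. Level is deliberately
    ignored, so TRACE rows from other targets are still persisted.
    """
    return event.target not in EXCLUDED_TARGETS


events = [
    LogEvent("log", "TRACE", "dependency noise"),
    LogEvent("codex_otel.log_only", "INFO", "mirror event"),
    LogEvent("codex_client::transport", "TRACE", "transport detail"),
]
persisted = [e.target for e in events if should_persist(e)]
print(persisted)
```

A filter like this sits in front of one sink only, which matches the caveat above: other exporters can still receive the same events.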

Both PRs are in CLI 0.142.0.
GitHub Releases 0.142.0 (tag rust-v0.142.0, published June 22 at 22:19 UTC) spells out both numbers under Chores: “Reduced persistent-log churn by removing per-event WebSocket payload logging and filtering duplicated telemetry records. (#29432, #29457).”
Meanwhile, OpenAI’s official Codex changelog (developers.openai.com) only says “additional performance improvements and bug fixes” for Codex app 26.616 as of June 18, without these two PR numbers. The CLI’s GitHub Releases and the desktop app’s changelog reflect changes at different times.

So locally, instead of treating “I updated to the latest, done” as the end, it’s better to watch whether the mtime or size of logs_2.sqlite-wal still grows after updating.
In particular, Codex Desktop and the VS Code extension update their internal app-server build on a separate track from the CLI. The app version number (the 26.616 series) and the CLI version number (0.142.0) use different schemes, so updating one doesn’t guarantee the fix is in the app-server.

Where to look locally

On macOS or Linux, start by checking the size of ~/.codex.

du -sh ~/.codex
ls -lh ~/.codex/logs_2.sqlite*

To see whether it grows while running, wait a few dozen seconds and run the same commands again.
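If you would rather script the comparison, here is a minimal sketch that takes two samples of a file's size and mtime and describes the difference. The helper names sample and describe_change are made up for this example. Because SQLite rewinds and reuses the WAL, the size can stay flat while the mtime moves, so the sketch looks at both.

```python
import os
from typing import NamedTuple, Optional


class FileSample(NamedTuple):
    size: int
    mtime: float


def sample(path: str) -> Optional[FileSample]:
    """Return size and mtime of path, or None if the file does not exist."""
    try:
        st = os.stat(path)
    except FileNotFoundError:
        return None
    return FileSample(st.st_size, st.st_mtime)


def describe_change(before: Optional[FileSample],
                    after: Optional[FileSample]) -> str:
    """Summarize what happened to the file between two samples."""
    if before is None or after is None:
        return "file missing in at least one sample"
    delta = after.size - before.size
    if delta > 0:
        return f"grew by {delta} bytes"
    if delta < 0:
        return f"shrank by {-delta} bytes"
    if after.mtime != before.mtime:
        return "same size, but mtime moved"
    return "no change"


# Usage (uncomment to run against the real WAL, waiting 30 seconds):
# import time
# wal = os.path.expanduser("~/.codex/logs_2.sqlite-wal")
# first = sample(wal); time.sleep(30); print(describe_change(first, sample(wal)))
```

Anything other than "no change" while Codex looks idle is the cue to move on to the process check.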
If the size or mtime of logs_2.sqlite-wal keeps moving, check for processes still around even with Codex closed.

pgrep -af codex
lsof -nP ~/.codex/logs_2.sqlite-wal

#22444 shows an old Codex TUI session left in tmux holding onto a deleted, huge WAL: du dropped but df -h didn’t return free space.
In this situation, deleting the file doesn’t return capacity until the process closes its file descriptor.
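The mechanism is plain POSIX behavior and can be shown without Codex at all. This sketch assumes Linux or macOS (on Windows, unlinking an open file usually fails instead); the temporary file is a stand-in for the WAL, not the real one.

```python
import os
import tempfile

# Create a 1 MiB stand-in file and keep its descriptor open.
fd, path = tempfile.mkstemp(prefix="demo-wal-")
os.write(fd, b"x" * 1024 * 1024)

# Delete the name. The directory entry goes away immediately...
os.unlink(path)
visible_by_name = os.path.exists(path)

# ...but the open descriptor still refers to the full file contents,
# so the filesystem cannot release the blocks yet.
held = os.fstat(fd)
held_size = held.st_size
link_count = held.st_nlink

print(f"visible by name: {visible_by_name}")
print(f"still held via descriptor: {held_size} bytes, link count {link_count}")

# Only closing the descriptor (or ending the process) frees the space.
os.close(fd)
```

That is why du, which walks directory entries, shows the drop while df, which reports allocated blocks, doesn't.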

For Codex Desktop on Windows, check C:\Users\<user>\.codex\logs_2.sqlite and the WAL files.
#29177 reports that moving the logs, state, memories, and goals SQLite files to a RAM disk on Windows made things feel lighter.
That’s a symptomatic workaround, and putting state you want persisted onto volatile storage invites a different accident.
If you do it, limit it to the log files.

Local I/O grows in long sessions too

This isn’t a story about an AI agent breaking your code, or escaping outward via prompt injection.
Just by using it normally, the auxiliary logs on your machine steadily wear the SSD down.

On the Claude Code side, the problem of tool calls breaking with “court” attached also showed more hangs in usage that leans on long sessions.
This time it’s the Codex side, where long uptime and multiple sessions load the local SQLite.

If you leave agents running for days in tmux or a desktop app, mind not just response quality but the local side effects too.
On top of CPU, memory, GPU, network, and process count, keep an eye on the write volume of working directories like ~/.codex and ~/.claude.

The to-do list for right now is short.

  • Update Codex CLI and Codex Desktop to the latest
  • Close TUI or Desktop threads left idle for a long time
  • Check whether ~/.codex/logs_2.sqlite-wal has grown into the GB range
  • If deleting doesn’t return free space, find the Codex process holding the deleted file
  • If it still grows, report with the issue number, Codex version, OS, and the breakdown of logs_2.sqlite

Rather than hyping it as “killing the SSD,” it’s easier to handle as an accident where a TRACE-log persistence setting was left in a shipping local app.
If you’re going after the logs, don’t just cap retention; deal with write frequency, the WAL, and old processes holding files at the same time.

References