
Claude Code HTML output: throwaway editing UIs, side-by-side comparisons, and when to skip Markdown

Ikesan

When Claude Code writes long Markdown, comparisons, diffs, and design proposals stack vertically.
Reading three options in sequence while keeping their differences in your head is harder than it needs to be.

Thariq’s companion page The unreasonable effectiveness of HTML gives a surprisingly practical answer.
Have Claude Code produce self-contained .html files with CSS, SVG, and a bit of JavaScript all in one file, then open them in a browser.
This isn’t about making Claude Code build websites. It’s about moving review artifacts from Markdown to HTML.

The page includes 20 HTML examples.
I looked at all of them and dug into the parts that seemed useful.

Comparisons are where Markdown hurts the most

Ask Claude Code for three alternatives and you get option A, option B, option C flowing down the page.
You finish reading A, scroll to B, hold the differences in your head, then scroll to C.
The longer the text, the worse this gets.

HTML puts them side by side.
Color-code each card, mark the preferred candidate, pull risks into a separate box.
Same amount of text, but now the items you’re comparing are all in view at once.
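As a rough sketch of what such a file could look like, here is a small generator for a self-contained comparison page: three color-coded cards in a CSS grid, with the preferred candidate highlighted. The option names, styles, and the `comparison_page` helper are illustrative assumptions, not taken from Thariq's actual examples.

```python
# Minimal sketch: render options as side-by-side cards in one
# self-contained HTML file. Names and styles are illustrative.
from pathlib import Path

CARD = '<div class="card{cls}"><h2>{title}</h2><p>{body}</p></div>'

PAGE = """<!doctype html>
<html><head><meta charset="utf-8"><style>
  .row {{ display: grid; grid-template-columns: repeat({n}, 1fr); gap: 1rem; }}
  .card {{ border: 1px solid #ccc; border-radius: 8px; padding: 1rem; }}
  .card.preferred {{ border-color: #2a7; background: #f2fbf6; }}
</style></head>
<body><div class="row">{cards}</div></body></html>"""

def comparison_page(options, preferred=None):
    """options: list of (title, body) tuples; preferred: title to highlight."""
    cards = "".join(
        CARD.format(cls=" preferred" if title == preferred else "",
                    title=title, body=body)
        for title, body in options
    )
    return PAGE.format(n=len(options), cards=cards)

html = comparison_page(
    [("Option A", "Ship behind a flag."),
     ("Option B", "Rewrite the module."),
     ("Option C", "Defer until Q3.")],
    preferred="Option A",
)
Path("compare.html").write_text(html)  # open in any browser
```

No build step, no dependencies: the whole review surface is one string written to disk.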

In WUPHF’s LLM wiki, I wrote about keeping an agent’s canonical memory in Markdown and Git.
That was about Markdown’s strength as a long-term store.
HTML here is the opposite—it’s strong as a display surface for decisions you’re making right now.
Storage and display don’t have to share a format.

Throwaway editing UIs are the most interesting part

The examples that most caught my attention were under Custom Editing Interfaces.
A ticket priority board, a feature flag editor, a prompt tuner.

Sorting 30 tickets into Now / Next / Later / Cut by editing a Markdown list is asking for mistakes.
Build an HTML page where you drag items between columns and export to Markdown or JSON at the end, and the human gets a proper UI while the agent gets structured data back.
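A throwaway board like that could be sketched as a single generated file: each column accepts drops, every ticket starts in the first column, and an Export button turns the final layout into JSON for the agent. The column names, ticket format, and inline script below are my assumptions, not a scheme from the article.

```python
# Hedged sketch of a one-file triage board: drag tickets between
# columns, click Export to get JSON back. All names are illustrative.
from pathlib import Path

SCRIPT = """
function drag(ev){ ev.dataTransfer.setData('text', ev.target.id); }
function drop(ev){
  ev.preventDefault();
  ev.currentTarget.appendChild(
    document.getElementById(ev.dataTransfer.getData('text')));
}
function exportBoard(){
  const out = {};
  for (const col of document.querySelectorAll('.col')) {
    out[col.dataset.name] =
      [...col.querySelectorAll('.ticket')].map(t => t.textContent);
  }
  document.getElementById('out').textContent = JSON.stringify(out, null, 2);
}
"""

def board_page(columns, tickets):
    """One-file board: drag tickets between columns, Export emits JSON."""
    cols = []
    for name in columns:
        items = ""
        if name == columns[0]:  # every ticket starts in the first column
            items = "".join(
                f'<div class="ticket" id="t{i}" draggable="true" '
                f'ondragstart="drag(event)">{t}</div>'
                for i, t in enumerate(tickets))
        cols.append(
            f'<div class="col" data-name="{name}" style="flex:1" '
            f'ondragover="event.preventDefault()" ondrop="drop(event)">'
            f'<h3>{name}</h3>{items}</div>')
    return ("<!doctype html><html><head><meta charset='utf-8'>"
            f"<script>{SCRIPT}</script></head><body>"
            f"<div style='display:flex;gap:1rem'>{''.join(cols)}</div>"
            f"<button onclick='exportBoard()'>Export</button>"
            "<pre id='out'></pre></body></html>")

Path("board.html").write_text(
    board_page(["Now", "Next", "Later", "Cut"],
               ["Fix login bug", "Dark mode toggle", "Refactor auth"]))
```

The exported JSON is the part that matters: it's the structured state the agent reads back once the human is done dragging.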

Prompt tuning works the same way.
Highlight template variables, show sample outputs side by side.
Faster than going back and forth in a chat with “change this part” and “try again.”

This is where Claude Code’s output shifts from document to tool.
Instead of text to read, it hands you a state to interact with.
In the piece on CLI-to-AI as the human-software interface, I wrote about AI becoming the input layer. HTML output is the flip side—when AI returns results to humans, it can use the browser too, not just text.

These are throwaway by design.
No need for accessibility polish or responsive layouts.
The bar is “I can use this right now.”

Diff annotations for code review

Code review is another natural fit for HTML.
Markdown PR descriptions work fine for motivation, change lists, and test results, but they’re weak at annotating specific diff lines, highlighting critical sections, or color-coding by risk level.

Thariq’s examples include an annotated pull request, a PR writeup, and a module map.
Annotations next to diff lines.
Intent cards per file.
Module dependencies drawn as boxes and arrows.
All faster to scan in a browser than in Claude Code’s reply pane.

In multi-agent PR review with Claude Code, the problem wasn’t whether the model could review.
It was how to arrange multiple findings so you know where to look first.
HTML lets you put severity, target file, reproduction steps, and the relevant diff on the same screen.
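A sketch of that layout, assuming a simple finding format of my own invention (severity, file, note, diff hunk): render each finding as a card, high-severity first, with the diff escaped into a `<pre>` block.

```python
# Illustrative sketch: review findings on one page, sorted so
# high-severity cards come first. The field names are assumptions,
# not a schema from the article.
import html as html_mod

FINDING = """<div class="finding {severity}">
  <h3>[{severity}] {file}</h3>
  <p>{note}</p>
  <pre>{diff}</pre>
</div>"""

def review_page(findings):
    body = "".join(
        FINDING.format(severity=f["severity"], file=f["file"],
                       note=f["note"], diff=html_mod.escape(f["diff"]))
        for f in sorted(findings, key=lambda f: f["severity"] != "high"))
    return ("<!doctype html><html><head><meta charset='utf-8'><style>"
            ".finding{border-left:4px solid #999;padding:.5rem;margin:.5rem}"
            ".finding.high{border-color:#c33}"
            "</style></head>"
            f"<body>{body}</body></html>")
```

Severity, target file, and the relevant hunk end up on the same screen, which is the whole point.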

Interactive explanations don’t translate back to text

Animation easing described in prose doesn’t land.
Click flows listed as bullet points lack the feel of actually navigating them.
Thariq’s examples show an animation sandbox, clickable flows, arrow-key slides, and collapsible feature breakdowns.

All of this runs in a single HTML file.
No build step, no dependencies.
Attach it to a PR, drop it in Slack, open it locally—all lightweight.
Heavier than Markdown, but nowhere near spinning up a mini-app.

The risk of opening HTML carelessly

Don’t open HTML from untrusted sources without checking.
It can contain <script> tags, and even when opened as a local file, you should verify outbound links, external resources, form submissions, and clipboard access.
When having Claude Code generate these, specify no external requests, no dependencies, and manual-export-only for saving data.
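Those constraints are also checkable after the fact. A crude screen before opening a generated file might grep for the risky patterns named above; this is a heuristic, not a sanitizer, and the pattern list is my own guess at a reasonable starting set.

```python
# Crude pre-open check for a generated HTML file: flag external URLs,
# remote scripts, form submissions, and clipboard access. A heuristic
# screen, not a sanitizer -- when in doubt, read the file.
import re
import sys

PATTERNS = {
    "external URL": re.compile(r'https?://', re.I),
    "remote script": re.compile(r'<script[^>]+src=', re.I),
    "form submission": re.compile(r'<form[^>]*action=', re.I),
    "clipboard access": re.compile(r'navigator\.clipboard', re.I),
    "fetch/XHR": re.compile(r'\bfetch\(|XMLHttpRequest', re.I),
}

def audit_html(text):
    """Return a list of (issue, line_number) findings."""
    findings = []
    for n, line in enumerate(text.splitlines(), 1):
        for issue, pat in PATTERNS.items():
            if pat.search(line):
                findings.append((issue, n))
    return findings

if __name__ == "__main__" and len(sys.argv) > 1:
    for issue, n in audit_html(open(sys.argv[1], encoding="utf-8").read()):
        print(f"line {n}: {issue}")
```

An empty report doesn't prove the file is safe, but a non-empty one tells you exactly which lines to read first.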

Git adds friction too.
Long CSS, SVG, and JavaScript packed into one file create diff noise.
Generate these as throwaway review files and delete them when you’re done.
No need to commit them.

Only switch to HTML where Markdown actually hurts

After going through all of this, I noticed the pattern matches how I already use this blog’s experiment flow.

I find an interesting topic, write a deep-dive article first, then pick things to try and create experiment posts with ongoing updates.
The article serves as a way to visualize intermediate artifacts, record decisions along the way, and figure out what to do next.

Thariq’s HTML proposal is throwaway, while blog posts are published.
But during writing, they serve the same purpose.

The gap is that Markdown articles can’t do side-by-side comparisons or drag-to-reorder.
For structuring an article or keeping experiment logs in text, Markdown works fine.
But as I wrote in CLAUDE.md separation and the problem of files not being read, having information in a file doesn’t help if the presentation doesn’t match how you need to see it.
Laying out three options side by side or dragging 30 items into buckets is faster in HTML.

Switching to HTML only in the moments where Markdown feels painful during writing—that’s the workflow I want to try.

AI-only output needs neither Markdown nor HTML

Everything above assumes a human reads the output in the end.
Choosing HTML for comparisons, Markdown for records—both are decisions shaped by human eyes.

But agent workflows have intermediate outputs no human ever sees.
If Claude Code breaks a task into steps and step A’s result just feeds into step B’s prompt, that intermediate result doesn’t need HTML, or even Markdown.
JSON is enough. Plain text works if the structure is simple.
As long as the next step can parse it mechanically, format doesn’t matter.
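A minimal sketch of that kind of handoff, with field names that are purely illustrative: step A emits JSON, step B parses it and moves on, and no human-readable rendering ever happens.

```python
# AI-only handoff sketch: step A's result is plain JSON that step B
# parses mechanically. The field names are illustrative assumptions.
import json

step_a_output = json.dumps({
    "task": "summarize-changelog",
    "candidates": ["Terse summary", "Detailed summary", "Bulleted summary"],
    "picked": 0,
})

def step_b(raw):
    """Consume step A's JSON; no human-readable formatting required."""
    data = json.loads(raw)
    return data["candidates"][data["picked"]]

print(step_b(step_a_output))  # -> Terse summary
```

The moment a human wants to compare those three candidates, this format stops being enough, which is exactly the interruption case below.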

The tricky case is when a human interrupts midway.
An AI-only pipeline produces intermediate results, and someone says “let me take a look.”
A JSON array won’t support that decision. If three options sit in JSON, a human still needs HTML to lay them out side by side.
The reverse happens too: once a workflow stabilizes enough to run without human checks, there’s no reason to keep generating HTML.

Earlier I wrote “storage and display don’t have to share a format.”
More precisely, the right format also depends on whether the reader is human or AI.
Intermediate data consumed by AI, a display surface for humans making decisions right now, a record humans search later in Git.
The same information fits different formats at different phases, and as more phases shift to AI, fewer of them need a human-readable view.
HTML’s sweet spot is the phase where humans haven’t let go of checking yet.

References