CLI-Anything turns almost any software into something AI agents can operate

AI agents now routinely write code and manipulate files. Operating GUI software, such as editing images in GIMP, rendering 3D scenes in Blender, or generating PDFs in LibreOffice, has remained a pain point.

CLI-Anything, released by HKUDS at the University of Hong Kong, is a very sensible answer to that problem. Feed it source code and it automatically generates a CLI harness that AI agents can use. It reached more than 19,000 GitHub stars about 10 days after release.

What was the problem?

AI agents have had several ways to drive GUI software, but each has drawbacks:

Screenshot plus click automation: slow, unstable, and breaks whenever the UI changes
Limited APIs: cover only part of the application
Reimplementation: rewrites the software’s functionality in separate code and loses roughly 90% of what the product can do

CLI-Anything starts from a simpler observation: a CLI is a very good fit for LLMs. It is text-based and structured, self-documenting via --help, easy to parse when it emits JSON, and composable through pipes.
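
To make the "easy to parse" point concrete, here is a minimal sketch of how an agent-side script might call a command and read its JSON output. The run_json helper and the stand-in command are assumptions for illustration, not part of CLI-Anything; the stand-in is a Python one-liner so the example runs without any generated harness installed.

```python
import json
import subprocess
import sys


def run_json(argv):
    """Run a command and parse its stdout as JSON (hypothetical helper)."""
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    return json.loads(result.stdout)


# Stand-in for a real CLI: a Python one-liner that prints a JSON object.
fake_cli = [
    sys.executable,
    "-c",
    "import json; print(json.dumps({'layers': 2, 'width': 1920}))",
]

info = run_json(fake_cli)
print(info["width"])
```

No screen scraping or pixel inference is involved: the caller gets a dictionary it can branch on directly.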

The 7-stage generation pipeline

The core of CLI-Anything is a seven-stage pipeline that generates a CLI from source code.

graph TD
    A[Phase 1: Analyze<br/>Source code inspection] --> B[Phase 2: Design<br/>Command structure]
    B --> C[Phase 3: Implement<br/>Click CLI]
    C --> D[Phase 4: Plan Tests<br/>Test design]
    D --> E[Phase 5: Write Tests<br/>Test implementation]
    E --> F[Phase 6: Document<br/>Docs generation]
    F --> G[Phase 7: Publish<br/>Distribute via pip]

Analyze - scan the source code and map GUI actions to APIs
Design - define command groups, the state model, and output formats
Implement - build the CLI in Python with Click, including REPL mode, JSON output, and Undo / Redo
Plan Tests - write unit and end-to-end test plans
Write Tests - implement the test suite
Document - update the docs using the test results
Publish - generate setup.py and install it with pip install -e .
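
The shape that the Design and Implement phases aim for (command groups, a small state model, and a JSON output switch) can be sketched as follows. This is an illustration only: it uses the standard-library argparse as a stand-in for Click so that it runs without third-party packages, and the program name, options, and state fields are assumptions rather than the generated code.

```python
import argparse
import json


def build_parser():
    parser = argparse.ArgumentParser(prog="demo-harness")
    parser.add_argument("--json", action="store_true", help="emit JSON instead of text")
    groups = parser.add_subparsers(dest="group", required=True)

    # "project" command group with a single "new" subcommand.
    project = groups.add_parser("project", help="project-level commands")
    project_cmds = project.add_subparsers(dest="command", required=True)
    new = project_cmds.add_parser("new", help="create a project description")
    new.add_argument("--width", type=int, default=1920)
    new.add_argument("--height", type=int, default=1080)
    return parser


def main(argv):
    args = build_parser().parse_args(argv)
    if (args.group, args.command) == ("project", "new"):
        # Hypothetical state model: a plain dict that could be saved as JSON.
        state = {"width": args.width, "height": args.height, "layers": []}
        if args.json:
            return json.dumps(state)
        return f"{args.width}x{args.height} project, 0 layers"
    raise SystemExit(f"unhandled command: {args.group} {args.command}")


print(main(["--json", "project", "new", "--width", "800", "--height", "600"]))
print(main(["project", "new"]))
```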

There is also a Phase 6.5: SKILL.md generation. It extracts metadata from the generated CLI’s Click decorators and setup.py so the agent can auto-discover the skill file.
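
The idea of deriving a skill file from decorator metadata can be sketched roughly like this. Everything here is hypothetical: the command decorator stands in for Click's decorators, and the file layout (front matter with a name and description, then one bullet per command) is an assumed format, not necessarily what CLI-Anything writes.

```python
COMMANDS = {}


def command(name):
    """Hypothetical decorator: register a function under a CLI command name."""
    def register(func):
        COMMANDS[name] = func
        return func
    return register


@command("project new")
def project_new():
    """Create a new project file."""


@command("layer add")
def layer_add():
    """Add a layer to the current project."""


def render_skill(tool_name, description):
    """Render an assumed skill-file layout from the registered commands."""
    lines = ["---", f"name: {tool_name}", f"description: {description}", "---", "", "Commands:"]
    for name, func in sorted(COMMANDS.items()):
        # The docstring doubles as the one-line command summary.
        lines.append(f"- {name}: {func.__doc__.strip()}")
    return "\n".join(lines)


print(render_skill("demo-harness", "Demo CLI harness"))
```

Because the summary comes from the same place as the command definition, the skill file cannot drift far from what the CLI actually exposes.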

Supported software and test results

CLI harnesses were generated and tested for the 15 applications listed below.

Software	Domain	Backend	Tests
GIMP	image editing	Pillow + GEGL / Script-Fu	107
Blender	3D modeling	bpy (Python scripting)	208
Inkscape	vector graphics	direct SVG/XML manipulation	202
Audacity	audio editing	Python wave + sox	161
LibreOffice	office suite	ODF generation + headless LibreOffice	158
OBS Studio	streaming / recording	JSON scene + obs-websocket	153
Kdenlive	video editing	MLT XML + melt	155
Shotcut	video editing	MLT XML + melt	154
Draw.io	diagramming	mxGraph XML + draw.io CLI	138
Mubu	knowledge management	local data + sync logs	96
ComfyUI	AI image generation	ComfyUI REST API	70
AnyGen	AI content generation	AnyGen REST API	50
AdGuardHome	networking	AdGuardHome API	36
Zoom	video conferencing	Zoom REST API (OAuth2)	22
Mermaid	diagramming	mermaid.ink renderer	10

Total: 1,720 tests, all passing. That breaks down into 1,247 unit tests and 473 E2E tests.
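
The totals can be checked against the per-application counts in the table with a few lines of arithmetic. The numbers below are copied from the table above; nothing else is assumed.

```python
# Test counts per application, as listed in the table.
tests_per_app = {
    "GIMP": 107, "Blender": 208, "Inkscape": 202, "Audacity": 161,
    "LibreOffice": 158, "OBS Studio": 153, "Kdenlive": 155, "Shotcut": 154,
    "Draw.io": 138, "Mubu": 96, "ComfyUI": 70, "AnyGen": 50,
    "AdGuardHome": 36, "Zoom": 22, "Mermaid": 10,
}

total = sum(tests_per_app.values())
unit_tests, e2e_tests = 1247, 473

print(total)
print(unit_tests + e2e_tests == total)
```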

The important part is that these are real backends, not mocks. LibreOffice tests actually generated PDFs in headless mode and verified the %PDF- magic bytes. Blender tests actually rendered PNG output. This is real software validation, not a fake reimplementation.
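
A magic-byte check of the kind described here takes only a few lines. The sketch below is an assumption about how such an assertion could be written, not the project's actual test code; it writes a small stand-in file so that it runs without LibreOffice or Blender installed.

```python
import os
import tempfile

PDF_MAGIC = b"%PDF-"                 # every PDF file starts with these bytes
PNG_MAGIC = b"\x89PNG\r\n\x1a\n"     # the fixed 8-byte PNG signature


def starts_with(path, magic):
    """Return True if the file at `path` begins with the given bytes."""
    with open(path, "rb") as handle:
        return handle.read(len(magic)) == magic


with tempfile.TemporaryDirectory() as tmp:
    # Stand-in for a file produced by a real backend.
    sample = os.path.join(tmp, "sample.pdf")
    with open(sample, "wb") as handle:
        handle.write(b"%PDF-1.7\n")
    is_pdf = starts_with(sample, PDF_MAGIC)
    is_png = starts_with(sample, PNG_MAGIC)

print(is_pdf, is_png)
```

A check like this only proves that the backend produced a file of the right type, which is exactly what a mock cannot demonstrate.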

How to use it

As a Claude Code plugin, the workflow is three steps:

# Add the marketplace
/plugin marketplace add HKUDS/CLI-Anything

# Install the plugin
/plugin install cli-anything

# Generate a CLI (GIMP example)
/cli-anything:cli-anything ./gimp

The generated CLI is used like this:

# Install
cd gimp/agent-harness && pip install -e .

# Show help
cli-anything-gimp --help

# Create a project
cli-anything-gimp project new --width 1920 --height 1080 -o poster.json

# Add a layer
cli-anything-gimp --project poster.json layer add -n "Background" --type solid --color "#1a1a2e"

# JSON output for agents
cli-anything-gimp --json document info --project poster.json

It also works with OpenCode, OpenClaw, Codex, Qodercli, and Goose.

Design choices that stand out

Several design decisions are interesting:

Dual mode operation: every CLI works both as subcommands for scripts and pipelines, and as a REPL for interactive use. Running it without arguments drops you into the REPL.
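
One way to get both modes from a single entry point is to share one command handler and choose the front end based on whether arguments were given. This is a minimal sketch under that assumption; the execute, repl, and main names are illustrative, and the handler just echoes what it would run.

```python
import shlex


def execute(argv):
    """Stand-in command handler shared by both modes."""
    return "ran: " + " ".join(argv)


def repl(read_line, write):
    """Read commands until 'exit', 'quit', or end of input."""
    while True:
        try:
            line = read_line()
        except EOFError:
            break
        if line.strip() in {"exit", "quit"}:
            break
        if line.strip():
            # Split the line the way a shell would, then reuse the same handler.
            write(execute(shlex.split(line)))


def main(argv, read_line=input, write=print):
    if argv:
        write(execute(argv))      # one-shot subcommand mode
    else:
        repl(read_line, write)    # no arguments: interactive mode


main(["layer", "list"])
```

Passing read_line and write as parameters keeps the REPL loop testable without a terminal.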

A unified --json flag: every command can emit structured JSON so an agent can consume it. Humans still get table output.
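
The split between machine and human output can be handled in one rendering function. The sketch below is illustrative and assumes each command returns a non-empty list of flat dictionaries; the render function and the sample rows are not taken from the project.

```python
import json


def render(rows, as_json=False):
    """Return JSON for agents, or an aligned text table for humans.

    Assumes `rows` is a non-empty list of dicts that share the same keys.
    """
    if as_json:
        return json.dumps(rows)
    headers = list(rows[0])
    widths = [max(len(h), *(len(str(r[h])) for r in rows)) for h in headers]

    def fmt(cells):
        return "  ".join(str(c).ljust(w) for c, w in zip(cells, widths)).rstrip()

    return "\n".join([fmt(headers)] + [fmt([r[h] for h in headers]) for r in rows])


layers = [{"name": "Background", "type": "solid"}, {"name": "Title", "type": "text"}]
print(render(layers))
print(render(layers, as_json=True))
```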

Delegation to the real backend: the CLI produces project files such as ODF, MLT XML, and SVG, while the real application handles rendering. LibreOffice does the PDF generation, Blender does the 3D rendering, and Audacity does the audio processing. It is a CLI bridge to the real product, not a toy rewrite.
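
Delegation usually comes down to building a command line for the real application and running it. The sketch below shows that pattern for a headless LibreOffice conversion; the convert_command and render_pdf helpers are hypothetical, and the generated harness may invoke LibreOffice differently. Only the argument list is built and printed here, so nothing needs to be installed to run it.

```python
import subprocess


def convert_command(source, out_dir, binary="soffice"):
    """Build the argv for a headless LibreOffice PDF conversion."""
    return [binary, "--headless", "--convert-to", "pdf", "--outdir", out_dir, source]


def render_pdf(source, out_dir):
    """Hand the document to the real application (requires LibreOffice on PATH)."""
    subprocess.run(convert_command(source, out_dir), check=True)


print(convert_command("report.odt", "out"))
```

The harness itself never renders anything; it prepares the input file and lets the application do the work.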

ReplSkin: a consistent REPL interface with a brand banner, command history, progress display, and Undo / Redo.
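
Undo / Redo is commonly built on two stacks of state snapshots. The following sketch shows that general technique; it is not ReplSkin's implementation, and the History class and tuple-based state are assumptions made for the example.

```python
class History:
    """Two-stack undo/redo over immutable snapshots of project state."""

    def __init__(self, state):
        self.state = state
        self._undo = []
        self._redo = []

    def apply(self, new_state):
        self._undo.append(self.state)
        self._redo.clear()          # a new edit invalidates the redo branch
        self.state = new_state

    def undo(self):
        if self._undo:
            self._redo.append(self.state)
            self.state = self._undo.pop()

    def redo(self):
        if self._redo:
            self._undo.append(self.state)
            self.state = self._redo.pop()


history = History(("Background",))
history.apply(("Background", "Title"))
history.undo()
print(history.state)
history.redo()
print(history.state)
```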

`refine` expands coverage

After the first generation pass, you can run /cli-anything:refine to analyze gaps and add missing commands.

# Broad gap analysis
/cli-anything:refine ./gimp

# Focus on specific functionality
/cli-anything:refine ./gimp "batch processing and filters"

Each pass is incremental and non-destructive, so you can run it as many times as needed.
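
"Incremental and non-destructive" can be pictured as a set difference followed by an append. The sketch below is a deliberately simplified model: the real refine step is driven by an LLM reading the codebase, whereas here the backend operations and existing commands are just hypothetical lists of names.

```python
def find_gaps(backend_operations, cli_commands):
    """Return backend operations that no CLI command covers yet."""
    return sorted(set(backend_operations) - set(cli_commands))


def refine(cli_commands, backend_operations):
    """Return a new command list with gaps appended; existing entries are kept as-is."""
    return list(cli_commands) + find_gaps(backend_operations, cli_commands)


# Hypothetical names, for illustration only.
existing = ["layer add", "project new"]
backend = ["project new", "layer add", "filter blur", "batch export"]

first_pass = refine(existing, backend)
second_pass = refine(first_pass, backend)
print(first_pass)
print(second_pass == first_pass)
```

Running the pass again changes nothing once the gaps are closed, which is what makes repeated runs safe.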

Compared with GUI automation

Here is how it compares with screenshot-and-click agents:

Dimension	GUI automation	CLI-Anything
Speed	slow because it must capture and parse screenshots	immediate command execution
Stability	breaks when the UI changes	CLI APIs are stable
Coverage	only what is visible on screen	the entire backend API
Structured output	inferred from pixels	JSON output
Requirement	needs a display	works headlessly

CLI-Anything still has limits. It needs readable source code. If you only have a compiled binary, you must decompile it first, and quality drops sharply. The quality of the generated CLI also depends on the structure of the codebase and the quality of its APIs.

Can it work on your own app?

When people hear “source code required,” they may assume this is only for open-source projects, but that is not true. CLI-Anything uses LLM-based static analysis: in Phase 1 it identifies backend engines, data-model formats, and the mapping between GUI actions and APIs. In other words, the code just has to be readable. It does not need to be open source.

If you have the source locally, it can work on your own app or on a private repository.

# Local path
/cli-anything:cli-anything ./my-private-app

# GitHub repo URL is fine too, if you have access
/cli-anything:cli-anything https://github.com/your-org/your-repo

You can point it at your own GitHub repo directly. For private repos, git or gh authentication just needs to work. Cloning locally also works.

The resulting CLI quality depends heavily on the codebase:

apps with a clear API split - for example, well-organized Python modules, REST APIs, or scripting interfaces - produce better CLIs
apps with structured data models - JSON, XML, SVG, and similar formats are easier to analyze
apps with solid documentation and README files give the LLM the context it needs

The real condition is not public versus private. It is whether the LLM can read and understand the code. Even an open-source project can produce a poor CLI if the code is spaghetti or the docs are missing.

The generated CLI calls the real backend at runtime, so the target software has to be installed and working on the machine. If the app itself does not run, the CLI will not run either.
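
A quick preflight check for this requirement is to look for the backend executables on PATH before running anything. The helper below is a sketch; the executable names in the example are assumptions, and the binaries a given harness actually needs may differ.

```python
import shutil


def missing_backends(executables):
    """Return the executables that cannot be found on PATH."""
    return [name for name in executables if shutil.which(name) is None]


# Hypothetical requirement list; the last name is deliberately bogus.
print(missing_backends(["soffice", "melt", "no-such-backend-xyz"]))
```

An empty list means every named backend was found; anything else tells you what to install first.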