CLI-Anything turns almost any software into something AI agents can operate
It is now normal for AI agents to write code and manipulate files. Operating GUI software, though, whether that means editing images in GIMP, rendering 3D in Blender, or generating PDFs in LibreOffice, has remained a pain point.
CLI-Anything, released by HKUDS at the University of Hong Kong, is a very sensible answer to that problem. Feed it source code and it automatically generates a CLI harness that AI agents can use. It reached more than 19,000 GitHub stars about 10 days after release.
What was the problem?
There have been several ways for AI agents to drive GUI software, but all had issues.
- Screenshot plus click automation: breaks when the UI changes, slow, and unstable
- Limited APIs: only cover part of the application
- Reimplementation: rewrites the software’s functionality in separate code and loses 90% of what the real product can do
CLI-Anything starts from a simpler observation: CLI is a very good fit for LLMs. It is text-based and structured, self-documenting via --help, easy to parse with JSON, and composable through pipes.
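To make the "easy to parse" point concrete, here is a minimal sketch of the agent side: run a command, read stdout, and parse it as JSON. The `run_json` helper and the `FAKE_CLI` stand-in are hypothetical. The stand-in is just the Python interpreter printing a JSON document, not a CLI-Anything harness.

```python
import json
import subprocess
import sys


def run_json(argv: list[str]) -> dict:
    """Run a command and parse its stdout as a JSON object."""
    proc = subprocess.run(argv, capture_output=True, text=True, check=True)
    return json.loads(proc.stdout)


# Stand-in for a real CLI (assumption): the Python interpreter printing JSON.
# A generated harness would be called the same way, by name, with its JSON flag.
FAKE_CLI = [
    sys.executable,
    "-c",
    'import json; print(json.dumps({"layers": 2, "width": 1920}))',
]

if __name__ == "__main__":
    info = run_json(FAKE_CLI)
    print(info["width"])
```

Because the result is an ordinary dictionary, the agent needs no screen scraping or pixel inference to decide what to do next.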
The 7-stage generation pipeline
The core of CLI-Anything is a seven-stage pipeline that generates a CLI from source code.
graph TD
A[Phase 1: Analyze<br/>Source code inspection] --> B[Phase 2: Design<br/>Command structure]
B --> C[Phase 3: Implement<br/>Click CLI]
C --> D[Phase 4: Plan Tests<br/>Test design]
D --> E[Phase 5: Write Tests<br/>Test implementation]
E --> F[Phase 6: Document<br/>Docs generation]
F --> G[Phase 7: Publish<br/>Distribute via pip]
- Analyze - scan the source code and map GUI actions to APIs
- Design - define command groups, the state model, and output formats
- Implement - build the CLI in Python with Click, including REPL mode, JSON output, and Undo / Redo
- Plan Tests - write unit and end-to-end test plans
- Write Tests - implement the test suite
- Document - update the docs using the test results
- Publish - generate setup.py and install it with pip install -e .
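As a rough illustration of what the Design and Implement phases aim at, the sketch below shows one command group, a JSON state file, and a switchable output format. It is not generated output. It uses argparse from the standard library instead of Click so it runs without extra dependencies, and the program name `demo-harness`, the state layout, and the option names are assumptions.

```python
import argparse
import json
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="demo-harness")  # hypothetical name
    parser.add_argument("--json", action="store_true", help="emit JSON instead of text")
    groups = parser.add_subparsers(dest="group", required=True)

    # Command group "project" with a single "new" command.
    project = groups.add_parser("project", help="project-level commands")
    project_cmds = project.add_subparsers(dest="command", required=True)
    new = project_cmds.add_parser("new", help="create a project file")
    new.add_argument("--width", type=int, default=1920)
    new.add_argument("--height", type=int, default=1080)
    new.add_argument("-o", "--output", required=True)
    return parser


def main(argv: list[str]) -> str:
    args = build_parser().parse_args(argv)
    # State model (assumed): a plain JSON document that later commands would edit.
    state = {"width": args.width, "height": args.height, "layers": []}
    Path(args.output).write_text(json.dumps(state, indent=2))
    if args.json:
        return json.dumps({"created": args.output, **state})
    return f"Created {args.output} ({args.width}x{args.height})"
```

Calling `main(["--json", "project", "new", "-o", "poster.json"])` writes the state file and returns a JSON string. Dropping `--json` returns a one-line human-readable summary instead.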
There is also a Phase 6.5: SKILL.md generation. It extracts metadata from the generated CLI’s Click decorators and setup.py so the agent can auto-discover the skill file.
Supported software and test results
CLI harnesses were generated and tested for the 15 applications listed below.
| Software | Domain | Backend | Tests |
|---|---|---|---|
| GIMP | image editing | Pillow + GEGL / Script-Fu | 107 |
| Blender | 3D modeling | bpy (Python scripting) | 208 |
| Inkscape | vector graphics | direct SVG/XML manipulation | 202 |
| Audacity | audio editing | Python wave + sox | 161 |
| LibreOffice | office suite | ODF generation + headless LibreOffice | 158 |
| OBS Studio | streaming / recording | JSON scene + obs-websocket | 153 |
| Kdenlive | video editing | MLT XML + melt | 155 |
| Shotcut | video editing | MLT XML + melt | 154 |
| Draw.io | diagramming | mxGraph XML + draw.io CLI | 138 |
| Mubu | knowledge management | local data + sync logs | 96 |
| ComfyUI | AI image generation | ComfyUI REST API | 70 |
| AnyGen | AI content generation | AnyGen REST API | 50 |
| AdGuardHome | networking | AdGuardHome API | 36 |
| Zoom | video conferencing | Zoom REST API (OAuth2) | 22 |
| Mermaid | diagramming | mermaid.ink renderer | 10 |
Total: 1,720 tests, all passing. That breaks down into 1,247 unit tests and 473 E2E tests.
The important part is that these are real backends, not mocks. LibreOffice tests actually generated PDFs in headless mode and verified the %PDF- magic bytes. Blender tests actually rendered PNG output. This is real software validation, not a fake reimplementation.
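A check of that kind can be sketched in a few lines. The helper names here are hypothetical and the project's actual test code is not shown in this article. The byte signatures themselves are standard: PDF files begin with `%PDF-`, and PNG files begin with a fixed 8-byte signature.

```python
from pathlib import Path

PDF_MAGIC = b"%PDF-"
PNG_MAGIC = b"\x89PNG\r\n\x1a\n"  # the 8-byte PNG file signature


def starts_with(path: str, magic: bytes) -> bool:
    """Return True if the file begins with the given byte signature."""
    with open(path, "rb") as fh:
        return fh.read(len(magic)) == magic


def assert_real_pdf(path: str) -> None:
    """Fail unless the file is non-empty and begins with the PDF signature."""
    assert Path(path).stat().st_size > 0, "output file is empty"
    assert starts_with(path, PDF_MAGIC), "output is not a PDF"
```

A signature check does not prove the document is correct, but it does prove that a real backend produced a real file, which a mocked test cannot.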
How to use it
As a Claude Code plugin, the workflow is three steps:
# Add the marketplace
/plugin marketplace add HKUDS/CLI-Anything
# Install the plugin
/plugin install cli-anything
# Generate a CLI (GIMP example)
/cli-anything:cli-anything ./gimp
The generated CLI is used like this:
# Install
cd gimp/agent-harness && pip install -e .
# Show help
cli-anything-gimp --help
# Create a project
cli-anything-gimp project new --width 1920 --height 1080 -o poster.json
# Add a layer
cli-anything-gimp --project poster.json layer add -n "Background" --type solid --color "#1a1a2e"
# JSON output for agents
cli-anything-gimp --json document info --project poster.json
It also works with OpenCode, OpenClaw, Codex, Qodercli, and Goose.
Design choices that stand out
Several design decisions are interesting:
Dual mode operation: every CLI works both as subcommands for scripts and pipelines, and as a REPL for interactive use. Running it without arguments drops you into the REPL.
A unified --json flag: every command can emit structured JSON so an agent can consume it. Humans still get table output.
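The two ideas so far, dual mode and a single JSON switch, fit together naturally when every command returns plain data and one function decides how to render it. The following is a minimal sketch under that assumption. The command table, `render`, `dispatch`, and `run` are all hypothetical names, and the REPL reads from an iterable of lines so the example stays testable.

```python
import json
import shlex
from typing import Callable, Iterable

# Hypothetical command table: name -> handler returning a plain dict.
COMMANDS: dict[str, Callable[[list[str]], dict]] = {
    "info": lambda args: {"name": "poster", "layers": 2},
    "echo": lambda args: {"args": args},
}


def render(result: dict, as_json: bool) -> str:
    """JSON for agents, aligned key/value lines for humans."""
    if as_json:
        return json.dumps(result)
    width = max(len(key) for key in result)
    return "\n".join(f"{key.ljust(width)}  {value}" for key, value in result.items())


def dispatch(argv: list[str], as_json: bool) -> str:
    name, *rest = argv
    if name not in COMMANDS:
        return f"unknown command: {name}"
    return render(COMMANDS[name](rest), as_json)


def run(argv: list[str], lines: Iterable[str] = ()) -> list[str]:
    """Subcommand mode when argv names a command; REPL mode over `lines` otherwise."""
    as_json = "--json" in argv
    argv = [arg for arg in argv if arg != "--json"]
    if argv:
        return [dispatch(argv, as_json)]
    outputs = []
    for line in lines:  # an interactive REPL would read these from input()
        words = shlex.split(line)
        if not words:
            continue
        if words[0] in ("exit", "quit"):
            break
        outputs.append(dispatch(words, as_json))
    return outputs
```

Both modes go through the same `dispatch` function, so a command behaves identically whether it arrives from a shell pipeline or from a line typed at the prompt.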
Delegation to the real backend: the CLI produces project files such as ODF, MLT XML, and SVG, while the real application handles rendering. LibreOffice does the PDF generation, Blender does the 3D rendering, and Audacity does the audio processing. It is a CLI bridge to the real product, not a toy rewrite.
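In code, delegation amounts to building a command line for the installed application and letting it do the work. The sketch below uses LibreOffice's headless conversion options. The function names are hypothetical, the binary is assumed to be on PATH as `soffice`, and how a generated harness actually invokes LibreOffice is not shown in this article.

```python
import subprocess
from pathlib import Path


def build_convert_command(document: str, outdir: str, binary: str = "soffice") -> list[str]:
    """Command line asking headless LibreOffice to convert a document to PDF."""
    return [binary, "--headless", "--convert-to", "pdf", "--outdir", outdir, document]


def render_pdf(document: str, outdir: str) -> Path:
    """Delegate rendering to the installed application; nothing is rendered in Python."""
    subprocess.run(build_convert_command(document, outdir), check=True)
    # LibreOffice names the result after the input file, with a .pdf extension.
    return Path(outdir) / (Path(document).stem + ".pdf")
```

Keeping the command construction separate from the call makes it easy to unit test the arguments without LibreOffice installed, while end-to-end tests exercise `render_pdf` against the real binary.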
ReplSkin: a consistent REPL interface with a brand banner, command history, progress display, and Undo / Redo.
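The article does not describe how Undo / Redo is implemented. One common approach, shown here purely as an assumption, is to keep snapshots of the project state on two stacks. The `History` class is hypothetical and is not part of ReplSkin.

```python
import copy


class History:
    """Snapshot-based undo/redo over a JSON-like project state."""

    def __init__(self, state: dict):
        self.state = state
        self._undo: list[dict] = []
        self._redo: list[dict] = []

    def apply(self, change) -> None:
        """Keep the current state for undo, then let `change` mutate a copy."""
        self._undo.append(self.state)
        self._redo.clear()  # a new edit invalidates the redo branch
        new_state = copy.deepcopy(self.state)
        change(new_state)
        self.state = new_state

    def undo(self) -> bool:
        if not self._undo:
            return False
        self._redo.append(self.state)
        self.state = self._undo.pop()
        return True

    def redo(self) -> bool:
        if not self._redo:
            return False
        self._undo.append(self.state)
        self.state = self._redo.pop()
        return True
```

Snapshots are simple and hard to get wrong for small project files. For large states, storing inverse operations instead of full copies would use less memory.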
refine expands coverage
After the first generation pass, you can run /cli-anything:refine to analyze gaps and add missing commands.
# Broad gap analysis
/cli-anything:refine ./gimp
# Focus on specific functionality
/cli-anything:refine ./gimp "batch processing and filters"
Each pass is incremental and non-destructive, so you can run it as many times as needed.
Compared with GUI automation
Here is how it compares with screenshot-and-click agents:
| Dimension | GUI automation | CLI-Anything |
|---|---|---|
| Speed | slow because it must capture and parse screenshots | immediate command execution |
| Stability | breaks when the UI changes | CLI APIs are stable |
| Coverage | only what is visible on screen | the entire backend API |
| Structured output | inferred from pixels | JSON output |
| Requirement | needs a display | works headlessly |
CLI-Anything still has limits. You need source code that can be read. If you only have a compiled binary, decompilation is required and quality drops sharply. The generated CLI quality also depends on the structure of the codebase and the quality of its APIs.
Can it work on your own app?
When people hear “source code required,” they may assume this is only for open-source projects, but that is not true. CLI-Anything uses LLM-based static analysis: in Phase 1 it identifies backend engines, data-model formats, and the mapping between GUI actions and APIs. In other words, the code just has to be readable. It does not need to be open source.
If you have the source locally, it can work on your own app or on a private repository.
# Local path
/cli-anything:cli-anything ./my-private-app
# GitHub repo URL is fine too, if you have access
/cli-anything https://github.com/your-org/your-repo
You can point it at your own GitHub repo directly. For private repos, git or gh authentication just needs to work. Cloning locally also works.
The resulting CLI quality depends heavily on the codebase:
- apps with a clear API split - for example, well-organized Python modules, REST APIs, or scripting interfaces - produce better CLIs
- apps with structured data models - JSON, XML, SVG, and similar formats are easier to analyze
- apps with solid documentation and README files give the LLM the context it needs
The real condition is not public versus private. It is whether the LLM can read and understand the code. Even an open-source project can produce a poor CLI if the code is spaghetti or the docs are missing.
The generated CLI calls the real backend at runtime, so the target software has to be installed and working on the machine. If the app itself does not run, the CLI will not run either.
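A small preflight check can surface that condition before an agent starts issuing commands. This is a minimal sketch: `missing_backends` is a hypothetical helper, and the executable names in the usage note are examples taken from the backends table, not a list the tool itself defines.

```python
import shutil


def missing_backends(binaries: list[str]) -> list[str]:
    """Return the executables from `binaries` that cannot be found on PATH."""
    return [name for name in binaries if shutil.which(name) is None]
```

For example, `missing_backends(["soffice", "melt", "sox"])` returns whichever of those programs is not installed, and an empty list means every backend was found.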