CLI-Anything turns almost any software into something AI agents can operate

Ikesan

It is now normal for AI agents to write code and manipulate files. Operating GUI software, though, has remained a pain point: editing images in GIMP, rendering 3D scenes in Blender, or generating PDFs in LibreOffice.

CLI-Anything, released by HKUDS at the University of Hong Kong, is a very sensible answer to that problem. Feed it source code and it automatically generates a CLI harness that AI agents can use. It reached more than 19,000 GitHub stars about 10 days after release.

What was the problem?

There have been several ways for AI agents to drive GUI software, but each has drawbacks.

  • Screenshot plus click automation: breaks when the UI changes, slow, and unstable
  • Limited APIs: only cover part of the application
  • Reimplementation: rewrites the software's functionality in separate code, losing roughly 90% of what the real product can do

CLI-Anything starts from a simpler observation: a CLI is a very good fit for LLMs. It is text-based and structured, self-documenting via --help, easy to parse when it emits JSON, and composable through pipes.
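
To make the "easy to parse" point concrete, here is a minimal sketch of how an agent-side helper might call a CLI and read its JSON output. The helper name run_json and the stand-in tool are assumptions for illustration; a real harness would simply be invoked with its own arguments.

```python
import json
import subprocess
import sys


def run_json(argv):
    """Run a command and parse its stdout as JSON (hypothetical helper)."""
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    return json.loads(result.stdout)


# Stand-in for a real tool: a Python one-liner that prints a JSON object.
# This is NOT a CLI-Anything command, just something runnable everywhere.
FAKE_TOOL = [
    sys.executable,
    "-c",
    'import json; print(json.dumps({"width": 1920, "height": 1080}))',
]

info = run_json(FAKE_TOOL)
print(info["width"], info["height"])
```

No screenshot parsing or pixel inference is involved: the agent gets a dictionary back and can branch on its fields directly.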

The 7-stage generation pipeline

The core of CLI-Anything is a seven-stage pipeline that generates a CLI from source code.

graph TD
    A[Phase 1: Analyze<br/>Source code inspection] --> B[Phase 2: Design<br/>Command structure]
    B --> C[Phase 3: Implement<br/>Click CLI]
    C --> D[Phase 4: Plan Tests<br/>Test design]
    D --> E[Phase 5: Write Tests<br/>Test implementation]
    E --> F[Phase 6: Document<br/>Docs generation]
    F --> G[Phase 7: Publish<br/>Distribute via pip]

  1. Analyze - scan the source code and map GUI actions to APIs
  2. Design - define command groups, the state model, and output formats
  3. Implement - build the CLI in Python with Click, including REPL mode, JSON output, and Undo / Redo
  4. Plan Tests - write unit and end-to-end test plans
  5. Write Tests - implement the test suite
  6. Document - update the docs using the test results
  7. Publish - generate setup.py and install it with pip install -e .
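
As a rough illustration of what Phases 2 and 3 produce, the sketch below builds a tiny CLI with command groups and a project file as its state model. It uses the standard library's argparse instead of Click so it runs without extra dependencies, and the program name demo-harness, the commands, and the JSON layout are all assumptions, not the generated code.

```python
import argparse
import json
from pathlib import Path


def build_parser():
    """Two command groups ("project", "layer") under one entry point."""
    parser = argparse.ArgumentParser(prog="demo-harness")
    parser.add_argument("--json", action="store_true", help="emit JSON")
    groups = parser.add_subparsers(dest="group", required=True)

    project = groups.add_parser("project").add_subparsers(dest="action", required=True)
    new = project.add_parser("new")
    new.add_argument("--width", type=int, required=True)
    new.add_argument("--height", type=int, required=True)
    new.add_argument("-o", "--output", required=True)

    layer = groups.add_parser("layer").add_subparsers(dest="action", required=True)
    add = layer.add_parser("add")
    add.add_argument("--project", required=True)
    add.add_argument("-n", "--name", required=True)
    return parser


def main(argv):
    args = build_parser().parse_args(argv)
    if args.group == "project":  # "project new": create a fresh state file
        path = Path(args.output)
        state = {"width": args.width, "height": args.height, "layers": []}
    else:  # "layer add": load the state, modify it, write it back
        path = Path(args.project)
        state = json.loads(path.read_text())
        state["layers"].append({"name": args.name})
    path.write_text(json.dumps(state, indent=2))
    if args.json:
        print(json.dumps(state))
    else:
        print(f"{path}: {len(state['layers'])} layer(s)")
    return state
```

Calling main(["project", "new", "--width", "1920", "--height", "1080", "-o", "poster.json"]) writes the project file, and a later "layer add" call reads and updates it. Keeping state in a file is what lets separate invocations behave like one editing session.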

There is also a Phase 6.5: SKILL.md generation. It extracts metadata from the generated CLI's Click decorators and setup.py and writes it into a SKILL.md file that agents can auto-discover.
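
The idea behind Phase 6.5 can be sketched as turning a small metadata dictionary into a skill file. The function name, the input shape, and the exact file layout below are assumptions; the front matter with name and description follows the common convention for skill files, not a confirmed CLI-Anything format.

```python
def render_skill_md(name, description, commands):
    """Render a SKILL.md-style document from CLI metadata (hypothetical layout)."""
    lines = [
        "---",
        f"name: {name}",
        f"description: {description}",
        "---",
        "",
        f"# {name}",
        "",
        "## Commands",
        "",
    ]
    # One bullet per command, sorted so repeated runs give identical output.
    for command, help_text in sorted(commands.items()):
        lines.append(f"- `{name} {command}`: {help_text}")
    return "\n".join(lines) + "\n"


print(render_skill_md(
    "demo-harness",
    "Example harness for an image editor",
    {"project new": "Create a project", "layer add": "Add a layer"},
))
```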

Supported software and test results

CLI harnesses were generated and tested for the 15 applications listed below.

| Software | Domain | Backend | Tests |
|---|---|---|---|
| GIMP | image editing | Pillow + GEGL / Script-Fu | 107 |
| Blender | 3D modeling | bpy (Python scripting) | 208 |
| Inkscape | vector graphics | direct SVG/XML manipulation | 202 |
| Audacity | audio editing | Python wave + sox | 161 |
| LibreOffice | office suite | ODF generation + headless LibreOffice | 158 |
| OBS Studio | streaming / recording | JSON scene + obs-websocket | 153 |
| Kdenlive | video editing | MLT XML + melt | 155 |
| Shotcut | video editing | MLT XML + melt | 154 |
| Draw.io | diagramming | mxGraph XML + draw.io CLI | 138 |
| Mubu | knowledge management | local data + sync logs | 96 |
| ComfyUI | AI image generation | ComfyUI REST API | 70 |
| AnyGen | AI content generation | AnyGen REST API | 50 |
| AdGuardHome | networking | AdGuardHome API | 36 |
| Zoom | video conferencing | Zoom REST API (OAuth2) | 22 |
| Mermaid | diagramming | mermaid.ink renderer | 10 |

Total: 1,720 tests, all passing. That breaks down into 1,247 unit tests and 473 E2E tests.

The important part is that these are real backends, not mocks. LibreOffice tests actually generated PDFs in headless mode and verified the %PDF- magic bytes. Blender tests actually rendered PNG output. This is real software validation, not a fake reimplementation.
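
A magic-byte check of this kind is short. The sketch below is an assumed shape for such a test helper, not the project's actual test code. The PDF prefix is the one named above, and the PNG signature is the standard eight-byte header.

```python
PDF_MAGIC = b"%PDF-"
PNG_MAGIC = b"\x89PNG\r\n\x1a\n"


def starts_with(path, magic):
    """Return True if the file at `path` begins with the given bytes."""
    with open(path, "rb") as fh:
        return fh.read(len(magic)) == magic


def looks_like_pdf(path):
    return starts_with(path, PDF_MAGIC)


def looks_like_png(path):
    return starts_with(path, PNG_MAGIC)
```

A check like this cannot prove the output is correct, but it does prove the real backend ran and wrote a file of the expected type, which a mocked test would not.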

How to use it

As a Claude Code plugin, the workflow is three steps:

# Add the marketplace
/plugin marketplace add HKUDS/CLI-Anything

# Install the plugin
/plugin install cli-anything

# Generate a CLI (GIMP example)
/cli-anything:cli-anything ./gimp

The generated CLI is used like this:

# Install
cd gimp/agent-harness && pip install -e .

# Show help
cli-anything-gimp --help

# Create a project
cli-anything-gimp project new --width 1920 --height 1080 -o poster.json

# Add a layer
cli-anything-gimp --project poster.json layer add -n "Background" --type solid --color "#1a1a2e"

# JSON output for agents
cli-anything-gimp --json document info --project poster.json

It also works with OpenCode, OpenClaw, Codex, Qodercli, and Goose.

Design choices that stand out

Several design decisions are interesting:

Dual mode operation: every CLI works both as subcommands for scripts and pipelines, and as a REPL for interactive use. Running it without arguments drops you into the REPL.
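
One way to get this dual behavior is to route both modes through a single dispatch function, as in the sketch below. The names dispatch, repl, and main and the toy "ping" command are assumptions for illustration, not the generated harness code.

```python
import shlex


def dispatch(argv):
    """Run one command; shared by subcommand mode and the REPL."""
    if argv == ["ping"]:
        return "pong"
    return f"unknown command: {' '.join(argv)}"


def repl(read_line=input, write=print):
    """Read commands until EOF or an explicit exit."""
    while True:
        try:
            line = read_line("demo> ")
        except EOFError:
            break
        if line.strip() in {"exit", "quit"}:
            break
        if line.strip():
            # shlex.split gives the REPL the same quoting rules as a shell.
            write(dispatch(shlex.split(line)))


def main(argv):
    """With arguments: run once. Without arguments: start the REPL."""
    if argv:
        print(dispatch(argv))
    else:
        repl()
```

Because both paths end in dispatch, a command behaves the same whether a script passes it as arguments or a person types it at the prompt.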

A unified --json flag: every command can emit structured JSON so an agent can consume it. Humans still get table output.
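
The two output paths can share one data structure, with only the last rendering step differing. This is a sketch under that assumption; the render function and its table layout are illustrative and not taken from the project.

```python
import json


def render(rows, as_json=False):
    """Render a list of dicts as JSON (for agents) or a text table (for humans)."""
    if as_json:
        return json.dumps(rows, indent=2)
    if not rows:
        return "(no rows)"
    headers = list(rows[0])
    # Each column is as wide as its longest cell or its header.
    widths = [max(len(h), *(len(str(r[h])) for r in rows)) for h in headers]

    def fmt(values):
        cells = (str(v).ljust(w) for v, w in zip(values, widths))
        return "  ".join(cells).rstrip()

    lines = [fmt(headers), fmt("-" * w for w in widths)]
    lines += [fmt(r[h] for h in headers) for r in rows]
    return "\n".join(lines)


layers = [{"name": "Background", "type": "solid"}]
print(render(layers))
print(render(layers, as_json=True))
```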

Delegation to the real backend: the CLI produces project files such as ODF, MLT XML, and SVG, while the real application handles rendering. LibreOffice does the PDF generation, Blender does the 3D rendering, and Audacity does the audio processing. It is a CLI bridge to the real product, not a toy rewrite.
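
For the LibreOffice case, delegation can be as small as building a command line and handing it to the installed application. The sketch below uses LibreOffice's documented headless conversion flags; the function names and the choice of soffice as the executable name are assumptions, and the harness may well invoke it differently.

```python
import subprocess
from pathlib import Path


def pdf_export_argv(document, outdir, soffice="soffice"):
    """Build the argv for a headless LibreOffice PDF conversion."""
    return [
        soffice,
        "--headless",
        "--convert-to", "pdf",
        "--outdir", str(outdir),
        str(document),
    ]


def export_pdf(document, outdir):
    """Run the conversion; requires LibreOffice to be installed."""
    subprocess.run(pdf_export_argv(document, outdir), check=True)
    # LibreOffice names the output after the input file's stem.
    return Path(outdir) / (Path(document).stem + ".pdf")


print(pdf_export_argv("report.odt", "out"))
```

The harness only prepares the document and the command. Layout, fonts, and PDF encoding all stay inside LibreOffice, which is why nothing is lost compared with the GUI.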

ReplSkin: a consistent REPL interface with a brand banner, command history, progress display, and Undo / Redo.
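
Undo / Redo over a project file can be modeled with two stacks of state snapshots. The History class below is a minimal sketch of that general technique, assuming a JSON-like project state; it is not ReplSkin's actual implementation.

```python
import copy


class History:
    """Snapshot-based undo/redo over a JSON-like project state."""

    def __init__(self, state):
        self.state = state
        self._undo = []
        self._redo = []

    def apply(self, change):
        """Apply change(state) to a copy and remember the previous state."""
        self._undo.append(self.state)
        self._redo.clear()  # a new edit invalidates the redo branch
        new_state = copy.deepcopy(self.state)
        change(new_state)
        self.state = new_state

    def undo(self):
        if not self._undo:
            return False
        self._redo.append(self.state)
        self.state = self._undo.pop()
        return True

    def redo(self):
        if not self._redo:
            return False
        self._undo.append(self.state)
        self.state = self._redo.pop()
        return True
```

Snapshots cost memory on large projects, so a production version would more likely store diffs or inverse operations, but the stack discipline is the same.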

refine expands coverage

After the first generation pass, you can run /cli-anything:refine to analyze gaps and add missing commands.

# Broad gap analysis
/cli-anything:refine ./gimp

# Focus on specific functionality
/cli-anything:refine ./gimp "batch processing and filters"

Each pass is incremental and non-destructive, so you can run it as many times as needed.
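
In the real tool the gap analysis is done by the LLM, so the following is only a toy model of what "incremental and non-destructive" means: find what is missing, then add it without touching what already exists. The function names and the dictionary representation of commands are assumptions.

```python
def find_gaps(backend_operations, existing_commands):
    """Operations the backend offers that the CLI does not expose yet."""
    return sorted(set(backend_operations) - set(existing_commands))


def merge_non_destructive(existing, generated):
    """Add newly generated commands without overwriting existing ones."""
    merged = dict(existing)
    for name, implementation in generated.items():
        merged.setdefault(name, implementation)
    return merged


existing = {"layer add": "v1"}
gaps = find_gaps(["layer add", "filter blur", "batch run"], existing)
merged = merge_non_destructive(existing, {g: "new" for g in gaps} | {"layer add": "v2"})
print(gaps)
print(merged["layer add"])
```

Running the merge a second time with the same input changes nothing, which is the property that makes repeated refine passes safe.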

Compared with GUI automation

Here is how it compares with screenshot-and-click agents:

| Dimension | GUI automation | CLI-Anything |
|---|---|---|
| Speed | slow because it must capture and parse screenshots | immediate command execution |
| Stability | breaks when the UI changes | CLI APIs are stable |
| Coverage | only what is visible on screen | the entire backend API |
| Structured output | inferred from pixels | JSON output |
| Requirement | needs a display | works headlessly |

CLI-Anything still has limits. You need source code that can be read. If you only have a compiled binary, decompilation is required and quality drops sharply. The generated CLI quality also depends on the structure of the codebase and the quality of its APIs.

Can it work on your own app?

When people hear “source code required,” they may assume this is only for open-source projects, but that is not true. CLI-Anything uses LLM-based static analysis: in Phase 1 it identifies backend engines, data-model formats, and the mapping between GUI actions and APIs. In other words, the code just has to be readable. It does not need to be open source.

If you have the source locally, it can work on your own app or on a private repository.

# Local path
/cli-anything:cli-anything ./my-private-app

# GitHub repo URL is fine too, if you have access
/cli-anything https://github.com/your-org/your-repo

You can point it at your own GitHub repo directly. For private repos, git or gh authentication just needs to work. Cloning locally also works.

The resulting CLI quality depends heavily on the codebase:

  • apps with a clear API split - for example, well-organized Python modules, REST APIs, or scripting interfaces - produce better CLIs
  • apps with structured data models - JSON, XML, SVG, and similar formats are easier to analyze
  • apps with solid documentation and README files give the LLM the context it needs

The real condition is not public versus private. It is whether the LLM can read and understand the code. Even an open-source project can produce a poor CLI if the code is spaghetti or the docs are missing.

The generated CLI calls the real backend at runtime, so the target software has to be installed and working on the machine. If the app itself does not run, the CLI will not run either.
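
A quick preflight check makes this dependency explicit before an agent starts issuing commands. The sketch below assumes the backends are reachable as executables on PATH; the helper name is hypothetical, and the binary names in the usage line (soffice, blender, melt) are the usual ones but may differ per platform.

```python
import shutil


def missing_backends(executables):
    """Return the executables from the list that are not found on PATH."""
    return [name for name in executables if shutil.which(name) is None]


missing = missing_backends(["soffice", "blender", "melt"])
if missing:
    print("not installed or not on PATH:", ", ".join(missing))
else:
    print("all backends found")
```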