
From CLI to AI, the Way Humans Talk to Software Is Changing


An article on DEV Community lays out the history from punch cards through CLI, GUI, the web, mobile, chatbots, and LLM agents as a timeline of “conversations between humans and machines.”
The most interesting takeaway is that GUI didn’t kill CLI, and AI won’t kill GUI or CLI either.

What’s happening now is more than just putting a natural-language field on top.
A human writes what they want done, and an AI routes that to CLI, API, browser, IDE, or MCP tools.
The entry point got softer, but on the execution side the value of text-based, machine-readable layers has gone up.

AI Doesn’t Come After GUI, It Sits on Top of CLI

The original article treats CLI as “a conversation only people who memorized the exact grammar can have.”
Type cp source.txt /destination/ and one wrong flag means silence or an error.
GUI hid that rigidity by replacing it with visual objects: files, buttons, menus, windows.

But look at today’s AI agents and you’ll find the CLI layer under the hood has actually gotten thicker.
Claude Code, Codex, Copilot CLI, and Cursor’s cloud agents all take natural-language requests and then call git, pnpm, gh, docker, kubectl, and assorted MCP tools.
The user no longer needs the grammar, but the agent needs tools with explicit grammar.

This is close to what I covered in CLI-Anything turns any software into an agent-ready interface.
CLI-Anything generates a CLI harness from GUI software like GIMP or Blender.
GUI feels more natural to a human, but --json and stable subcommands are easier for an agent.

```mermaid
graph TD
    A[Human natural language] --> B[AI agent]
    B --> C[CLI]
    B --> D[API]
    B --> E[MCP tools]
    B --> F[GUI automation]
    C --> G[Actual software]
    D --> G
    E --> G
    F --> G
```

The point of this diagram is that AI softens the top-level input surface while the execution surface underneath still uses multiple legacy interfaces as-is.
CLI, API, GUI, and MCP don’t compete so much as line up as execution paths the AI picks from.

From Memorizing Grammar to Inspecting Results

The burden in the CLI era was memorizing command names, arguments, order, paths, quoting, and exit codes.
The burden in the GUI era was hunting for where a feature was hidden and clicking through screen flows.

The burden in the AI-agent era is different.
Instead of memorizing low-level operations, humans shift to inspecting whether the agent’s output is correct.
For code, you read the diff.
For deploys, you check logs and rollback conditions.
For design changes, you compare the browser rendering against the source diff.

As I wrote in Cursor 3 rebuilt the IDE as an agent control tower, the IDE is moving from “a place where humans write code” to “a place where humans supervise multiple agents.”
Cursor 3’s Agent Tabs and cloud agents resemble a work queue and review screen more than a text editor.

What matters here is not how well the AI writes prose but how inspectable its output is.
An agent can say “done” all it wants; without a diff, test results, screenshots, or execution logs, there’s nothing to trust.
The more convenient the natural-language entry point gets, the more the exit side needs mechanically verifiable evidence.
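
To make “mechanically verifiable evidence” a bit more concrete, here is a minimal sketch of the kind of result envelope you could require an agent to return. The shape and field names are hypothetical, not any particular tool’s format.

```typescript
// Illustrative only: a hypothetical shape for the evidence an agent attaches
// to its "done" claim, so a human (or another tool) can verify the result.
interface AgentEvidence {
  diff?: string;             // unified diff of the change, for review
  testResults?: {            // machine-checkable pass/fail signal
    passed: number;
    failed: number;
    log: string;
  };
  screenshots?: string[];    // rendered output, e.g. for design changes
  executionLog?: string;     // raw stdout/stderr from the commands that ran
}

interface AgentResult {
  summary: string;           // the natural-language "done" message
  evidence: AgentEvidence;   // without this, the summary alone proves nothing
}
```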

Why Agent-Oriented CLIs Are Proliferating

Recent official tools aren’t just tidying up human-facing CLIs.
CLIs built with agents in mind are shipping with structured output, schemas, dry-run, permission controls, and audit logs.

In Google ships Android CLI v0.7 preview with android create and skills for agents, Google consolidated Android development operations under the android command and released Skills and a Knowledge Base alongside it.
Feeding the official CLI and official Skills to an AI costs fewer tokens and leaves fewer failure paths than screen-sharing Android Studio.

SwitchBot’s official CLI, covered in @switchbot/openapi-cli ships with MCP server and MQTT stream for AI-driven home automation, follows the same pattern.
A single binary carries colored tables for humans, JSON for scripts, and an MCP server for agents.
On top of that, --dry-run, audit logging, and guards on destructive operations.
This is a clear example of the parts an “AI-callable CLI” needs coming into focus.
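
As a rough sketch of those parts in one place, here is a Node/TypeScript CLI skeleton. The command names and flags are hypothetical, not the actual SwitchBot or Android CLI interfaces; the point is the split between human output, machine output, and guards.

```typescript
// Hypothetical "AI-callable CLI" skeleton: human-readable output by default,
// --json for agents/scripts, --dry-run and an explicit --yes guard for
// destructive operations.
import { parseArgs } from "node:util";

const { values, positionals } = parseArgs({
  allowPositionals: true,
  options: {
    json: { type: "boolean", default: false },      // structured output for agents
    "dry-run": { type: "boolean", default: false }, // show what would happen, change nothing
    yes: { type: "boolean", default: false },       // explicit opt-in for destructive ops
  },
});

const [command, target] = positionals; // e.g. "off living-room-light"

if (command === "off" && !values.yes && !values["dry-run"]) {
  console.error("refusing destructive operation without --yes (or use --dry-run)");
  process.exit(2); // distinct exit code so an agent can detect the guard
}

const result = {
  command,
  target,
  dryRun: values["dry-run"],
  status: values["dry-run"] ? "skipped" : "ok",
};

if (values.json) {
  // stable, machine-readable output: what an agent parses
  console.log(JSON.stringify(result));
} else {
  // human-facing output: what a person reads in the terminal
  console.log(`${command} ${target}: ${result.status}${result.dryRun ? " (dry run)" : ""}`);
}
```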

The original article writes: “CLI didn’t die, it won.”
That’s a bit strong, but CLI is undeniably getting a second look in the agent era.
Buttons on screen are friendly to humans.
But if you’re handing work to an AI, having the operation behind the button exposed as a named command is more powerful.

GUI Automation Models Are Not the Last Resort

That said, pushing everything to CLI isn’t the answer either.
Some operations require looking at screen state to make decisions.
Form layout breakage, drag operations, design review, browser games, automating existing apps—there are cases where a model that directly handles the GUI is the more natural fit.

The models covered in UI-TARS-1.5-7B: a vision AI agent that hit SOTA on GUI grounding and FDM-1, trained on 11 million hours of video with a 50x-efficient video encoder sit on exactly this side of the evolution.
They read UI state from screenshots or video and predict mouse and keyboard actions.
Even without a CLI or API, they can operate by looking at the same screen a human sees.

However, GUI automation models are not a convenient universal layer.
They depend on screen resolution, accessibility information, coordinate transforms, animations, modals, and login state.
If you need to repeat the same operation 100 times, calling a CLI or API is faster and easier to record and verify than clicking through a GUI.

The entry point you hand to an agent varies by target.
GUI for parts humans judge by sight, CLI for repeated execution, API for state queries, MCP for external tool integrations.
AI is the upper layer that bundles these, not something that dissolves everything into natural language alone.

The More Conversational Software Gets, the More Boundaries It Needs

The original article notes that chatbots around 2015–2016 “just put text input on a decision tree” and failed.
That observation is still highly relevant.
Natural-language input without executable operations, state retrieval, error handling, and permission boundaries behind it is just a fuzzy form.

Reading Stripe’s Minions article drove the same point home.
The article Inside Stripe’s AI coding agent “Minions” laid out Devbox, Blueprints, Toolshed, and CI iteration caps as clearly separated concerns.
Being able to hand off a task in natural language mattered less than what environment, which tools, how many iterations, and what permissions are in play.

The more software becomes “conversational,” the more the design surface extends beyond UI.
Accept natural requests from humans.
Expose structured tools for agents.
Leave logs, permissions, dry-run, rollback, and tests for operations.
If that lineup is incomplete, AI integration stops at “a chat box.”

The original article’s historical framing is abstract, but pulling it into the current development landscape makes it quite concrete.
Interface design in the AI era is not about adding a natural-language UI—it’s about re-exposing a software’s operation surface in a form readable by both humans and agents.

How CLI Became GUI’s Counterpart

In Japan, “CUI” (Character User Interface) became the established opposite of GUI.
It means a character-based operation screen.
In English-speaking contexts, though, CUI is rarely used.

The English-language taxonomy splits text-based interfaces into CLI (Command Line Interface) and TUI (Text User Interface).
CLI is the form where you type a command and get a result back.
TUI is the form where you navigate menus and windows drawn in text, like ncurses or htop.
Japanese “CUI” covers both, but the umbrella term never gained traction in English.

This is also why CLI specifically comes to the foreground in the AI-agent context.
For an agent, a TUI is a visual layout for humans—just as hard to drive programmatically as a GUI.
htop is readable for a human, but if an agent needs to kill a process, the kill command is more reliable and faster.

What agents need is not a character-based screen but named operations and structured output.
That’s why the relevant unit is CLI (command-line interface), not CUI (character-based screens in general).

MCP vs. CLI and the Context-Window Cost Difference

This article has lined up CLI, API, and MCP as execution paths for agents, but what makes a real difference in practice is context-window consumption.

MCP tools load their JSON Schema tool definitions into the model’s context.
If a server has ten tools, ten sets of names, parameters, and descriptions are expanded into the system prompt.
Tool definitions alone easily run into thousands of tokens, and execution results come back into context too.
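
For a sense of scale, here is roughly the shape of MCP tool definitions (name, description, inputSchema as JSON Schema). The two tools below are invented for illustration; the point is that every definition like this gets expanded into the model’s context.

```typescript
// Invented example tools, written in the rough shape MCP uses for tool listings.
const mcpTools = [
  {
    name: "get_device_status",
    description: "Return the current status of a smart-home device",
    inputSchema: {
      type: "object",
      properties: {
        deviceId: { type: "string", description: "ID of the device to query" },
      },
      required: ["deviceId"],
    },
  },
  {
    name: "set_device_power",
    description: "Turn a smart-home device on or off",
    inputSchema: {
      type: "object",
      properties: {
        deviceId: { type: "string", description: "ID of the device to control" },
        power: { type: "string", enum: ["on", "off"] },
      },
      required: ["deviceId", "power"],
    },
  },
  // ...eight more tools, and the context cost grows with each one
];
```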

CLI, on the other hand, only needs a single shell tool definition to run any command.
Whether it’s git status or docker ps, the tool schema is the same: “take a command string and execute it.”
Adding more tools doesn’t increase context consumption.
Output comes back as plain stdout text.
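
Compare that with the single generic shell tool a coding agent typically exposes. The exact definition varies by agent, so this is a sketch, not any specific product’s schema.

```typescript
// One generic "shell" tool covers every CLI the agent might call.
const shellTool = {
  name: "shell",
  description: "Execute a shell command and return its stdout/stderr",
  inputSchema: {
    type: "object",
    properties: {
      command: { type: "string", description: "Command to run, e.g. 'git status'" },
    },
    required: ["command"],
  },
};
// git status, docker ps, pnpm test... all go through this one definition,
// so adding more CLIs adds no new tool schemas to the context.
```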

On context cost alone, CLI is overwhelmingly lighter.
The catch is that CLI relies on the model “knowing” the right commands and flags.
For git or npm, which appear massively in training data, that’s fine. For internal tools or niche CLIs, the model gets the syntax wrong.

MCP’s schema definitions bridge this gap.
Parameter names, types, required-vs-optional, and descriptions arrive as JSON Schema, so the model doesn’t have to guess how to call a tool.
SwitchBot’s CLI shipping --json and an MCP server in the same binary is a design that balances CLI’s lightness with MCP’s discoverability.

Another problem is becoming visible.
When the number of MCP servers grows too large, tool definitions for tools you’re not even using fill up the context.
Claude Code defaulting to shell-based CLI execution rather than MCP is likely because context efficiency wins for a general-purpose agent.
MCP is best connected only to “tool sets this specific agent will definitely use.”

In my own workflow, I default to CLI and switch to MCP only for tools where the model gets the syntax wrong.
There’s no reason to route git or docker through MCP, and conversely, letting the model assemble parameters for internal APIs or IoT device control via CLI alone leads to failures.
When in doubt, try CLI first, then MCP if accuracy falls short—that order is the most practical.

References