agent-browser: a browser automation CLI for AI agents
I came across agent-browser, a tool released by Vercel Labs on January 11, 2026. It is a browser automation CLI designed for AI agents, and it looked lighter than Playwright MCP, so I took a closer look.
Basic information
- Repository: vercel-labs/agent-browser
- Version: 0.4.0 as of January 12, 2026
- License: Apache-2.0
- Dependency: playwright-core ^1.57.0
Installation
npm install -g agent-browser
agent-browser install
On Linux you also need system dependencies:
agent-browser install --with-deps
Architecture
It uses a two-layer structure: a Rust CLI on top of a Node.js daemon.
Rust CLI (command parsing)
->
Node.js daemon (Playwright management)
->
Chromium (actual browser control)
If a Rust binary is not available for your platform, it falls back to a pure Node.js implementation. The daemon starts automatically on the first command and keeps running, so later commands do not pay the browser startup cost again.
Basic usage
# Open a page
agent-browser open example.com
# Get the accessibility tree
agent-browser snapshot
# Click an element by ref
agent-browser click @e2
# Fill a form
agent-browser fill @e3 "test@example.com"
# Take a screenshot
agent-browser screenshot page.png
# Close the browser
agent-browser close
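If you want to drive the same sequence from a script, a thin wrapper is enough. The following is a minimal sketch, not part of agent-browser itself: `run_cli` and `fetch_snapshot` are names I made up, and the only CLI calls used are the `open`, `snapshot`, and `close` commands shown above. The runner is injectable so the flow can be tried without a browser installed.

```python
import subprocess
from typing import Callable, Sequence

# A runner takes the arguments after "agent-browser" and returns stdout.
Runner = Callable[[Sequence[str]], str]


def run_cli(argv: Sequence[str]) -> str:
    """Run one agent-browser command and return its stdout (hypothetical helper)."""
    result = subprocess.run(
        ["agent-browser", *argv],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout


def fetch_snapshot(url: str, run: Runner = run_cli) -> str:
    """Open a page, return its snapshot text, and always close the browser."""
    run(["open", url])
    try:
        return run(["snapshot"])
    finally:
        # Close even if the snapshot command failed.
        run(["close"])
```

Putting `close` in a `finally` block keeps a failed snapshot from leaving the browser open.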
Core idea: snapshot plus ref
The most important command is snapshot. It captures the page’s accessibility tree and gives each element a ref ID.
$ agent-browser snapshot -i
- heading "Example Domain" [ref=e1] [level=1]
- button "Submit" [ref=e2]
- textbox "Email" [ref=e3]
- link "Learn more" [ref=e4]
With -i, the output is limited to interactive elements such as buttons, links, and form inputs.
This is why the tool fits AI agents well. Instead of writing CSS selectors or XPath, the agent acts on a ref taken from the accessibility tree, which often stays the same even when the underlying DOM changes.
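To make the snapshot-plus-ref idea concrete, here is a rough sketch of how an agent-side script might turn that text output into records. It assumes every line looks like the sample above (`- role "name" [key=value] ...`); the real output may contain other line shapes, which this sketch simply skips. `Element` and `parse_snapshot` are hypothetical names, not part of agent-browser.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List

# Sample copied from the snapshot output shown above.
SAMPLE = """\
- heading "Example Domain" [ref=e1] [level=1]
- button "Submit" [ref=e2]
- textbox "Email" [ref=e3]
- link "Learn more" [ref=e4]
"""

# Assumed line shape: - role "name" followed by [key=value] attributes.
LINE_RE = re.compile(r'^\s*-\s+(?P<role>\S+)\s+"(?P<name>[^"]*)"(?P<attrs>.*)$')
ATTR_RE = re.compile(r"\[(\w+)=([^\]]*)\]")


@dataclass
class Element:
    role: str
    name: str
    ref: str
    attrs: Dict[str, str] = field(default_factory=dict)


def parse_snapshot(text: str) -> List[Element]:
    """Parse snapshot text into elements, skipping lines without a ref."""
    elements: List[Element] = []
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue
        attrs = dict(ATTR_RE.findall(match.group("attrs")))
        ref = attrs.pop("ref", None)
        if ref is None:
            continue
        elements.append(Element(match.group("role"), match.group("name"), ref, attrs))
    return elements
```

With the sample, `parse_snapshot(SAMPLE)` yields four elements, and the heading keeps its extra `level` attribute in `attrs`.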
Snapshot options
| Option | Meaning |
|---|---|
| `-i, --interactive` | Interactive elements only |
| `-c, --compact` | Remove empty structural nodes |
| `-d, --depth <n>` | Limit tree depth |
| `-s, --selector <sel>` | Scope the snapshot to a selector |
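If you call the CLI from a script, it helps to build the option list in one place. This is a small sketch that maps the four options in the table to an argument list; `snapshot_args` is a hypothetical helper, and it only uses the short flags listed above.

```python
from typing import List, Optional


def snapshot_args(
    interactive: bool = False,
    compact: bool = False,
    depth: Optional[int] = None,
    selector: Optional[str] = None,
) -> List[str]:
    """Build the argv for `agent-browser snapshot` from the table's options."""
    args = ["agent-browser", "snapshot"]
    if interactive:
        args.append("-i")  # interactive elements only
    if compact:
        args.append("-c")  # remove empty structural nodes
    if depth is not None:
        args += ["-d", str(depth)]  # limit tree depth
    if selector:
        args += ["-s", selector]  # scope the snapshot to a selector
    return args
```

The returned list can be passed straight to `subprocess.run`, which avoids shell quoting problems with selectors such as `#main > form`.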
Using it with Claude Code
Combined with Claude Code skills, this makes /browse https://example.com style commands possible.
Example .claude/skills/browse/SKILL.md:
---
permissionMode: bypassPermissions
tools: Bash
model: claude-haiku-4-5-20251001
---
# Fetch a page with agent-browser
## Arguments: $ARGUMENTS
Format: `URL [question]`
## Steps
agent-browser open "[URL]"
agent-browser snapshot -i -c
agent-browser close
Answer the question based on the snapshot output.
When you need to fill forms, you can fetch refs in JSON and then act on those refs:
agent-browser snapshot -i --json
agent-browser fill @e3 "value"
agent-browser click @e5
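The step between the snapshot and the `fill` / `click` calls is looking up the right ref. I have not shown the `--json` output shape here, so the sketch below works on the text format from the earlier snapshot example instead and should be read as an illustration only. `find_ref` and `plan_form_fill` are hypothetical names, and the assumption is that form inputs appear with the role `textbox` and buttons with the role `button`.

```python
import re
from typing import List, Optional

# Sample in the text format shown in the snapshot example.
SNAPSHOT = """\
- heading "Example Domain" [ref=e1] [level=1]
- button "Submit" [ref=e2]
- textbox "Email" [ref=e3]
- link "Learn more" [ref=e4]
"""

# Assumed line shape: - role "name" ... [ref=eN]
REF_LINE = re.compile(r'^\s*-\s+(\S+)\s+"([^"]*)".*?\[ref=(\w+)\]')


def find_ref(snapshot: str, role: str, name: str) -> Optional[str]:
    """Return the @ref of the first element matching role and name, if any."""
    for line in snapshot.splitlines():
        match = REF_LINE.match(line)
        if match and match.group(1) == role and match.group(2) == name:
            return "@" + match.group(3)
    return None


def plan_form_fill(snapshot: str, field_name: str, value: str, button_name: str) -> List[List[str]]:
    """Build the fill and click commands for one field and one button."""
    field_ref = find_ref(snapshot, "textbox", field_name)
    button_ref = find_ref(snapshot, "button", button_name)
    if field_ref is None or button_ref is None:
        raise LookupError("field or button not found in snapshot")
    return [
        ["agent-browser", "fill", field_ref, value],
        ["agent-browser", "click", button_ref],
    ]
```

Failing loudly when a ref is missing is deliberate: refs come from a specific snapshot, so a missing one usually means the page changed and a fresh snapshot is needed.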
Current caveats
It has just been released, so there are still rough edges.
The session command does not work yet
The session command, which is supposed to manage multiple browser instances, currently fails with a not-implemented error. For now, the safe pattern is to run agent-browser close and then agent-browser open again whenever you need a fresh browser.
There is no official skill definition yet
There is no official SKILL.md yet for Claude Code or OpenCode, so you have to write your own wrapper.
Comparison with Playwright MCP
| Item | agent-browser | Playwright MCP |
|---|---|---|
| Setup | npm install plus agent-browser install | MCP setup required |
| Startup | Direct CLI call | Via MCP protocol |
| Output | Text / JSON | MCP format |
| AI workflow | snapshot + ref | Snapshot-based |
| Maturity | Fresh release | Stable |
The main advantage of agent-browser is that it works without MCP configuration. On the other hand, Playwright MCP is ahead in IDE integration.
Summary
agent-browser is a browser automation CLI built specifically for AI agents, and its defining feature is element selection through snapshot plus ref.
It is still new and not fully polished, but Vercel Labs is behind it, so it is worth watching. If Playwright MCP setup feels heavy and you just want browser control from a CLI, this is already a solid option.