agent-browser: a browser automation CLI for AI agents

Ikesan

I came across agent-browser, a tool released by Vercel Labs on January 11, 2026. It is a browser automation CLI designed for AI agents, and it looked lighter than Playwright MCP, so I took a closer look.

Basic information

  • Repository: vercel-labs/agent-browser
  • Version: 0.4.0 as of January 12, 2026
  • License: Apache-2.0
  • Dependency: playwright-core ^1.57.0

Installation

npm install -g agent-browser
agent-browser install

On Linux you also need system dependencies:

agent-browser install --with-deps

Architecture

The tool itself has two layers: a Rust CLI in front of a Node.js daemon, which in turn drives Chromium through Playwright.

Rust CLI (command parsing)
    ->
Node.js daemon (Playwright management)
    ->
Chromium (actual browser control)

If no Rust binary is available, it falls back to a pure Node.js implementation. The daemon starts automatically on the first command and keeps running, so later commands are faster.

Basic usage

# Open a page
agent-browser open example.com

# Get the accessibility tree
agent-browser snapshot

# Click an element by ref
agent-browser click @e2

# Fill a form
agent-browser fill @e3 "test@example.com"

# Take a screenshot
agent-browser screenshot page.png

# Close the browser
agent-browser close
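
The sequence above can be wrapped in a small shell function so that the browser is closed even when the snapshot step fails. This is only a sketch: the function name snapshot_page and the AB variable are made up for illustration, and it uses only the open, snapshot -i, and close commands shown in this article.

```shell
# Command name; override AB to point at another binary or a test stub.
# (AB and snapshot_page are illustrative names, not part of agent-browser.)
AB=${AB:-agent-browser}

# Open URL, print the interactive snapshot, then close the browser.
# Returns the exit status of the snapshot step.
snapshot_page() {
  url=$1
  "$AB" open "$url" || return 1
  "$AB" snapshot -i
  rc=$?
  "$AB" close
  return "$rc"
}
```

Usage: snapshot_page example.com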

Core idea: snapshot plus ref

The most important command is snapshot. It captures the page’s accessibility tree and gives each element a ref ID.

$ agent-browser snapshot -i
- heading "Example Domain" [ref=e1] [level=1]
- button "Submit" [ref=e2]
- textbox "Email" [ref=e3]
- link "Learn more" [ref=e4]

With -i, the output is limited to interactive elements such as buttons, links, and form inputs.

This is why the tool fits AI agents well. Instead of building CSS selectors or XPath expressions, the agent acts on refs taken from the accessibility tree, which often changes less than the DOM does.
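
To make the idea concrete, here is a minimal sketch of how a script (or an agent's helper) could look up a ref by role and accessible name. It assumes the snapshot text looks exactly like the sample above, `- role "name" [ref=eN]`; the function name find_ref is hypothetical and not part of agent-browser.

```shell
# Print the @ref of the first snapshot line whose role and name match.
# Reads snapshot text on stdin; exits non-zero if nothing matches.
# Assumes the line format shown above: - role "name" [ref=eN]
find_ref() {
  awk -v role="$1" -v name="$2" '
    BEGIN { prefix = "- " role " \"" name "\" " }
    {
      line = $0
      sub(/^[ \t]+/, "", line)   # tolerate indented (nested) entries
      if (index(line, prefix) == 1 && match(line, /\[ref=[^]]+\]/)) {
        # RSTART/RLENGTH cover "[ref=eN]"; drop "[ref=" and "]".
        print "@" substr(line, RSTART + 5, RLENGTH - 6)
        found = 1
        exit
      }
    }
    END { exit (found ? 0 : 1) }
  '
}
```

With the sample output above, `agent-browser snapshot -i | find_ref textbox Email` would print @e3, which can then be passed to fill or click.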

Snapshot options

Option                  Meaning
-i, --interactive       Interactive elements only
-c, --compact           Remove empty structural nodes
-d, --depth <n>         Limit tree depth
-s, --selector <sel>    Scope the snapshot to a selector
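
As an illustration of how these options might be combined, the sketch below scopes a compact, interactive-only snapshot to one part of the page and optionally limits its depth. That -s and -d can be used together with -i and -c is an assumption on my part; the function name scoped_snapshot and the AB variable are hypothetical.

```shell
# Command name; override AB to point at another binary or a test stub.
AB=${AB:-agent-browser}

# Snapshot interactive elements under SELECTOR, compacted,
# optionally limited to DEPTH levels of the tree.
# (Combining these flags is assumed to work; verify against your version.)
scoped_snapshot() {
  selector=$1
  depth=${2:-}
  if [ -n "$depth" ]; then
    "$AB" snapshot -i -c -s "$selector" -d "$depth"
  else
    "$AB" snapshot -i -c -s "$selector"
  fi
}
```

Usage: scoped_snapshot "#login-form" 3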

Using it with Claude Code

Combined with Claude Code skills, agent-browser can back a slash command such as /browse https://example.com.

Example .claude/skills/browse/SKILL.md:

---
permissionMode: bypassPermissions
tools: Bash
model: claude-haiku-4-5-20251001
---

# Fetch a page with agent-browser

## Arguments: $ARGUMENTS
Format: `URL [question]`

## Steps

agent-browser open "[URL]"
agent-browser snapshot -i -c
agent-browser close

Answer the question based on the snapshot output.

When you need to fill forms, you can fetch refs in JSON and then act on those refs:

agent-browser snapshot -i --json
agent-browser fill @e3 "value"
agent-browser click @e5

Current caveats

It has just been released, so there are still rough edges.

The session command does not work yet

The session command, which is supposed to manage multiple browser instances, currently fails with an unimplemented error. For now, the safe pattern is to run close and then open again whenever you need a fresh browser.
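
The close-then-open workaround can be captured in a one-line helper. This is a sketch with a made-up function name (reopen) and an illustrative AB variable; it assumes that a failing close (for example, when no browser is running) can safely be ignored.

```shell
# Command name; override AB to point at another binary or a test stub.
AB=${AB:-agent-browser}

# Stand-in for the missing session command: drop the current browser,
# then start a fresh one on URL.
reopen() {
  "$AB" close || true   # assumption: a failed close is safe to ignore
  "$AB" open "$1"
}
```

Usage: reopen example.com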

There is no official skill definition yet

There is no official SKILL.md yet for Claude Code or OpenCode, so you have to write your own wrapper.

Comparison with Playwright MCP

Item           agent-browser       Playwright MCP
Setup          npm install only    MCP setup required
Startup        Direct CLI call     Via MCP protocol
Output         Text / JSON         MCP format
AI workflow    snapshot + ref      Snapshot-based
Maturity       Fresh release       Stable

The main advantage of agent-browser is that it works without MCP configuration. On the other hand, Playwright MCP is ahead in IDE integration.

Summary

agent-browser is a browser automation CLI built specifically for AI agents, and its defining feature is element selection through snapshot plus ref.

It is still new and not fully polished, but Vercel Labs is behind it, so it is worth watching. If Playwright MCP setup feels heavy and you just want browser control from a CLI, this is already a solid option.