Shortest - an AI tool for writing E2E tests in natural language

I found an AI testing tool called “Shortest” that has been trending on GitHub. As a Playwright user, I wanted to check it out.

My conclusion: it is interesting, but it is not the same thing as Playwright. Choosing the right tool matters.

What Shortest Is

Shortest is a natural-language E2E testing framework that uses the Anthropic Claude API. It is built on top of Playwright, and it lets you write tests in natural language while AI converts them into actual browser actions.

It is developed by antiwork and is actively maintained on GitHub.

The most distinctive part is that tests can be written like this:

import { shortest } from "@antiwork/shortest";

shortest("Login to the app using email and password", {
  username: process.env.USERNAME,
  password: process.env.PASSWORD,
});

You do not write a single selector. You just tell it, “log in.” AI analyzes the UI, finds the right elements, and performs the actions.
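Conceptually, a tool like this has to run an observe, decide, act loop: look at the page, ask the model for the next action, perform it, and repeat until the model says the goal is reached. The sketch below is a generic version of that loop with a stubbed model, so it runs without a browser or an API key. All names (`runAgent`, `Decide`, `Perform`) are hypothetical and this is not Shortest's source code. It assumes one model call per step, which would explain both why cost grows with the number of steps and why the exact steps can differ between runs.

```typescript
// A generic observe-decide-act loop with a stubbed "model".
// All names are hypothetical; this is not Shortest's source code.
type Action =
  | { kind: "type"; target: string; text: string }
  | { kind: "click"; target: string }
  | { kind: "done"; passed: boolean };

interface Observation {
  url: string;
  visibleText: string;
}

interface AgentResult {
  passed: boolean;
  steps: number;
  history: Action[];
}

// In a real tool, Decide would be an LLM call that sees the goal plus a screenshot or DOM snapshot.
type Decide = (goal: string, obs: Observation, history: Action[]) => Action;
// In a real tool, Perform would drive the browser (for example through Playwright).
type Perform = (action: Action) => Observation;

function runAgent(goal: string, initial: Observation, decide: Decide, perform: Perform, maxSteps = 20): AgentResult {
  const history: Action[] = [];
  let obs = initial;
  for (let step = 1; step <= maxSteps; step++) {
    const action = decide(goal, obs, history); // assumed: one model call per step
    history.push(action);
    if (action.kind === "done") return { passed: action.passed, steps: step, history };
    obs = perform(action);
  }
  // The model never declared the goal reached, so treat the test as failed.
  return { passed: false, steps: maxSteps, history };
}

// Demo: a scripted "model" that logs in, then checks for the Welcome text.
const script: Action[] = [
  { kind: "type", target: "email field", text: "user@example.com" },
  { kind: "type", target: "password field", text: "password123" },
  { kind: "click", target: "login button" },
];
const decide: Decide = (_goal, obs, history) =>
  history.length < script.length
    ? script[history.length]
    : { kind: "done", passed: obs.visibleText.includes("Welcome") };
const perform: Perform = (action) =>
  action.kind === "click"
    ? { url: "/dashboard", visibleText: "Welcome" }
    : { url: "/login", visibleText: "Log in" };

const result = runAgent("Log in and verify the Welcome message", { url: "/login", visibleText: "Log in" }, decide, perform);
console.log(result.passed, result.steps); // true 4
```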

Setup

Installation

npx @antiwork/shortest init

This command automatically does the following:

Installs @antiwork/shortest as a dev dependency
Generates a shortest.config.ts file
Creates a .env.local file

Environment variables

An Anthropic API key is required.

ANTHROPIC_API_KEY=your_api_key
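The test examples in this article also read USERNAME and PASSWORD from the environment, and an unset variable simply arrives as undefined. A small fail-fast check is a cheap safeguard. requireEnv below is a hypothetical helper, not part of Shortest; it takes the environment as a parameter so it can be tried without touching process.env.

```typescript
// Hypothetical fail-fast check for required environment variables (not part of Shortest).
type Env = Record<string, string | undefined>;

function requireEnv(env: Env, names: readonly string[]): Record<string, string> {
  // Treat both unset and empty values as missing.
  const missing = names.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
  const values: Record<string, string> = {};
  for (const name of names) values[name] = env[name] as string;
  return values;
}

// In a Node test file you might write:
// const { USERNAME, PASSWORD } = requireEnv(process.env, ["ANTHROPIC_API_KEY", "USERNAME", "PASSWORD"]);
const creds = requireEnv({ ANTHROPIC_API_KEY: "your_api_key", USERNAME: "user@example.com" }, ["ANTHROPIC_API_KEY", "USERNAME"]);
console.log(Object.keys(creds));
```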

Configuration file

import type { ShortestConfig } from "@antiwork/shortest";

export default {
  headless: false,
  baseUrl: "http://localhost:3000",
  testPattern: "**/*.test.ts",
  ai: {
    provider: "anthropic",
  },
} satisfies ShortestConfig;

Basic Usage

A simple test

import { shortest } from "@antiwork/shortest";

shortest("Login to the app using email and password", {
  username: process.env.USERNAME,
  password: process.env.PASSWORD,
});

By passing variables, you give the AI the concrete values it needs (here, the login credentials) without hard-coding them into the instruction.

Additional checks in a callback

If you are not fully comfortable relying on AI actions alone, you can add explicit assertions in a callback.

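import { shortest } from "@antiwork/shortest";

shortest("Add item to cart and proceed to checkout", {
  productName: "Test Product",
}).after(async ({ page }) => {
  // Deterministic check after the AI-driven steps: exactly one item should be in the cart
  const cartCount = await page.locator('.cart-count').textContent();
  expect(cartCount).toBe('1');
});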

In the after callback, you can access Playwright’s page object and write normal Playwright code.

Lifecycle hooks

Jest-like hooks are also available.

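import { shortest } from "@antiwork/shortest";

// Runs once before the tests
shortest.beforeAll(async ({ page }) => {
  await page.goto('/setup');
});

// Runs after every test: clear localStorage so state does not leak between tests
shortest.afterEach(async ({ page }) => {
  await page.evaluate(() => localStorage.clear());
});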
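Assuming Shortest follows the usual Jest ordering (an assumption based on the "Jest-like" description; check the project's docs), beforeAll runs once, then beforeEach, the test, and afterEach wrap every test, and afterAll runs once at the end. The hypothetical mini-runner below makes that order concrete. It also runs afterEach even when a test fails, which is what makes a hook like clearing localStorage dependable cleanup.

```typescript
// A minimal Jest-style runner that shows the order lifecycle hooks fire in.
// Hypothetical: assumes Shortest's hooks follow the usual Jest ordering.
type Hook = () => void | Promise<void>;

interface Suite {
  beforeAll: Hook[];
  beforeEach: Hook[];
  afterEach: Hook[];
  afterAll: Hook[];
  tests: { name: string; fn: Hook }[];
}

// Runs every test and returns the names of the ones that threw.
async function runSuite(suite: Suite): Promise<string[]> {
  const failures: string[] = [];
  for (const hook of suite.beforeAll) await hook();
  for (const test of suite.tests) {
    for (const hook of suite.beforeEach) await hook();
    try {
      await test.fn();
    } catch {
      failures.push(test.name); // record the failure and move on to the next test
    } finally {
      // afterEach runs even when the test fails, so cleanup always happens
      for (const hook of suite.afterEach) await hook();
    }
  }
  for (const hook of suite.afterAll) await hook();
  return failures;
}

// Demo: log each step to see the order.
const order: string[] = [];
const demoSuite: Suite = {
  beforeAll: [() => { order.push("beforeAll"); }],
  beforeEach: [() => { order.push("beforeEach"); }],
  afterEach: [() => { order.push("afterEach"); }],
  afterAll: [() => { order.push("afterAll"); }],
  tests: [
    { name: "login", fn: () => { order.push("login"); } },
    { name: "checkout", fn: () => { order.push("checkout"); throw new Error("assertion failed"); } },
  ],
};

runSuite(demoSuite).then((failures) => {
  console.log(order.join(" -> "));
  console.log("failed:", failures);
});
```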

Running tests

pnpm shortest                   # Run all tests
pnpm shortest login.test.ts     # Run a specific file
pnpm shortest login.test.ts:23  # Run the test at line 23
pnpm shortest --headless        # Headless mode for CI
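The login.test.ts:23 form targets a single test by line number. The sketch below shows one plausible way such an argument could be interpreted: split off a trailing ":<digits>", then pick the test whose declaration starts at or closest above that line. Both parseTarget and selectTest are hypothetical, and Shortest's actual matching rule may differ.

```typescript
// Hypothetical interpretation of a `file:line` CLI argument such as `login.test.ts:23`.
// Shortest's actual rule for matching a line to a test may differ.
interface TestTarget {
  file: string;
  line?: number;
}

function parseTarget(arg: string): TestTarget {
  // Only a trailing ":<digits>" counts as a line number, so a path like C:\tests\a.test.ts stays intact.
  const match = /^(.+):(\d+)$/.exec(arg);
  return match ? { file: match[1], line: Number(match[2]) } : { file: arg };
}

interface DeclaredTest {
  name: string;
  line: number; // line where the shortest(...) call starts
}

// Assumed rule: choose the test that starts at or closest above the requested line.
function selectTest(tests: DeclaredTest[], line: number): DeclaredTest | undefined {
  let best: DeclaredTest | undefined;
  for (const t of tests) {
    if (t.line <= line && (best === undefined || t.line > best.line)) best = t;
  }
  return best;
}

const target = parseTarget("login.test.ts:23");
const declared: DeclaredTest[] = [
  { name: "Login with email and password", line: 3 },
  { name: "Login with GitHub", line: 12 },
  { name: "Log out", line: 23 },
];
console.log(target, target.line !== undefined ? selectTest(declared, target.line)?.name : "all tests");
```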

Comparison with Playwright

Difference in test authoring

Here is what the same login test looks like.

Playwright

import { test, expect } from '@playwright/test';

test('login test', async ({ page }) => {
  await page.goto('https://example.com/login');
  await page.fill('#email', 'user@example.com');
  await page.fill('#password', 'password123');
  await page.click('button[type="submit"]');
  await page.waitForURL('**/dashboard');
  await expect(page.locator('h1')).toContainText('Welcome');
});

Shortest

shortest("Login with email user@example.com and password, then verify dashboard shows Welcome message");

The difference in code volume is obvious. But that does not mean it is all upside.

Comparison table

Aspect	Playwright	Shortest
Test authoring	Code with selectors	Natural language (English)
Learning curve	You need to learn the API	Write instructions in plain English
Execution cost	Free (open source)	Pay-per-use Anthropic API charges
Fine-grained control	High	Depends on the AI
Reproducibility	Deterministic: the same steps every run	Steps can vary with the AI's judgment
Tolerance for selector changes	Weak (tests need updating)	Strong (the AI adapts)
Ease of debugging	High	Moderate
CI/CD suitability	Excellent	Good, but watch API costs

Playwright MCP vs Shortest

If you are already using Playwright MCP to control a browser from AI, the difference from Shortest is worth understanding.

Different approaches

Playwright MCP

The AI calls Playwright-backed browser tools exposed by the MCP server. It decides both what to do and how to do it. This is flexible, but the AI makes a fresh judgment call on every run.

Shortest

Humans write “what to do” in natural language, and AI decides only “how to do it.” The test itself is fixed; AI only determines the execution details.

When to use which

Scenario	Playwright MCP	Shortest
Free-form browser exploration	Excellent	Poor
Repeating a fixed test	Fair (AI decides every time)	Excellent (test is predefined)
Managing as test assets	Fair	Excellent
Dynamic task execution	Excellent	Poor

A combined workflow

One possible workflow looks like this:

Use Playwright MCP to explore the UI interactively
Turn the confirmed flow into a Shortest test
Run Shortest in CI/CD

Use MCP’s flexibility during exploration, and turn stable flows into test assets with Shortest.

Caveats and pitfalls

API charges apply

Every test run calls the Claude API. The cost per test depends on how complex the operation is, but if you run it repeatedly during development, the bill can grow quickly.

Playwright is completely free, so this difference matters.
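To see how quickly the bill can grow, it helps to do the arithmetic. The function below multiplies tests, model calls per test, tokens per call, and runs per day by a price per million tokens. Every input is a placeholder assumption, not a measured Shortest figure or Anthropic's actual pricing; plug in numbers from your own usage and the current price list.

```typescript
// Back-of-the-envelope daily API cost for repeated test runs.
// Every input is an assumption: measure your own token usage and check current pricing.
interface CostInputs {
  tests: number; // tests per suite run
  stepsPerTest: number; // model calls per test (assumed roughly one per action)
  tokensPerStep: number; // input + output tokens per call
  pricePerMillionTokens: number; // blended USD price (placeholder)
  runsPerDay: number; // local reruns plus CI runs
}

function dailyCostUSD(c: CostInputs): number {
  const tokensPerRun = c.tests * c.stepsPerTest * c.tokensPerStep;
  return (tokensPerRun * c.runsPerDay * c.pricePerMillionTokens) / 1_000_000;
}

// Placeholder numbers, not measurements.
const example = dailyCostUSD({
  tests: 20,
  stepsPerTest: 10,
  tokensPerStep: 5_000,
  pricePerMillionTokens: 3,
  runsPerDay: 30,
});
console.log(`~$${example.toFixed(2)} per day`); // ~$90.00 per day
```

With these placeholder inputs, one suite run is 1,000,000 tokens, and 30 runs a day comes to $90. The exact figure matters less than the shape of the formula: cost scales linearly with every factor, so frequent local reruns are what make it add up.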

Countermeasure: Write tests with Playwright during development and consider switching to Shortest once they stabilize. Or use Shortest only for final verification.

It depends on AI judgment

Even for the same test, the executed steps may vary slightly from run to run. If you tell it to “click the login button,” one run might click #login-btn, while another might click .submit-button.

It is resilient to UI changes, but there is also a risk of unintended actions.

Countermeasure: Add explicit assertions in callback functions for the important parts. Trust the AI for the action, but verify the final state yourself.
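The idea behind this countermeasure can be shown without a browser. In the toy model below, the "AI" may pick either of two selectors that both add an item to the cart, or an unintended element. The explicit assertion checks the resulting state, so it passes whichever valid element was clicked and fails on the wrong one. The selectors and state shape are hypothetical; in a real test the assertion would live in the after callback and read the state through page.

```typescript
// Toy model of "trust the AI for the action, verify the final state yourself".
// Selectors and state are hypothetical; no browser or model is involved.
interface AppState {
  cartCount: number;
  wishlistCount: number;
}

// Elements a model might pick for "add the item to the cart".
// The first two are valid choices; the third is an unintended action.
const candidates: Record<string, (s: AppState) => AppState> = {
  "#add-to-cart": (s) => ({ ...s, cartCount: s.cartCount + 1 }),
  "button.add-item": (s) => ({ ...s, cartCount: s.cartCount + 1 }),
  "a.add-to-wishlist": (s) => ({ ...s, wishlistCount: s.wishlistCount + 1 }),
};

// Stand-in for the AI: `choose` decides which element gets clicked on this run.
function aiAddToCart(state: AppState, choose: (options: string[]) => string): AppState {
  const picked = choose(Object.keys(candidates));
  const click = candidates[picked];
  if (!click) throw new Error(`No element matches ${picked}`);
  return click(state);
}

// The explicit assertion checks the outcome, not which element was clicked.
function assertCartCount(state: AppState, expected: number): void {
  if (state.cartCount !== expected) {
    throw new Error(`Expected cart count ${expected}, got ${state.cartCount}`);
  }
}

const start: AppState = { cartCount: 0, wishlistCount: 0 };
assertCartCount(aiAddToCart(start, () => "#add-to-cart"), 1); // passes
assertCartCount(aiAddToCart(start, () => "button.add-item"), 1); // also passes
// Clicking "a.add-to-wishlist" would leave cartCount at 0, and the assertion would throw.
```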

Type errors with React 18 + Next.js 14+

You may run into type conflicts with useFormStatus and Server Actions.

Type error: Type 'X' is not assignable to type 'Y'

Countermeasure: Upgrading to React 19 is recommended.

Debug data may contain sensitive information

Debug information is stored in the .shortest folder. Check its contents before sharing it in an issue report, because it may include environment variables or login information.
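A quick automated scrub can complement reading the files yourself. The hypothetical redact helper below walks a parsed JSON debug dump, masks values whose keys look sensitive, and replaces known secret strings wherever they appear in text. The key pattern is an assumption and deliberately over-matches; the helper does nothing for non-JSON files, so a manual check is still needed.

```typescript
// Hypothetical helper to scrub a parsed JSON debug dump before sharing it.
// The key pattern is an assumption and deliberately over-matches (e.g. "author").
const SENSITIVE_KEY = /key|secret|token|passw|cookie|auth/i;

function redact(value: unknown, knownSecrets: string[] = []): unknown {
  if (typeof value === "string") {
    // Replace exact occurrences of known secrets anywhere in free text.
    let out = value;
    for (const secret of knownSecrets) {
      if (secret) out = out.split(secret).join("[REDACTED]");
    }
    return out;
  }
  if (Array.isArray(value)) return value.map((item) => redact(item, knownSecrets));
  if (value !== null && typeof value === "object") {
    const result: Record<string, unknown> = {};
    for (const [key, item] of Object.entries(value)) {
      result[key] = SENSITIVE_KEY.test(key) ? "[REDACTED]" : redact(item, knownSecrets);
    }
    return result;
  }
  return value;
}

// Example dump with an API key and a password typed into the page.
const dump = {
  step: "login",
  env: { ANTHROPIC_API_KEY: "sk-test", BASE_URL: "http://localhost:3000" },
  log: ["typed hunter2 into the password field"],
};
console.log(JSON.stringify(redact(dump, ["hunter2"]), null, 2));
```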

English is assumed

Natural-language tests are intended to be written in English. Japanese support is not officially documented.

Claude can understand Japanese, so it may still work, but that is unverified. If you rely on it for real projects, writing in English is the safer choice.

GitHub 2FA setup can be complicated

If your tests log in through GitHub with 2FA enabled, you need to provide the account's TOTP secret key. Setting it up alongside the authenticator app you already use can be annoying.

GITHUB_TOTP_SECRET=your_secret
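For context, the TOTP secret is the shared seed behind the six-digit codes an authenticator app shows. Both sides compute the code from the secret and the current 30-second time step, which is why the secret in your environment must be the same one registered for the GitHub account. The sketch below implements the standard RFC 6238 algorithm with Node's crypto module; it is not Shortest's implementation, only an illustration of what the secret is used for.

```typescript
import { createHmac } from "node:crypto";

// Decode an RFC 4648 base32 string (the usual format for authenticator secrets).
function base32Decode(input: string): Buffer {
  const alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567";
  const clean = input.toUpperCase().replace(/\s+/g, "").replace(/=+$/, "");
  const bytes: number[] = [];
  let buffer = 0; // pending bits; only the low `bits` bits are meaningful
  let bits = 0;
  for (const char of clean) {
    const index = alphabet.indexOf(char);
    if (index === -1) throw new Error(`Invalid base32 character: ${char}`);
    buffer = (buffer << 5) | index;
    bits += 5;
    if (bits >= 8) {
      bytes.push((buffer >>> (bits - 8)) & 0xff);
      bits -= 8;
    }
  }
  return Buffer.from(bytes);
}

// RFC 6238 TOTP: HMAC-SHA1 over the time-step counter, then RFC 4226 dynamic truncation.
function totp(secretBase32: string, unixSeconds: number, digits = 6, stepSeconds = 30): string {
  const counter = Math.floor(unixSeconds / stepSeconds);
  const message = Buffer.alloc(8); // 8-byte big-endian counter
  message.writeUInt32BE(Math.floor(counter / 2 ** 32), 0);
  message.writeUInt32BE(counter >>> 0, 4);
  const hmac = createHmac("sha1", base32Decode(secretBase32)).update(message).digest();
  const offset = hmac[hmac.length - 1] & 0x0f;
  const binary =
    ((hmac[offset] & 0x7f) << 24) |
    (hmac[offset + 1] << 16) |
    (hmac[offset + 2] << 8) |
    hmac[offset + 3];
  return (binary % 10 ** digits).toString().padStart(digits, "0");
}

// RFC 6238 test secret: ASCII "12345678901234567890" in base32.
const rfcSecret = "GEZDGNBVGY3TQOJQGEZDGNBVGY3TQOJQ";
console.log(totp(rfcSecret, 59, 8)); // 94287082 (RFC 6238, Appendix B)
```

For example, totp(process.env.GITHUB_TOTP_SECRET!, Math.floor(Date.now() / 1000)) should match the code shown by an authenticator app set up with the same secret.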

It is not suited to complex tests

Shortest struggles when the AI has to make difficult judgments or when strict timing control is needed.

Countermeasure: Use Playwright directly for those cases. There is no need to force Shortest everywhere.

How to use it by scenario

Simple prototype tests

Great for smoke tests early in development. It is useful when the UI changes frequently and you do not want to write selectors yet.

Tests that non-engineers can read

QA teams and PMs can understand the tests more easily. Since they are written in natural language, they also double as test specifications.

Authentication flow tests

GitHub 2FA support is built in. Mailosaur integration also supports email verification. Authentication tests are usually a pain to set up, and Shortest covers this area pretty thoroughly.

Current evaluation

Good points

A breakthrough approach that lets you write tests in natural language
You can write E2E tests without knowing Playwright well
Strong against UI changes because it does not depend on selectors
Built-in support for GitHub 2FA, email verification, and similar features

Concerns

API billing applies (Playwright is free)
Reproducibility depends on AI
Assumes English
Not suitable for complex tests

How I would use it

The right way to think about Shortest is not as a “replacement for Playwright” but as a “higher layer on top of Playwright.”

Need fine-grained control -> Playwright
Want to test a simple flow quickly -> Shortest
Want free-form browser exploration -> Playwright MCP
Want to turn a stable flow into a test asset -> Shortest

The best approach is to use both appropriately. There is no need to replace everything with Shortest, and it is perfectly fine not to use it at all.

Use tools where they make sense.