Shortest - an AI tool for writing E2E tests in natural language
I found an AI testing tool called “Shortest” that has been trending on GitHub. As a Playwright user, I wanted to check it out.
My conclusion: it is interesting, but it is not the same thing as Playwright. Choosing the right tool matters.
What Shortest Is
Shortest is a natural-language E2E testing framework that uses the Anthropic Claude API. It is built on top of Playwright, and it lets you write tests in natural language while AI converts them into actual browser actions.
It is developed by antiwork and is actively maintained on GitHub.
The most distinctive part is that tests can be written like this:
import { shortest } from "@antiwork/shortest";

shortest("Login to the app using email and password", {
  username: process.env.USERNAME,
  password: process.env.PASSWORD,
});
You do not write a single selector. You just tell it, “log in.” AI analyzes the UI, finds the right elements, and performs the actions.
Setup
Installation
npx @antiwork/shortest init
This command automatically does the following:
- Installs @antiwork/shortest as a dev dependency
- Generates a shortest.config.ts file
- Creates a .env.local file
Environment variables
An Anthropic API key is required.
ANTHROPIC_API_KEY=your_api_key
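Because every run depends on this key, it can help to fail fast when it is missing. The following is a minimal sketch, not part of Shortest itself: `missingEnvVars` is a hypothetical helper name, and the list of required variables is only an example.

```typescript
// Hypothetical pre-flight helper (not part of Shortest): reports which
// required environment variables are unset or blank.
type Env = Record<string, string | undefined>;

function missingEnvVars(required: string[], env: Env): string[] {
  return required.filter((name) => {
    const value = env[name];
    return value === undefined || value.trim() === "";
  });
}

// Example: check the key named in this article before starting a run.
const missing = missingEnvVars(["ANTHROPIC_API_KEY"], process.env);
if (missing.length > 0) {
  console.warn(`Missing environment variables: ${missing.join(", ")}`);
}
```

Passing the environment in as an argument keeps the helper easy to check in isolation, without touching the real process environment.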
Configuration file
import type { ShortestConfig } from "@antiwork/shortest";

export default {
  headless: false,
  baseUrl: "http://localhost:3000",
  testPattern: "**/*.test.ts",
  ai: {
    provider: "anthropic",
  },
} satisfies ShortestConfig;
Basic Usage
A simple test
import { shortest } from "@antiwork/shortest";

shortest("Login to the app using email and password", {
  username: process.env.USERNAME,
  password: process.env.PASSWORD,
});
By passing variables, you give AI enough context to use the right values.
Additional checks in a callback
If you are not fully comfortable relying on AI actions alone, you can add explicit assertions in a callback.
shortest("Add item to cart and proceed to checkout", {
  productName: "Test Product"
}).after(async ({ page }) => {
  // Verify the final state with plain Playwright code
  const cartCount = await page.locator('.cart-count').textContent();
  expect(cartCount).toBe('1');
});
In the after callback, you can access Playwright’s page object and write normal Playwright code.
Lifecycle hooks
Jest-like hooks are also available.
shortest.beforeAll(async ({ page }) => {
  await page.goto('/setup');
});

shortest.afterEach(async ({ page }) => {
  await page.evaluate(() => localStorage.clear());
});
Running tests
pnpm shortest # Run all tests
pnpm shortest login.test.ts # Run a specific file
pnpm shortest login.test.ts:23 # Run a specific line
pnpm shortest --headless # Headless mode for CI
Comparison with Playwright
Difference in test authoring
Here is what the same login test looks like.
Playwright
import { test, expect } from '@playwright/test';

test('login test', async ({ page }) => {
  await page.goto('https://example.com/login');
  await page.fill('#email', 'user@example.com');
  await page.fill('#password', 'password123');
  await page.click('button[type="submit"]');
  await page.waitForURL('**/dashboard');
  await expect(page.locator('h1')).toContainText('Welcome');
});
Shortest
shortest("Login with email user@example.com and password, then verify dashboard shows Welcome message");
The difference in code volume is obvious. But that does not mean it is all upside.
Comparison table
| Item | Playwright | Shortest |
|---|---|---|
| Test authoring | Code with selectors | Natural language (English) |
| Learning curve | You need to learn the API | Just write in English |
| Execution cost | Free | Anthropic API billing |
| Fine-grained control | High | Depends on AI |
| Reproducibility | Deterministic (the same scripted steps every run) | Steps can vary slightly with AI judgment |
| Tolerance for selector changes | Weak (requires fixes) | Strong (AI adapts) |
| Ease of debugging | High | Moderate |
| CI/CD suitability | Excellent | Good, but watch the cost |
Playwright MCP vs Shortest
If you are already using Playwright MCP to control a browser from AI, the difference from Shortest is worth understanding.
Different approaches
Playwright MCP
AI directly calls the Playwright API. The AI decides both what to do and how to do it. It is flexible, but the AI makes judgment calls every time.
Shortest
Humans write “what to do” in natural language, and AI decides only “how to do it.” The test itself is fixed; AI only determines the execution details.
When to use which
| Scenario | Playwright MCP | Shortest |
|---|---|---|
| Free-form browser exploration | Excellent | Poor |
| Repeating a fixed test | Fair (AI decides every time) | Excellent (test is predefined) |
| Managing as test assets | Fair | Excellent |
| Dynamic task execution | Excellent | Poor |
A combined workflow
One possible workflow looks like this:
- Use Playwright MCP to explore the UI interactively
- Turn the confirmed flow into a Shortest test
- Run Shortest in CI/CD
Use MCP’s flexibility during exploration, and turn stable flows into test assets with Shortest.
Caveats and pitfalls
API charges apply
Every test run calls the Claude API. The cost per test depends on how complex the operation is, but if you run it repeatedly during development, the bill can grow quickly.
Playwright is completely free, so this difference matters.
Countermeasure: Write tests with Playwright during development and consider switching to Shortest once they stabilize. Or use Shortest only for final verification.
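To get a feel for how quickly the bill can grow, a back-of-the-envelope estimate helps. The sketch below is only arithmetic: every number in it, including the per-test cost, is a hypothetical placeholder rather than a measured figure, so substitute values from your own API usage data.

```typescript
// Rough monthly cost estimate. All inputs are hypothetical placeholders.
interface CostInputs {
  testsPerRun: number;  // number of Shortest tests in the suite
  runsPerDay: number;   // how often the suite runs (local + CI)
  workingDays: number;  // days per month the suite is run
  centsPerTest: number; // assumed average API cost per test, in US cents
}

// Work in integer cents to avoid floating-point rounding surprises.
function estimateMonthlyCents(c: CostInputs): number {
  return c.testsPerRun * c.runsPerDay * c.workingDays * c.centsPerTest;
}

const duringDevelopment = estimateMonthlyCents({
  testsPerRun: 20, runsPerDay: 10, workingDays: 22, centsPerTest: 5,
});
const finalVerificationOnly = estimateMonthlyCents({
  testsPerRun: 20, runsPerDay: 1, workingDays: 22, centsPerTest: 5,
});

console.log(`Run during development: $${(duringDevelopment / 100).toFixed(2)}`);
console.log(`Final verification only: $${(finalVerificationOnly / 100).toFixed(2)}`);
```

With these made-up inputs, running the suite ten times a day costs ten times as much as one final run per day, which is the reasoning behind limiting Shortest to stabilized tests or final verification.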
It depends on AI judgment
Even for the same test, the executed steps may vary slightly from run to run. If you tell it to “click the login button,” one run might click #login-btn, while another might click .submit-button.
It is resilient to UI changes, but there is also a risk of unintended actions.
Countermeasure: Add explicit assertions in callback functions for the important parts. Trust the AI for the action, but verify the final state yourself.
Type errors with React 18 + Next.js 14+
You may run into type conflicts with useFormStatus and Server Actions.
Type error: Type 'X' is not assignable to type 'Y'
Countermeasure: Upgrading to React 19 is recommended.
Debug data may contain sensitive information
Debug information is stored in the .shortest folder. Check its contents before sharing it in an issue report, because it may include environment variables or login information.
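One way to reduce that risk is to mask known secret values before pasting anything into an issue. This is a minimal sketch under my own assumptions: `redactSecrets` is a hypothetical helper, not something Shortest provides, and it only catches values you explicitly pass in, so a manual review is still needed.

```typescript
// Hypothetical helper (not part of Shortest): masks known secret values
// in a piece of text before it is shared.
function redactSecrets(text: string, secrets: Array<string | undefined>): string {
  let result = text;
  for (const secret of secrets) {
    // Skip unset or very short values to avoid masking ordinary words.
    if (!secret || secret.length < 4) continue;
    // split/join replaces every occurrence without regex escaping issues.
    result = result.split(secret).join("[REDACTED]");
  }
  return result;
}

// Example with a made-up log line and a made-up password.
const sample = "POST /login password=hunter2secret";
console.log(redactSecrets(sample, ["hunter2secret", process.env.ANTHROPIC_API_KEY]));
```

In practice you would read each file under the .shortest folder, pass its contents through a helper like this together with the values of your sensitive environment variables, and still skim the result by eye.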
English is assumed
Natural-language tests are intended to be written in English. Japanese support is not officially documented.
Claude can understand Japanese, so it may still work, but that is unverified. If you are using it in production, writing in English is the safer choice.
GitHub 2FA setup can be complicated
If your tests go through GitHub authentication, you need to configure a TOTP secret key. Keeping it in sync with an authenticator app can be annoying.
GITHUB_TOTP_SECRET=your_secret
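For context, a TOTP secret is usually a base32 string, and the 6-digit code is derived from it and the current time as described in RFC 6238. The sketch below uses only Node's built-in crypto module to compute a code, which can be handy for checking that the secret in your env file matches what your authenticator app shows. It assumes the common defaults (HMAC-SHA1, 30-second steps, 6 digits). The function names are my own, and this is not necessarily how Shortest computes the code internally.

```typescript
import { createHmac } from "node:crypto";

const BASE32_ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567";

// Decode an RFC 4648 base32 string (the usual format of TOTP secrets).
function base32Decode(input: string): Buffer {
  const clean = input.toUpperCase().replace(/[\s=]/g, "");
  const bytes: number[] = [];
  let buffer = 0; // bits not yet emitted as a full byte
  let bits = 0;   // number of valid bits in `buffer`
  for (let i = 0; i < clean.length; i++) {
    const ch = clean.charAt(i);
    const index = BASE32_ALPHABET.indexOf(ch);
    if (index === -1) throw new Error(`Invalid base32 character: ${ch}`);
    buffer = (buffer << 5) | index;
    bits += 5;
    if (bits >= 8) {
      bits -= 8;
      bytes.push((buffer >>> bits) & 0xff);
      buffer &= (1 << bits) - 1; // keep only the leftover bits
    }
  }
  return Buffer.from(bytes);
}

// Compute a TOTP code (RFC 6238) using HMAC-SHA1.
function totp(secretBase32: string, unixSeconds: number, digits = 6, stepSeconds = 30): string {
  const counter = Math.floor(unixSeconds / stepSeconds);
  // The counter is an 8-byte big-endian integer, written as two 32-bit halves.
  const counterBytes = Buffer.alloc(8);
  counterBytes.writeUInt32BE(Math.floor(counter / 0x100000000), 0);
  counterBytes.writeUInt32BE(counter >>> 0, 4);
  const digest = createHmac("sha1", base32Decode(secretBase32)).update(counterBytes).digest();
  // Dynamic truncation: the low 4 bits of the last byte select a 4-byte window.
  const offset = digest[digest.length - 1] & 0x0f;
  const binary = digest.readUInt32BE(offset) & 0x7fffffff;
  const code = binary % Math.pow(10, digits);
  return ("0".repeat(digits) + code).slice(-digits);
}

// The RFC 6238 test secret ("12345678901234567890" in base32), not a real credential.
const demoSecret = "GEZDGNBVGY3TQOJQGEZDGNBVGY3TQOJQ";
console.log(totp(demoSecret, Math.floor(Date.now() / 1000)));
```

If the code printed for your real secret differs from the one in your authenticator app at the same moment, the secret was probably copied incorrectly or the system clock is off.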
It is not suited to complex tests
Shortest struggles when the AI has to make difficult judgments or when strict timing control is needed.
Countermeasure: Use Playwright directly for those cases. There is no need to force Shortest everywhere.
How to use it by scenario
Simple prototype tests
Great for smoke tests early in development. It is useful when the UI changes frequently and you do not want to write selectors yet.
Tests that non-engineers can read
QA teams and PMs can understand the tests more easily. Since they are written in natural language, they also double as test specifications.
Authentication flow tests
GitHub 2FA support is built in. Mailosaur integration also supports email verification. Authentication tests are usually a pain to set up, and Shortest covers this area pretty thoroughly.
Current evaluation
Good points
- A breakthrough approach that lets you write tests in natural language
- You can write E2E tests without knowing Playwright well
- Strong against UI changes because it does not depend on selectors
- Built-in support for GitHub 2FA, email verification, and similar features
Concerns
- API billing applies (Playwright is free)
- Reproducibility depends on AI
- Assumes English
- Not suitable for complex tests
How I would use it
The right way to think about Shortest is not as a “replacement for Playwright” but as a “higher layer on top of Playwright.”
- Need fine-grained control -> Playwright
- Want to test a simple flow quickly -> Shortest
- Want free-form browser exploration -> Playwright MCP
- Want to turn a stable flow into a test asset -> Shortest
The best approach is to use each tool where it fits. There is no need to replace everything with Shortest, and it is perfectly fine not to use it at all.