Letting AI Run E2E Tests Overnight While I Sleep
The dream: kick off E2E tests before bed and wake up to just the results. That’s what Claude Code’s official plugin “Ralph Wiggum” is for.
Ralph Wiggum adds an automatic loop to Claude Code. It re-runs the same prompt until a completion condition is met, running and fixing tests autonomously along the way.
Prerequisites
- Development environment already set up (to avoid scenarios requiring admin privileges, like fresh installs)
- Tests can run inside Docker
- Claude Max or another flat-rate plan recommended (long runs can hit rate limits)
1. Network Restriction (Sandboxing the Machine)
When using --dangerously-skip-permissions, it’s safer to block external network access.
Linux / WSL2
# Allow only necessary outbound traffic
# (hostnames are resolved to IP addresses once, when each rule is added)
sudo iptables -A OUTPUT -d api.anthropic.com -j ACCEPT
sudo iptables -A OUTPUT -d statsig.anthropic.com -j ACCEPT
sudo iptables -A OUTPUT -d 127.0.0.0/8 -j ACCEPT # localhost
sudo iptables -A OUTPUT -d 172.16.0.0/12 -j ACCEPT # Docker network
sudo iptables -A OUTPUT -d 10.0.0.0/8 -j ACCEPT # Docker network (alternative)
sudo iptables -A OUTPUT -d 192.168.0.0/16 -j ACCEPT # host network
sudo iptables -A OUTPUT -j DROP # block everything else (IPv4 only; IPv6 needs equivalent ip6tables rules)
# Verify
sudo iptables -L OUTPUT -n
To Revert
sudo iptables -F OUTPUT
To Make It Persistent
sudo apt install iptables-persistent
sudo netfilter-persistent save
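To check that the rules behave as intended, a small probe can compare an allowed host with one that should be blocked. This is a minimal sketch assuming curl is installed; probe_url and check_sandbox are hypothetical helper names, and example.com is only a stand-in for any host outside the allow list.

```shell
#!/bin/sh
# Sketch: report whether an HTTPS endpoint is reachable from this machine.
# probe_url and check_sandbox are hypothetical names, not part of any tool.
probe_url() {
    # --max-time keeps a silently dropped connection from hanging the check
    if curl --silent --output /dev/null --max-time 5 "$1"; then
        echo "reachable"
    else
        echo "blocked"
    fi
}

# Print one line per host so the result can be compared with the allow list.
check_sandbox() {
    echo "api.anthropic.com: $(probe_url https://api.anthropic.com)"
    echo "example.com: $(probe_url https://example.com)"
}
```

After applying the rules, running check_sandbox should report api.anthropic.com as reachable and example.com as blocked. Any other combination suggests the rules need another look.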
2. Claude Code Configuration
.claude/settings.json
{
"permissions": {
"allow": [
"Bash(*)",
"Read(*)",
"Write(*)",
"Edit(*)",
"MultiEdit(*)"
]
}
}
If you still get permission prompts with this configuration, use --dangerously-skip-permissions. The settings.json allow list doesn’t fully cover some environments and commands, so the practical approach is to use the flag with the network restrictions in place.
3. Install the Ralph Wiggum Plugin
cd your-project
claude
# Inside Claude Code
/plugin install ralph-wiggum
4. Example Run Commands
Basic Form
claude --dangerously-skip-permissions
After Claude Code starts:
/ralph-loop "Run E2E tests one by one and fix failures.
## Rules
- Handle only one test file per iteration
- Check at most 2 screenshots per request
- Don't look at screenshots for passing tests
## Steps
1. Pick one failing test from tests/e2e/
2. Run only that test
3. If it fails, check the failure screenshot and fix
4. When it passes, move to the next test
5. When all tests pass, output DONE
## If Stuck (same test fails 3+ times)
- Record what's blocking
- Skip it and move to the next test
- List skipped tests at the end
DONE" --max-iterations 100 --completion-promise "DONE"
Run Headless (Continues After Terminal Closes)
nohup claude --dangerously-skip-permissions -p '/ralph-loop "..." --max-iterations 100 --completion-promise "DONE"' > ralph.log 2>&1 &
Or with tmux / screen:
tmux new -s ralph
claude --dangerously-skip-permissions
# After running /ralph-loop, detach with Ctrl+B D
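When using tmux, it helps to confirm in the morning that the session survived the night before reattaching. The following is a rough sketch; ralph_session_status is a hypothetical helper name, and the default session name matches the one created with tmux new -s ralph.

```shell
#!/bin/sh
# Sketch: report whether a tmux session is still alive.
# ralph_session_status is a hypothetical helper name.
ralph_session_status() {
    session="${1:-ralph}"
    # has-session exits non-zero when the session does not exist
    if tmux has-session -t "$session" 2>/dev/null; then
        echo "running"
    else
        echo "not running"
    fi
}
```

If it prints "running", reattach with tmux attach -t ralph to see where the loop stands.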
5. Morning Check
# Check the log
tail -100 ralph.log
# Re-run tests to review results
# (replace with your project's test command)
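Beyond tailing the log, a quick summary can show whether the loop ever printed the completion phrase. This sketch assumes the log is plain text and that the phrase (DONE in the example prompt) appears on a line by itself; the actual log format may differ, and summarize_log is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: summarize a run log.
# Assumption: the completion phrase appears alone on a line when the loop ends.
summarize_log() {
    log="$1"
    phrase="${2:-DONE}"
    if [ ! -f "$log" ]; then
        echo "log not found: $log"
        return 1
    fi
    echo "lines: $(wc -l < "$log" | tr -d ' ')"
    # -x matches whole lines only, -F treats the phrase as a literal string
    if grep -qxF "$phrase" "$log"; then
        echo "completion phrase: found"
    else
        echo "completion phrase: not found"
    fi
}
```

For example, summarize_log ralph.log prints the line count and whether DONE was found.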
6. Caveats
Context Length
Long runs may hit the context length limit. Because the context is partially reset between iterations, this usually isn’t a problem in practice.
Image Handling
Loading large numbers of images in a single request causes context to explode. Explicitly limit this in the prompt — something like “at most 2 screenshots.”
Infinite Loop Prevention
- Always set --max-iterations
- Include “skip if stuck” logic in the prompt
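An iteration cap limits how many times the loop runs but not how long it runs, so for headless runs a wall-clock limit can serve as a second safety net. The sketch below assumes timeout from GNU coreutils is available; run_with_deadline is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: cap total wall-clock time in addition to --max-iterations.
# Relies on `timeout` from GNU coreutils; run_with_deadline is a hypothetical name.
run_with_deadline() {
    limit="$1"
    shift
    status=0
    timeout "$limit" "$@" || status=$?
    # GNU timeout exits with 124 when it had to stop the command
    if [ "$status" -eq 124 ]; then
        echo "stopped after $limit" >&2
    fi
    return "$status"
}
```

A headless run could then be started as run_with_deadline 8h claude --dangerously-skip-permissions -p '/ralph-loop "..." --max-iterations 100', where 8h is only an example limit.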
Production Database
Make absolutely sure the setup can never connect to a production database. Use a test database.
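One way to enforce this is a guard that runs before Claude Code starts and refuses to continue when the connection string looks like production. This is only a sketch: the DATABASE_URL variable name and the "prod" substring check are assumptions that will need adjusting to your project, and assert_test_database is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: refuse to start when the database URL looks like production.
# Assumptions: the project reads DATABASE_URL, and production URLs contain "prod".
assert_test_database() {
    url="${DATABASE_URL:-}"
    case "$url" in
        "")
            echo "DATABASE_URL is not set; refusing to continue" >&2
            return 1 ;;
        *prod*|*PROD*)
            echo "DATABASE_URL looks like production; refusing to continue" >&2
            return 1 ;;
        *)
            echo "database check passed"
            return 0 ;;
    esac
}
```

Chaining it as assert_test_database && claude --dangerously-skip-permissions means the session only starts when the check passes. A substring check is a weak heuristic, so it complements network-level isolation and does not replace it.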
7. Troubleshooting
Stops with a Permission Prompt
Use --dangerously-skip-permissions. The risk is much lower with the network restrictions in place.
Can’t Connect to the Claude API with Network Restrictions
# Insert the Anthropic hosts at the top of the OUTPUT chain, ahead of the DROP rule
sudo iptables -I OUTPUT 1 -d api.anthropic.com -j ACCEPT
sudo iptables -I OUTPUT 2 -d statsig.anthropic.com -j ACCEPT
Can’t Connect to Docker
# Check the Docker network
docker network inspect bridge
# Add that subnet to the allow list
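The subnet can also be pulled out with Docker's --format option and turned into a matching iptables command. The following is a sketch that prints the commands for review without running them; docker_subnets and print_allow_rules are hypothetical helper names.

```shell
#!/bin/sh
# Sketch: print the subnet(s) of a Docker network, one per line.
# docker_subnets and print_allow_rules are hypothetical helper names.
docker_subnets() {
    network="${1:-bridge}"
    docker network inspect "$network" \
        --format '{{range .IPAM.Config}}{{println .Subnet}}{{end}}'
}

# Print (not run) one iptables command per subnet so it can be reviewed first.
print_allow_rules() {
    docker_subnets "${1:-bridge}" | while read -r subnet; do
        if [ -n "$subnet" ]; then
            echo "sudo iptables -I OUTPUT 1 -d $subnet -j ACCEPT"
        fi
    done
}
```

Running print_allow_rules bridge lists the commands. If the subnets look right, copy them into the terminal to apply them.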
Trying this out now — if it doesn’t stall, I’ll report back.