
Running AI agents in local isolation: how macOS sandbox-exec and Windows Sandbox differ

Ikesan

Two different approaches took shape on macOS and Windows at around the same time, both answering the question “What if something goes wrong?” when you run an AI coding agent with the --dangerously-skip-permissions flag. On macOS, Agent Safehouse uses the OS-native sandbox-exec; on Windows, OpenAI has officially released Codex for Windows, which uses Windows Sandbox. Both aim to keep AI agents from performing destructive operations, but the isolation mechanism and its strength are fundamentally different.

Honestly, I know I should use Docker. But Docker is a hassle. Rebuilding the development environment inside a container, setting up volume mounts, and adjusting permissions is enough work that, when I just want to try something out, I end up adding --dangerously-skip-permissions and running the agent as is. This is especially true when I leave an automated development loop running overnight in tmux. In the Ralph Wiggum plugin article I built a sandbox by restricting the network with iptables, but I was not able to isolate the file system.

And accidents do happen. In one incident, Amazon’s Kiro “deleted and recreated” a production environment, causing 13 hours of downtime; in another, Replit’s AI agent deleted a customer database. Both were a direct result of giving an agent too much authority. We already know what lies beyond the decision to “set the flag because permission settings are a hassle.”

macOS: Agent Safehouse and OS built-in sandbox-exec

What is macOS sandbox-exec?

Agent Safehouse is built on sandbox-exec, which ships with macOS. sandbox-exec constrains a process’s behavior using the kernel-level sandbox behind macOS’s Sandbox.framework (also known as Seatbelt), the same mechanism that powers the App Store app sandbox.

You write a policy file that says “deny everything except what is explicitly allowed” and start the target process under that policy. The kernel blocks any operation that violates the policy at the system call level. Unlike userspace hooks or LD_PRELOAD-style approaches, the process has no way to bypass this from its own side.
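To make the “deny everything except what is allowed” idea concrete, here is a minimal sketch of a policy file and of how it is handed to sandbox-exec. This is not Agent Safehouse’s actual profile: the file location, the WORKDIR parameter name, and the short list of allowances are assumptions for illustration, and a real agent needs many more allowances before it can do useful work.

```shell
#!/usr/bin/env bash
set -eu

# Example location for the policy file (an assumption for this sketch).
profile="${TMPDIR:-/tmp}/deny-by-default.sb"

cat > "$profile" <<'EOF'
(version 1)
; Start from "deny everything".
(deny default)
; Then allow back only what is needed.
(allow process-exec)
(allow process-fork)
(allow file-read*)
(allow file-write* (subpath (param "WORKDIR")))
EOF

# sandbox-exec exists only on macOS, so check before calling it.
if command -v sandbox-exec >/dev/null 2>&1; then
  # -f loads the policy file; -D fills in the WORKDIR parameter.
  sandbox-exec -f "$profile" -D WORKDIR="$(pwd -P)" /usr/bin/true \
    || echo "the command was blocked or could not start under this minimal profile"
else
  echo "sandbox-exec not found; example policy written to $profile"
fi
```

The point of the sketch is the order of the rules: the default is deny, and every capability the process has is one that the file explicitly gives back.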

Security model

By default, Agent Safehouse allows agents to access only the following:

Access type | Permission details
Read and write | Selected working directory (default is git root)
Read only | Installed toolchains (npm, pip, cargo, etc.)
Blocked | SSH keys, repositories outside your home directory, system settings

When a command like rm -rf ~ runs, the kernel rejects every deletion it attempts outside the permitted directory. No matter how many “correct shell commands” an agent generates, it cannot get over the kernel’s wall.
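The three rows of the table can be pictured as three groups of rules in a policy. The sketch below is a rough approximation written for this article, not the policy Agent Safehouse generates: the function names are hypothetical, the toolchain paths (/usr, /opt/homebrew, ~/.cargo) are example locations, and paths containing double quotes are not handled.

```shell
#!/usr/bin/env bash
set -eu

# Pick the directory the agent may write to: the git root if there is one,
# otherwise the current directory.
resolve_workdir() {
  git rev-parse --show-toplevel 2>/dev/null || pwd -P
}

# Print a policy that mirrors the three rows of the table.
# Usage: render_policy WORKDIR HOME_DIR
render_policy() {
  local workdir="$1" home="$2"
  cat <<EOF
(version 1)
(deny default)
; Read and write: the selected working directory.
(allow file-read* file-write* (subpath "$workdir"))
; Read only: example system and toolchain locations.
(allow file-read* (subpath "/usr") (subpath "/opt/homebrew") (subpath "$home/.cargo"))
; Blocked: the SSH key directory is never allowed back,
; so the default deny rule still applies to it.
EOF
}

render_policy "$(resolve_workdir)" "$HOME"
```

Note that the “Blocked” row needs no rule of its own here. Anything that is not explicitly allowed stays denied, which is why the allow list should be kept as narrow as possible.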

Installation and usage

It has zero dependencies and ships as a single Bash script.

mkdir -p ~/.local/bin
curl -fsSL https://raw.githubusercontent.com/eugene1g/agent-safehouse/main/dist/safehouse.sh \
  -o ~/.local/bin/safehouse
chmod +x ~/.local/bin/safehouse

To use it, just put safehouse in front of the existing agent command.

cd ~/projects/my-app
safehouse claude --dangerously-skip-permissions
safehouse codex

You can also define shell functions that shadow the agent commands so that they always run in the sandbox.

safe() { safehouse --add-dirs-ro="$HOME/mywork" "$@"; }
claude() { safe claude --dangerously-skip-permissions "$@"; }

Now simply typing claude starts it inside the sandbox automatically.
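One design question with such a wrapper is what should happen on a machine where safehouse is not installed. The following is a minimal sketch of a fail-closed variant, assuming safehouse is an executable on PATH. The error message and the decision to refuse to run are choices made for this example, not something Agent Safehouse provides.

```shell
#!/usr/bin/env bash

# Wrapper that refuses to start the agent when the sandbox is unavailable.
claude() {
  if ! command -v safehouse >/dev/null 2>&1; then
    echo "safehouse not found; refusing to run claude unsandboxed" >&2
    return 1
  fi
  # safehouse is an external program, so the "claude" it launches is the
  # real binary on PATH, not this shell function.
  safehouse claude --dangerously-skip-permissions "$@"
}
```

Failing closed matters most for unattended runs: an overnight loop on a freshly set up machine then stops with an error instead of silently running without isolation.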

Compatible agents

The tested agents are listed below.

Claude Code, Codex, OpenCode, Amp, Gemini CLI, Aider, Goose, Auggie, Pi, Cursor Agent, Cline, Kilo Code, Droid

“Tested” means it was confirmed that coding work could actually be completed inside the sandbox. Whatever subprocesses an agent launches, they inherit the parent process’s policy.

Customize policies

Multiple policy profiles are stored in the profiles/ directory and are designed to be used in combination. Read access to additional directories can be specified with the --add-dirs-ro flag.

For machine-specific settings, point an environment variable at a local override file so that these settings do not pollute the shared repository.

export SAFEHOUSE_APPEND_PROFILE="$HOME/.config/agent-safehouse/local-overrides.sb"
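As a rough illustration of what such an override file might contain, the sketch below creates one and points the variable at it. It assumes that the appended file uses the same policy syntax as other sandbox-exec profiles, which the .sb extension suggests but which is not confirmed here, and "$HOME/notes" is a made-up directory.

```shell
#!/usr/bin/env bash
set -eu

override="$HOME/.config/agent-safehouse/local-overrides.sb"
mkdir -p "$(dirname "$override")"

# Create the file only if it does not exist yet, so nothing is overwritten.
if [ ! -e "$override" ]; then
  cat > "$override" <<EOF
; Machine-specific additions, kept out of the shared repository.
; The notes directory below is a made-up example.
(allow file-read* (subpath "$HOME/notes"))
EOF
fi

# Export so that a safehouse process started from this shell can see it.
export SAFEHOUSE_APPEND_PROFILE="$override"
```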

There is also a feature that lets an LLM such as Claude generate a tailored sandbox profile by reading your home directory and toolchain configuration. Leaving the decision of “which directories are allowed” to an LLM is an interesting approach.

Agent Safehouse is published under the Apache 2.0 license and already has over 500 stars on GitHub.

Windows: Codex for Windows and Windows Sandbox

Isolated execution using Windows Sandbox

On March 9, 2026, OpenAI officially released the Windows version of its AI coding agent, Codex. Its most notable feature is that the agent runs in a Windows sandbox by default.

Windows Sandbox is a lightweight virtualization feature built into Windows 10 Pro/Enterprise and later. It launches a desktop session that is completely isolated from the host, and closing the session discards every change made inside it. Even if an AI agent unintentionally rewrites system files or corrupts settings, the host environment is not affected, and this guarantee is enforced at the kernel level.

OpenAI has published the code for this sandbox implementation on GitHub, allowing developers to check and modify its internal operations.

WSL support and PowerShell integration

Codex for Windows also supports WSL. This gives developers who want to use Linux-based toolchains and shell scripts as-is an option that works alongside Windows-native PowerShell.

Skill sets for Windows-native applications are also available, so the agent can support development tasks specific to the Windows ecosystem, such as Win32 APIs, .NET, and UWP (Universal Windows Platform).

Project management with GUI

Previously, Codex was mainly command-line based, but Codex for Windows comes with a GUI as a Windows application. You can manage a list of multiple projects and check the status of the agent’s tasks on screen. Instructions can be given in natural language, such as “modify the authentication logic in this file” or “add a unit test to this component,” and the agent autonomously performs coding, debugging, and reviews.

Comparison of the two approaches

Item | Agent Safehouse (macOS) | Codex for Windows
How isolation works | Kernel-level sandbox-exec (Seatbelt) | Lightweight VM (Windows Sandbox)
Post-session state | Changes to authorized directories remain | Discarded (disappears when VM is closed)
Overhead | Small (native API) | VM startup cost included
Strength of isolation | Process level | VM boundaries (stronger isolation)
Compatible agents | 12 or more (Claude Code, Codex, etc.) | Codex (in-house products only)
Policy settings | Users can customize profiles | Fixed by OpenAI
License | Apache 2.0 (OSS) | Closed

What sets sandbox-exec apart is that it uses a native macOS API instead of a VM: the overhead is lower, but the isolation is weaker than a VM’s. Windows Sandbox has a clear advantage for “try an experimental task, then reset everything,” because all state is discarded when the session ends.

Both are effective at “preventing destructive operations,” but they trade off strength and convenience differently. If you use an agent every day in a macOS development workflow, Agent Safehouse’s light weight pays off. If you want to try something new on Windows, or run experiments where breaking things is acceptable, the complete separation of Windows Sandbox is a good fit.

Attacks that file system isolation alone cannot prevent

So far this has been about isolation that prevents destructive operations by the agent itself (rm -rf, deleting a production environment). In 2026, however, cases in which agents are manipulated by outsiders are increasing rapidly.

In Clinejection, a prompt injection embedded in a GitHub issue title stole npm tokens via Cline’s AI triage bot, and malicious packages were then installed automatically on 4,000 development machines. In the SANDWORM_MODE campaign (a series of attack activities), fake npm packages disguised as Claude Code and Cursor stole SSH keys and API tokens. In the AMOS distribution via OpenClaw’s SKILL.md, an AI agent itself ran the malware installer by following the instructions in a skill file.

In addition, there are reported cases in which the agent’s memory files themselves are the target. The technique rewrites the agent’s behavior through instruction files such as CLAUDE.md and .cursorrules.

Agent Safehouse’s network and file access restrictions provide some protection against these attacks. If access to SSH keys is blocked, they cannot be stolen, and if writes outside the permitted directory are prohibited, the agent cannot perform a global install on its own. However, the sandbox cannot prevent malicious dependencies from being added to code inside the working directory. In the end, code review and dependency monitoring are still necessary.
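As one small piece of that kind of monitoring, the instruction files mentioned above (CLAUDE.md, .cursorrules) can be checksummed before an agent run and compared afterwards. The helper names below are hypothetical and the script is only a sketch: it flags files that changed or disappeared, but it does not notice files that were newly created after the baseline was taken.

```shell
#!/usr/bin/env bash

# Print the SHA-256 of a file, using whichever tool is available
# (sha256sum on most Linux systems, shasum on macOS).
hash_file() {
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum "$1" | cut -d' ' -f1
  else
    shasum -a 256 "$1" | cut -d' ' -f1
  fi
}

# record_baseline FILE...: print one "hash  path" line per existing file.
record_baseline() {
  for f in "$@"; do
    [ -f "$f" ] && printf '%s  %s\n' "$(hash_file "$f")" "$f"
  done
  return 0
}

# changed_files BASELINE: print paths whose hash differs or that vanished.
changed_files() {
  while read -r old path; do
    if [ ! -f "$path" ] || [ "$(hash_file "$path")" != "$old" ]; then
      printf '%s\n' "$path"
    fi
  done < "$1"
}

# Example use (paths are placeholders):
#   record_baseline CLAUDE.md .cursorrules > /tmp/agent-memory.baseline
#   ... run the agent ...
#   changed_files /tmp/agent-memory.baseline
```

A non-empty result does not mean an attack happened, since agents legitimately edit these files too. It only tells you which files deserve a look before the next run.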

Claude Code’s own permission settings offer another route to fine-grained access control, but as long as the official statement itself says “it can be bypassed,” it is more realistic to rely on enforcement by the OS.