Tech · Apr 19, 2026 · 13 min

Zero-copy GPU inference on Apple Silicon with WebAssembly and Metal

A three-link chain of mmap → MTLBuffer(bytesNoCopy) → Wasmtime MemoryCreator that makes a Wasm linear memory share the same physical bytes as a Metal GPU buffer. Llama 3.2 1B runs at 9 ms/token on M1.

WebAssembly · Metal · AppleSilicon · MLX · Wasmtime · LLM