A port that replaces TRELLIS.2's CUDA-only libraries (flash_attn, nvdiffrast, sparse 3D convolution) with pure-PyTorch equivalents. It runs Microsoft's 4B-parameter image-to-3D model on an M4 Pro in about 3.5 minutes, with no NVIDIA GPU required.
A three-link chain (mmap → MTLBuffer(bytesNoCopy) → Wasmtime MemoryCreator) that backs a Wasm linear memory with the same physical bytes as a Metal GPU buffer. Llama 3.2 1B runs at 9 ms/token on an M1.
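
The first link of that chain can be illustrated without Metal or Wasmtime. The sketch below is only an analogy in Python, not the project's code: it allocates one page-aligned anonymous mapping and exposes it through two zero-copy views, standing in for the GPU buffer and the Wasm linear memory. The names `round_up_to_page`, `gpu_view`, and `wasm_view` are hypothetical, and the page-rounding step assumes (as Apple's documentation describes for no-copy buffers) that the wrapped region must be page-aligned and a whole number of pages long.

```python
import mmap

PAGE = mmap.PAGESIZE  # no-copy buffer APIs generally expect page-granular memory


def round_up_to_page(n: int) -> int:
    """Round n up to the next multiple of the system page size."""
    return (n + PAGE - 1) // PAGE * PAGE


# 1) One anonymous mapping: page-aligned by construction, sized in whole pages.
size = round_up_to_page(100_000)
region = mmap.mmap(-1, size)

# 2) Two views over the same bytes. Neither call copies the data.
#    (Hypothetical stand-ins: one for the Metal buffer, one for Wasm memory.)
gpu_view = memoryview(region)
wasm_view = memoryview(region)

# 3) A write through one view is immediately visible through the other.
wasm_view[0:4] = b"\x01\x02\x03\x04"
shared = bytes(gpu_view[0:4])

# Views must be released before the mapping can be closed.
gpu_view.release()
wasm_view.release()
region.close()
```

In the real chain the two views would be an MTLBuffer created over the mapping and a Wasmtime memory handed out by a custom MemoryCreator. The point is the same either way: both consumers address one allocation, so weights or activations written by one side never need to be copied for the other.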