#AppleSilicon

3 articles

Tech Apr 21, 2026 updated 19 min

TRELLIS.2 trellis-mac port tested on M1 Max 64GB: setup, generation time, MPS bottlenecks

A hands-on run of trellis-mac (the CUDA-free port of TRELLIS.2) on an M1 Max with 64GB. Covers setup via uv with PyTorch 2.11.0 on MPS, the mps_compat.py patches applied, the measured generation time against the 3.5-minute reference on an M4 Pro 24GB, and where the bottlenecks land on Apple Silicon.

AppleSilicon MPS PyTorch 3D ML Experiment

Tech Apr 20, 2026 updated 9 min

Running TRELLIS.2 on Apple Silicon MPS: a CUDA-free port

A port that replaces TRELLIS.2's CUDA-only libraries (flash_attn, nvdiffrast, sparse 3D convolution) with pure-PyTorch equivalents and runs Microsoft's 4B image-to-3D model on an M4 Pro in about 3.5 minutes without any NVIDIA GPU.

AppleSilicon MPS PyTorch 3D LocalLLM ML

Tech Apr 19, 2026 13 min

Zero-copy GPU inference on Apple Silicon with WebAssembly and Metal

A three-link chain of mmap → MTLBuffer(bytesNoCopy) → Wasmtime MemoryCreator that makes a Wasm linear memory share the same physical bytes as a Metal GPU buffer. Llama 3.2 1B runs at 9 ms/token on M1.

WebAssembly Metal AppleSilicon MLX Wasmtime LLM