Tech | Mar 31, 2026 | 6 min

Ollama Moves to MLX Backend, Dramatically Speeds Up Local Inference on Apple Silicon

Ollama 0.19 switches the Apple Silicon backend to MLX, achieving 1,810 tokens/s prefill and 112 tokens/s decode. NVFP4 quantization support and cache improvements landed at the same time.

Tags: Ollama, MLX, Apple Silicon, LLM, Local LLM, Inference Optimization