#MLX

14 articles

Tech · Apr 2, 2026 (updated) · 13 min

SwiftLM is a Swift-based LLM inference server that integrates TurboQuant and SSD streaming into Metal shaders

SwiftLM, an Apple Silicon–only MLX inference server, provides a native Metal implementation of TurboQuant V2+V3 hybrid KV‑cache compression and NVMe SSD expert streaming.

Apple Silicon · LLM · MLX · Local LLM · Inference Optimization · KV Cache · MoE · Swift

Tech · Mar 31, 2026 · 6 min

Ollama Moves to MLX Backend, Dramatically Speeds Up Local Inference on Apple Silicon

Ollama 0.19 switches its Apple Silicon backend to MLX, reaching 1,810 tokens/s on prefill and 112 tokens/s on decode. NVFP4 quantization support and cache improvements landed in the same release.

Ollama · MLX · Apple Silicon · LLM · Local LLM · Inference Optimization