Feeding inclusionAI's Ling-flash-2.0 (bailing_moe, 100B total / 6.1B active parameters, MXFP4 quantization) into SwiftLM on an M1 Max 64GB. Covers checking mlx-swift-lm for bailing_moe and MXFP4 support, the startup surprise, and what --stream-experts actually does.
A hands-on build and run of SwiftLM, the Swift-based LLM inference server, on an M1 Max 64GB. Covers Qwen3.6-35B-A3B and Qwen3.5-122B-A10B, using the same BST, BBS, and persona tests as the existing Ollama and MLX-lm write-ups.