Flash-MoE is a C/Metal inference engine that runs Qwen3.5-397B-A17B on a MacBook Pro M3 Max at 4.36 tokens/s. With expert streaming from SSD and hand-written Metal shaders, it fits the 209GB model into a 48GB memory budget.
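The core trick behind fitting a 209GB MoE model into 48GB is that only the experts the router selects per token need to be resident; the rest stay on SSD. A minimal sketch of that idea, using NumPy's `memmap` rather than Flash-MoE's actual C/Metal code (the file layout, expert count, and sizes below are illustrative assumptions, not Qwen's real configuration):

```python
import numpy as np

N_EXPERTS = 8      # assumed expert count (illustrative only)
EXPERT_DIM = 1024  # assumed flattened parameter count per expert

def make_expert_file(path):
    # Build a small stand-in weights file so the sketch is runnable.
    weights = np.arange(N_EXPERTS * EXPERT_DIM, dtype=np.float32)
    weights.tofile(path)

def load_selected_experts(path, selected):
    # Map the whole file without reading it; the OS faults pages in
    # from SSD only for the rows we actually touch.
    table = np.memmap(path, dtype=np.float32, mode="r",
                      shape=(N_EXPERTS, EXPERT_DIM))
    # np.array(...) copies just the selected experts into resident memory.
    return {e: np.array(table[e]) for e in selected}

make_expert_file("experts.bin")
# Suppose the router picked experts 2 and 5 for the current token:
active = load_selected_experts("experts.bin", [2, 5])
print(sorted(active))  # [2, 5]
```

With an A17B-style model, the active experts per token are a small fraction of the total parameters, which is what makes a 48GB budget plausible at all; the real engine would additionally cache hot experts and overlap SSD reads with compute.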
Upscaling images loaded via the Load Image node was producing garbled output. The culprit was a non-contiguous tensor; a one-line patch to comfy/utils.py fixes it.
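A transposed or permuted tensor is a strided view over the original buffer, so code that assumes row-major memory walks it in the wrong order and the pixels come out scrambled. A sketch of the failure mode and the one-line style of fix, shown here with NumPy since the actual patch isn't reproduced (in PyTorch the equivalent call is `tensor.contiguous()`):

```python
import numpy as np

# Stand-in for a loaded image: HWC layout permuted to CHW.
img_hwc = np.arange(2 * 2 * 3, dtype=np.uint8).reshape(2, 2, 3)
img_chw = img_hwc.transpose(2, 0, 1)    # a view: strides change, memory doesn't
print(img_chw.flags["C_CONTIGUOUS"])    # False

# Downstream code that reads the raw buffer in row-major order now sees
# the wrong byte order -- the "garbled output" symptom.
fixed = np.ascontiguousarray(img_chw)   # the one-line fix: copy into row-major layout
print(fixed.flags["C_CONTIGUOUS"])      # True
```

The values are identical after the copy; only the memory layout changes, which is why such a fix is safe to apply unconditionally before the upscale.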