Tested Klein 9B with a 9B NSFW LoRA on M1 Max 64GB via mflux 0.17.5: 1m51s at 512px, 5m37s at 1024px (q4). All 224 LoRA keys match, NSFW prompts come out uncensored, and Japanese subjects work with helper tokens.
Investigated whether NSFW LoRAs for FLUX.2 Klein 9B can run on M1 Max 64GB. Covers model compatibility, LoRA application paths, RunPod verification strategy, and VRAM requirements for training your own LoRA with ai-toolkit.
Three local image generation engines (WAI-Anima, WAI-IL/SDXL, FLUX.2 Klein 4B) tied together by a thin FastAPI wrapper that takes Japanese prompts. Ollama (gemma3:12b) handles JP→EN, ComfyUI workflows are built on the fly in Python, FLUX.2 runs as an mflux subprocess, and the whole thing is reachable from an iPhone over Tailscale.
Hands-on benchmark of FLUX.2 Klein 4B on M1 Max 64GB using mflux (MLX) and iris.c (pure C + Metal). A counter to Pruna AI's H100-only tutorial, measuring how fast generation actually runs on Apple Silicon.
Technical details of UltraFlux-v1, a model that pushes FLUX.1-dev into native 4K generation. It covers the differences from Z-Image and FLUX.2 Klein, its RoPE extensions and VAE improvements, and practical caveats.
Overview of Black Forest Labs' FLUX.2 Klein 9B model and how it performs on M1/M2/M3/M4 Macs. Covers the key factors behind the CUDA vs MPS performance gap, including memory bandwidth and FP8 quantization.
Overview of Alibaba's Z-Image and how it compares to FLUX and Stable Diffusion. Z-Image is a 6B-parameter model that runs on low VRAM and ranks first among open-source models.