FLUX.2 Klein 4B benchmarked on M1 Max with mflux vs iris.c
H100-only tutorial sent me looking elsewhere
Pruna AI published a tutorial that promises a 2–3× speedup on FLUX.2 Klein 4B. I went to try it and found that the FP8 quantization requires compute capability ≥ 8.9, which means Ada Lovelace or Hopper hardware (RTX 4090 / H100 class). Nothing an M1 Max can run.
So instead I tested how far the Apple Silicon side of the ecosystem can go. Two candidates:
- mflux (filipstrand/mflux): an MLX-based Python implementation. Officially supports the FLUX.2 Klein family.
- iris.c (antirez/iris.c): a pure C implementation with Metal acceleration. Side project from the author of Redis.
Earlier I wrote FLUX.2 Klein — a 9B-parameter image model and how it does on Apple Silicon, but only as a survey with no real numbers. This is the follow-up with measurements.
Environment
| Item | Value |
|---|---|
| Machine | M1 Max 64GB |
| OS | macOS (Darwin 25.3) |
| Python | 3.13.11 (miniconda) |
| Date | 2026-04-30 |
What tripped me up installing mflux
pip install mflux
This installs mflux 0.17.5. Looking at the CLI surface, both mflux-generate and mflux-generate-flux2 exist.
I first tried mflux-generate --model flux2-klein-4b. The model downloaded fine, then this exploded:
FileNotFoundError: No safetensors files found in
.../models--black-forest-labs--FLUX.2-klein-4B/snapshots/.../text_encoder_2
FLUX.1 used a two-stage CLIP + T5 setup with text_encoder and text_encoder_2. FLUX.2 Klein switched to Qwen3 alone, so text_encoder_2 doesn’t exist in the repo. The CLI accepts flux2-klein-4b as --model, but the call routes to the FLUX.1 pipeline internally and dies on the structural mismatch.
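A quick way to confirm the layout difference yourself is to list the downloaded snapshot. The base path below is the standard HF Hub cache location; the snapshot hash directory will differ per download.
# List the contents of the FLUX.2 Klein snapshot (hash directory varies)
ls ~/.cache/huggingface/hub/models--black-forest-labs--FLUX.2-klein-4B/snapshots/*/
# Expect text_encoder/ (Qwen3) but no text_encoder_2/; the FLUX.1
# code path fails because it looks for the latter.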
The right command is mflux-generate-flux2 — a separate CLI dedicated to the FLUX.2 family.
mflux-generate-flux2 \
--model flux2-klein-4b \
--prompt "a serene Japanese garden in autumn, koi pond, maple leaves" \
--steps 4 --width 1024 --height 1024 \
--seed 42 --output out.png
Both CLIs accept the same flux2-klein-4b for --model in their help text, which is misleading. For the FLUX.2 family, always reach for the -flux2 suffix.
mflux inference speed
4-step distilled, seed 42, identical prompt across resolutions.
| Resolution | Pure inference | Wall time |
|---|---|---|
| 512×512 | 12.0s | 23.7s (first run, includes extra component download) |
| 1024×1024 | 21.4s | 31.7s |
Pure inference stays under 30 seconds even at 1024×1024, and wall time is barely over. My earlier survey article had pessimistically estimated “3–5 minutes”, which was way off.
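For reference, a minimal way to reproduce the wall-time side of these numbers; the pure-inference figure is what mflux reports itself, and the flags match the run shown earlier.
time mflux-generate-flux2 \
--model flux2-klein-4b \
--prompt "a serene Japanese garden in autumn, koi pond, maple leaves" \
--steps 4 --width 512 --height 512 \
--seed 42 --output out_512.png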
Sample output (1024×1024, 4-step, seed 42).

Maple leaves, koi, pond, traditional architecture. Faithful to the prompt, and the details hold up despite the 4-step distillation.
Building iris.c
git clone https://github.com/antirez/iris.c.git
cd iris.c
make mps
The build finished in 7.7 seconds. It's written in C, and its only dependencies are the Accelerate and Metal frameworks. No Python runtime needed at all.
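One way to sanity-check the dependency claim; the comment describes what you should expect to see, not a captured run.
# Print the binary's linked libraries; expect only system frameworks
# (Accelerate, Metal, libSystem), no Python or third-party dylibs
otool -L ./iris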
iris.c model download
./download_model.sh 4b
A simple script that pulls down via curl. About 5 minutes, ~15GB total. It doesn’t use the HuggingFace Hub cache layout — instead it expands directly into ./flux-klein-4b/.
That cache is independent from mflux’s (~/.cache/huggingface/...), so running both means roughly 30GB of duplicated disk usage. For a one-off comparison, deleting whichever you don’t need is sensible.
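If you only keep one tool, reclaiming the duplicate is quick. The paths below are the defaults described above; double-check before deleting anything.
# Drop mflux's copy from the HuggingFace Hub cache...
rm -rf ~/.cache/huggingface/hub/models--black-forest-labs--FLUX.2-klein-4B
# ...or drop iris.c's local copy instead
rm -rf ./flux-klein-4b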
iris.c inference speed
./iris -d flux-klein-4b \
-p "a serene Japanese garden in autumn, koi pond, maple leaves" \
-W 1024 -H 1024 -S 42 -o out.png
| Resolution | Wall time |
|---|---|
| 512×512 | 16.2s |
| 1024×1024 | 41.5s |
The runtime log is far chattier than mflux's, which makes the bottlenecks visible.
MPS: Metal GPU | Apple M1 Max | 10 cores
Loading VAE... done (0.0s)
Loading Qwen3 encoder... done (0.4s)
Encoding text... done (1.1s)
Loading FLUX.2 transformer... done (1.6s)
Denoising:
Step 1/4 dddddssssF
...
Step 4/4 dddddssssF
Decoding image... done (2.6s)
Total generation time: 41.5 seconds
d marks a double block, s a single block, F the final stage. The internal structure of the FLUX.2 transformer becomes visible, which is helpful for debugging or for learning the model.
iris.c output at the same settings (1024×1024, 4-step, seed 42).

Both renders show the sun peeking through the leaves at the same upper-left position, but mflux's sunlight is stronger, with more pronounced light shafts and a tighter “komorebi” feel. iris.c is brighter overall, with light wrapping evenly across the canopy from above for a more open look. Since the two PRNG implementations don't align, the same seed value effectively acts as different seeds, so the compositions are different scenes; quality is comparable.
Benchmark comparison
1024×1024, 4-step distilled, seed 42, on the M1 Max 64GB.
| Item | mflux 0.17.5 | iris.c |
|---|---|---|
| Wall time | 31.7s | 41.5s |
| Install | pip install mflux | git clone && make mps |
| Build | (none, distributed as wheel) | 7.7s |
| Disk | ~16GB (HF cache) | 15GB (custom layout) |
| Runtime deps | Python 3.10+ / MLX | C standard library + Metal |
| LoRA support | yes (--lora-paths) | no |
| Log verbosity | progress bar only | per-component timing |
| Output quality | high | high (roughly equivalent) |
mflux wins on speed. iris.c is slower but more transparent about where the time goes.
iris.c’s SPEED.md states “M1 Max 39.9s / 512×512”, but in my environment 512×512 took 16.2s. Recent commits may have shipped optimizations.
Landscapes alone aren’t enough — also tested portraits and anime
Landscape prompts are on the easy side for an image model. The real test for generative AI is portraits (face consistency, hands, skin) and 2D characters (the heart of current otaku culture). Same conditions for both tools — 1024×1024, 4-step, seed 7.
Portrait (realistic)
Prompt was portrait of a young Japanese woman in her 30s, professional photography, natural light, soft skin texture, 50mm lens, shallow depth of field.
mflux (33.5s)

iris.c (57.5s)

Neither one breaks down. Skin, hair detail, depth of field all hold up. FLUX.2 was trained on photo-leaning data, so this is its home turf. iris.c being slower likely comes from a cold text encoder on first invocation.
2D character (anime)
Prompt was anime girl with long black hair, school uniform, standing in cherry blossom park, anime illustration style, big eyes, kawaii.
mflux (31.0s)

iris.c (41.8s)

The model recognizes “anime” surprisingly well. FLUX.2 Klein is photo-first, but anime as a concept is clearly in the training. mflux’s hand integrity is slightly off, but the composition and style itself work. The line art has a distinct “AI-generated anime image” look — clearly narrower than what specialty checkpoints like WAI-Anima or WAI-IL combined with LoRAs produce.
If this kind of art is the goal, FLUX.2 Klein alone isn’t enough. Either combine with anime LoRAs through mflux’s --lora-paths, or switch to a different stack entirely (ComfyUI + WAI family).
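As a sketch of what the LoRA route could look like: --lora-paths is the flag mflux documents, but the LoRA filename below is a placeholder, and I haven't verified that the flux2 CLI accepts the flag, so treat this as an assumption rather than a tested recipe.
# Hypothetical: stack an anime-style LoRA on top of Klein 4B
# (anime_style_lora.safetensors is a placeholder filename)
mflux-generate-flux2 \
--model flux2-klein-4b \
--prompt "anime girl with long black hair, school uniform, standing in cherry blossom park, anime illustration style, big eyes, kawaii" \
--steps 4 --width 1024 --height 1024 --seed 7 \
--lora-paths anime_style_lora.safetensors \
--output out_lora.png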
Summarizing by subject: realistic portraits work fine on both tools, 2D anime renders but loses to specialty checkpoints, and mflux is consistently faster regardless of subject.
i2i (image-to-image) on both
I gave the tools a maid-cafe character image and asked them to put her in a cherry blossom park.
The input.

Prompt was anime girl with side ponytail, maid uniform, standing outdoors in cherry blossom park, anime illustration, kawaii.
There’s an important CLI difference between the two tools.
- mflux: mflux-generate-flux2 --image-path is traditional latent-space img2img (re-render from partial noise). It's weak at swapping background or composition. To use FLUX.2's native edit capability you have to switch CLIs to mflux-generate-flux2-edit --image-paths ....
- iris.c: just -i. FLUX.2's in-context image editing runs straight through.
I initially mixed these up — used --image-path and saw the background unchanged, almost concluded “iris.c wins”. The fair pair on FLUX.2 is mflux-generate-flux2-edit vs iris.c with -i.
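For concreteness, the fair pairing looks roughly like this. The mflux flags other than --image-paths are assumed to carry over from the t2i runs, and the seed is illustrative; check the current help text before copying.
mflux-generate-flux2-edit \
--model flux2-klein-4b \
--image-paths input.png \
--prompt "anime girl with side ponytail, maid uniform, standing outdoors in cherry blossom park, anime illustration, kawaii" \
--steps 4 --width 1024 --height 1024 \
--seed 7 --output mflux_edit.png

./iris -d flux-klein-4b -i input.png \
-p "anime girl with side ponytail, maid uniform, standing outdoors in cherry blossom park, anime illustration, kawaii" \
-W 1024 -H 1024 -S 7 -o iris_edit.png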
mflux flux2-edit (52.4s)

iris.c (86.3s)

Both kept the character’s face, hairstyle, and outfit while swapping in a cherry blossom backdrop. The “omni-model” natural image editing FLUX.2 Klein 4B advertises is working cleanly on the Mac side. Identity holds up better than expected. Hair color does drift slightly though — both tools shift the input’s reddish-brown toward a lighter, more neutral brown. Core identity survives, but micro-details wobble.
Speed-wise mflux is about 1.6× faster (52.4s vs 86.3s). The mflux optimization edge from t2i carries over to i2i.
Pose and composition can be rewritten too
Asked for “the same character running through the cherry blossom park, dynamic action, full body” with explicit full-body framing, regenerated as i2i (seed 21).
mflux flux2-edit (53.4s)

iris.c (83.1s)

From the input’s chest-up standing pose, both jumped to a full-body running pose. Arms swing front-to-back, legs kick off the ground. Face, hair color, and the silhouette of the maid uniform (black dress + white apron + headband) all survive. The framing itself (camera distance, aspect feel) shifted, so this isn’t a “redraw centered on the original character location” — it’s a recomposition.
The hairstyle drifted slightly though. The input had a side ponytail (right side, blue scrunchie) but in the running shot both tools moved it to a higher, more centered ponytail. The model prioritized “ponytail flowing in the wind” naturally with motion, and dropped the side-position information. Trade-off: you can move the character a lot, but micro-design details wobble.
Another visible difference: iris.c’s calves are noticeably thicker, with more realistic muscle than the slim anime proportion. Same model, same prompt, but the inference path or quantization apparently changes body detail. If anime proportions matter, mflux is the safer bet.
Being able to rewrite camera position, composition, and body motion through prompt alone is somewhere a traditional SDXL-style img2img (denoise sweep) can’t easily reach. The catch: this is editing rooted in the input image, not learning the character’s design. Hairstyle and accessory “signatures” aren’t fully preserved. For the same character in any pose, any composition use case, you still want a character LoRA — and that puts mflux’s --lora-paths back on top.
Which one to pick
For LoRA-driven anime workflows: mflux. --lora-paths accepts HuggingFace LoRAs directly. Python is required, but riding the MLX ecosystem is a net plus. A natural choice for waiANIMA-style LoRA play.
For standalone deployment: iris.c. No Python runtime, single binary. Ideal for embedded use cases or for dropping a verification VM somewhere with minimum dependencies.
You probably don’t need both. There’s a 1.3× speed gap, but either one renders FLUX.2 Klein 4B at 1024×1024 in under a minute. The implicit premise of the Pruna AI tutorial — “you need an H100, Macs can’t” — turned out to be wrong.
How sensitive prompts pass through
The usual bonus check: I pushed an NSFW prompt through both tools.
Prompt was portrait of a young Japanese woman, nude, no clothes, bare skin, natural light, photography. Settings: 1024×1024, 4-step, seed 7.
The images below are blurred for Google compliance. If NSFW content isn’t your thing, you can stop here.
mflux (33.1s)

iris.c (57.4s)

Neither tool refuses outright, but the output gets disguised — cropped above the chest or covered by underwear-like fabric. Even with nude in the prompt, you don’t get a straight pass. FLUX.2 Klein appears to be tilted toward safety at the training-data and post-processing layers.
Anime characters loosen the gate
The realistic prompt was locked down tight, so I tried the same prompt on an anime input (the maid-cafe kanachan from the i2i test).
mflux flux2-edit (52.9s)

iris.c (82.4s)

Compared to the realistic case (cropped to disguise), exposed skin clearly increases. But it doesn’t go all the way to “nude” either — the output settles at a half-undressed in-between state. mflux still has black sleeves and skirt fragments from the maid uniform, iris.c is more uniformly skin-toned with vague clothing outlines, and in both cases the input’s heart-hand pose conveniently covers the chest. The lower body stays smoothed out in anime style and doesn’t show anatomical detail.
So it passes more easily than realistic prompts, but not “let it all out” the way Z-Anime or SDXL Illustrious-derived models do. Klein keeps some restraint even on anime-style prompts and stops around “partially undressed”. Not a fit for 2D NSFW workflows without a LoRA.