Boogu-Image-0.1 on M1 Max ComfyUI: fp8 fails on MPS, bf16 works at ~70s/image

I ran Boogu-Image-0.1 on an M1 Max 64GB Mac through ComfyUI. The short version: the fp8 build Comfy-Org ships is rejected by MPS, but the bf16 build runs fine, around 70 seconds per 1024px image with Turbo.

Boogu-Image-0.1 is a 10B-parameter, Apache-2.0 image generation and editing model released on June 16, 2026. ComfyUI core added native support on June 18 (commit “Support Boogu-Image (CORE-308)”), so I could try it almost immediately.

The catch is that loading Boogu means updating ComfyUI core, and bumping ComfyUI on a Mac has burned me before. The official repo assumes CUDA and says nothing about Macs. The only machine I have to test on is an M1 Max, so that’s what I used.

The image at the top is a cafe interior from that bf16 Turbo run, with latte art, a potted plant, and a fireplace all holding together at 4 steps.

What Boogu-Image-0.1 is

The developer is listed as “Boogu / boogu-project”; there’s no named parent org, and the technical report is still “coming soon on arXiv.” The pitch is near-closed-source quality from roughly one-tenth the training data: the 10B model scores 53.58 on Qwen-Image-Bench, ahead of the 20B Qwen-Image-2512 (52.06).

The architecture closely mirrors Qwen-Image. Reading the ComfyUI implementation (comfy/ldm/boogu/model.py), you find BooguDoubleStreamBlock and BooguJointAttention: a double-stream MMDiT that processes text and image in the same blocks. It uses the full Qwen3-VL-8B as the text encoder and reuses the FLUX.1 VAE. In supported_models.py it inherits from Omnigen2 and bakes a flow-matching shift of 3.16 into the model config.

Component	Detail
Backbone	10B double-stream DiT (MMDiT family)
Text encoder	Qwen3-VL-8B (last hidden state, no-think template)
VAE	FLUX.1 VAE (reused)
Variants	Base / Edit / Turbo (each in bf16, fp8, nvfp4)
Steps	Base/Edit 25-50, Turbo 3-4 (distilled)

Comfy-Org ships the diffusion backbone in three formats: bf16 (20.59GB), fp8 scaled (10.31GB), and nvfp4 (5.83GB), plus the text encoder qwen3vl_8b_fp8_scaled (10.59GB) and the VAE flux1_vae_bf16 (0.17GB). There’s also a distilled Turbo LoRA (1.35GB) you can apply to Base. nvfp4 targets NVIDIA Blackwell, so it’s off the table on a Mac.
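The three file sizes line up with simple bytes-per-parameter arithmetic. A rough sketch, assuming the listed sizes are decimal gigabytes and ignoring scale tensors and any layers kept at higher precision in the quantized files (both are assumptions, not something the model card states):

```python
# Back-of-envelope check of the published backbone file sizes.
# Assumption: sizes are decimal GB (1e9 bytes).
SIZES_GB = {"bf16": 20.59, "fp8_scaled": 10.31, "nvfp4": 5.83}

# Nominal storage cost per parameter for each format.
BYTES_PER_PARAM = {"bf16": 2.0, "fp8_scaled": 1.0, "nvfp4": 0.5}


def implied_params_billion(fmt: str) -> float:
    """Parameter count (in billions) implied by file size alone."""
    return SIZES_GB[fmt] / BYTES_PER_PARAM[fmt]


def gb_to_gib(size_gb: float) -> float:
    """Convert decimal gigabytes to binary gibibytes."""
    return size_gb * 1e9 / 2**30


if __name__ == "__main__":
    for fmt in SIZES_GB:
        print(f"{fmt}: ~{implied_params_billion(fmt):.1f}B params implied")
    print(f"bf16 in GiB: {gb_to_gib(SIZES_GB['bf16']):.2f}")
```

The bf16 file implies about 10.3B parameters, consistent with the 10B label. The same 20.59GB is about 19.2GiB, which would explain the 19GB figure that appears later in this post if the tool reporting it counts in binary units.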

The other selling point is bilingual (Chinese/English) text rendering: posters and signs with legible text. I test that later.

Test setup

Item	Detail
Machine	Mac Studio M1 Max 64GB
OS	macOS 26.5
Python	3.12.12
PyTorch	2.10.0 (MPS)
ComfyUI	master tip f2270f07 (as of June 18)
Backbone	Boogu-Image-0.1-Turbo bf16 (19GB)
TE / VAE	Qwen3-VL-8B fp8 (10GB) / FLUX.1 VAE bf16
Common	Turbo 4 steps / cfg 1.0 / euler / simple / 1024px

Updating ComfyUI can make things slower

Loading Boogu means bumping ComfyUI core, and on a Mac that update can be a landmine. An earlier update once pushed Qwen Image Edit from 80 seconds to nearly 10 minutes (I traced the cause and the fix in a separate post, listed under References).

On a Mac, MPS runs BF16 slower than FP16, and M1 through M3 have no BF16 hardware support. If a core update flips a model’s inference dtype to BF16, generation alone can get several times slower. So instead of touching my working setup, I spun up a separate ComfyUI install to check speed first, then ran Boogu. This master update didn’t slow down my Qwen-family workflows.

fp8 is rejected by MPS

The native node graph is simple: UNETLoader for the backbone, CLIPLoader with type boogu for Qwen3-VL-8B, and VAELoader for the FLUX VAE. For t2i you use the standard CLIPTextEncode; only Edit needs the dedicated TextEncodeBooguEdit.
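For reference, that graph can be written out in ComfyUI’s API-format JSON. This is a minimal sketch, not the workflow used for the runs in this post: the backbone file name is hypothetical, the `.safetensors` extensions are assumed, and the choice of `EmptyLatentImage` for the latent is a guess, since the post doesn’t say which latent node it used.

```python
import json

# Hypothetical/assumed file names: adjust to whatever sits in your models folders.
BACKBONE = "boogu_image_turbo_bf16.safetensors"      # hypothetical name
TEXT_ENCODER = "qwen3vl_8b_fp8_scaled.safetensors"   # extension assumed
VAE = "flux1_vae_bf16.safetensors"                   # extension assumed


def build_t2i_workflow(prompt: str, seed: int = 0) -> dict:
    """Return an API-format graph: node id -> {class_type, inputs}.

    Links between nodes are [source_node_id, output_index].
    """
    return {
        "1": {"class_type": "UNETLoader",
              "inputs": {"unet_name": BACKBONE, "weight_dtype": "default"}},
        "2": {"class_type": "CLIPLoader",
              "inputs": {"clip_name": TEXT_ENCODER, "type": "boogu"}},
        "3": {"class_type": "VAELoader", "inputs": {"vae_name": VAE}},
        "4": {"class_type": "CLIPTextEncode",
              "inputs": {"text": prompt, "clip": ["2", 0]}},
        # KSampler still needs a negative input even at cfg 1.0.
        "5": {"class_type": "CLIPTextEncode",
              "inputs": {"text": "", "clip": ["2", 0]}},
        "6": {"class_type": "EmptyLatentImage",
              "inputs": {"width": 1024, "height": 1024, "batch_size": 1}},
        "7": {"class_type": "KSampler",
              "inputs": {"model": ["1", 0], "positive": ["4", 0],
                         "negative": ["5", 0], "latent_image": ["6", 0],
                         "seed": seed, "steps": 4, "cfg": 1.0,
                         "sampler_name": "euler", "scheduler": "simple",
                         "denoise": 1.0}},
        "8": {"class_type": "VAEDecode",
              "inputs": {"samples": ["7", 0], "vae": ["3", 0]}},
        "9": {"class_type": "SaveImage",
              "inputs": {"images": ["8", 0], "filename_prefix": "boogu"}},
    }


def dangling_links(graph: dict) -> list:
    """Return (node_id, input_name) pairs whose link targets a missing node."""
    bad = []
    for node_id, node in graph.items():
        for name, value in node["inputs"].items():
            # A link is a two-item list: [source_node_id, output_index].
            if isinstance(value, list) and len(value) == 2 and value[0] not in graph:
                bad.append((node_id, name))
    return bad


if __name__ == "__main__":
    graph = build_t2i_workflow("a cozy cafe interior with latte art")
    assert dangling_links(graph) == []
    print(json.dumps({"prompt": graph}, indent=2))
```

The printed JSON is the shape a running ComfyUI accepts on its `/prompt` endpoint. The sampler settings mirror the Common row of the test setup table.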

I started with fp8 Turbo, the lightest build that isn’t NVIDIA-only, as a smoke test, and it died at sampling.

TypeError: Trying to convert Float8_e4m3fn to the MPS backend
but it does not have support for that dtype.

MPS can’t even hold a float8 tensor. fp8 scaled is quantization built around CUDA’s fp8 math, and it doesn’t carry over to Apple Silicon. This is the same root cause as the Qwen case I hit in March, where fp8 models also wouldn’t run on MPS without a patch converting weights to BF16.
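Whether a checkpoint stores float8 tensors can be checked before loading it, because the safetensors format keeps each tensor’s dtype in a JSON header at the start of the file. A minimal standard-library sketch; it only reports what is stored on disk, not how ComfyUI will cast it at load time, and the script name in the usage comment is hypothetical:

```python
import json
import struct
from collections import Counter


def safetensors_dtype_counts(path: str) -> Counter:
    """Count tensors per dtype by reading only the file header.

    A .safetensors file starts with an 8-byte little-endian header length,
    followed by that many bytes of JSON describing each tensor.
    """
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    return Counter(
        entry["dtype"]
        for name, entry in header.items()
        if name != "__metadata__"  # optional free-form metadata, not a tensor
    )


def has_float8(path: str) -> bool:
    """True if any tensor is stored as float8 (F8_E4M3 or F8_E5M2)."""
    return any(d.startswith("F8_") for d in safetensors_dtype_counts(path))


if __name__ == "__main__":
    import sys

    # Usage: python check_dtypes.py path/to/model.safetensors
    for dtype, count in safetensors_dtype_counts(sys.argv[1]).most_common():
        print(f"{dtype}: {count} tensors")
    print("contains float8:", has_float8(sys.argv[1]))
```

A backbone file that reports `F8_E4M3` for its weights is the kind that produced the TypeError above.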

What’s interesting is that only the backbone choked. The error fired in KSampler, while the CLIPTextEncode just before it went through, so the Qwen3-VL-8B text encoder runs fine in fp8 and only the diffusion model’s fp8 math stops. The only path left on a Mac is the bf16 backbone.

bf16 works

Loading the bf16 build (19GB on disk) into UNETLoader works this time: the model loads with weight dtype torch.bfloat16 and sampling runs to completion. The fp8 wall is gone.

Speed:

Condition	Time
cold (first run, ~50s load included)	118.8s
warm (4 steps / 1024px)	~70s (~17s/step)

Memory: the bf16 backbone (roughly 20GB) and the 10GB text encoder together push a 64GB machine close to its limit. The memory compressor ran flat out and free memory dropped to tens of MB, but the run didn’t hit an out-of-memory error.
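For scale, the weights alone can be summed from the file sizes Comfy-Org lists. A quick sketch that treats on-disk size as a stand-in for resident size, which is an assumption: any casting at load time (for example, if the fp8 text encoder is upcast on MPS) would change the in-memory footprint.

```python
# File sizes as listed by Comfy-Org, in decimal GB.
WEIGHTS_GB = {
    "backbone (bf16)": 20.59,
    "text encoder (fp8 scaled)": 10.59,
    "VAE (bf16)": 0.17,
}
RAM_GB = 64.0  # treated as decimal GB for simplicity


def weights_total_gb() -> float:
    """Total size of all model files loaded for one generation."""
    return sum(WEIGHTS_GB.values())


def weights_share_of_ram() -> float:
    """Fraction of RAM the weights would occupy at their on-disk size."""
    return weights_total_gb() / RAM_GB


if __name__ == "__main__":
    print(f"weights: {weights_total_gb():.2f} GB")
    print(f"share of RAM: {weights_share_of_ram():.0%}")
```

That comes to about 31GB, roughly half of 64GB. The rest of the pressure has to come from things this post doesn’t measure: activations, whatever casting happens at load time, and everything else running on the machine.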

70s/image is heavy for a 10B model, but reasonable in this environment. Qwen Image Edit AIO takes 5-6 minutes, and IL- or Anima-family models also hit 1-2 minutes at higher resolution. Drop the steps and it’s under a minute. Honestly, a 10B running at this speed on a Mac is better than I expected.
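The per-step figure gives a rough way to estimate other step counts. A sketch assuming time scales linearly with steps and that text encoding and VAE decode are negligible, neither of which is measured separately here:

```python
WARM_SECONDS = 70.0   # warm run: 4 steps at 1024px
WARM_STEPS = 4
COLD_SECONDS = 118.8  # first run, including model load
LOAD_SECONDS = 50.0   # approximate load time inside the cold run


def seconds_per_step() -> float:
    """Average sampling time per step in the warm run."""
    return WARM_SECONDS / WARM_STEPS


def estimate_seconds(steps: int) -> float:
    """Linear estimate for a different step count (assumption: no fixed overhead)."""
    return steps * seconds_per_step()


def cold_minus_load() -> float:
    """Sanity check: the cold run without its load time should match the warm run."""
    return COLD_SECONDS - LOAD_SECONDS


if __name__ == "__main__":
    print(f"{seconds_per_step():.1f} s/step")
    print(f"3 steps: ~{estimate_seconds(3):.1f} s")
    print(f"cold minus load: ~{cold_minus_load():.1f} s")
```

Under that assumption 3 steps lands around 52s, which matches the under-a-minute claim, and the cold run minus its load time (about 69s) agrees with the warm figure. The same arithmetic puts a 25-step Base run above 7 minutes, and likely higher if Base runs with cfg above 1.0.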

Generating people

My character LoRA doesn’t apply to this model, so I just described my character’s look in plain text (side ponytail with a blue scrunchie, white shirt, red tie) and checked how well a person comes out. The face and fine details don’t need to match since the model is different; I’m checking whether the broad strokes land and what the quality looks like.

Photoreal

Boogu is strong at photoreal, but it’s all about the prompt. A lazy prompt like “portrait photo of a person” gives rough skin, and without a specified ethnicity you get a realistic middle-aged Western woman. That plainness is the point: it doesn’t auto-apply beauty correction like other image models. Not idealizing is its default tendency.

Add quality terms (“professional portrait”, “smooth clear skin”, “85mm f1.4”, “soft window light”) and it jumps to clean, pro-shoot skin. The nice Boogu people you see elsewhere are mostly the product of this kind of prompt.

Lazy prompt, photoreal. Rough skin, and with no ethnicity specified the face skews Western

Quality-term prompt, photoreal. Same Boogu Turbo bf16, now pro-shoot clean skin

Without a specified ethnicity the default is Western; add “Japanese” and it shifts to East Asian features. That said, the polished East Asian face leans K-beauty, more Korean idol than Japanese. The training data may carry a lot of K-pop imagery.

Anime

A bare “anime illustration” prompt comes out watercolor-ish, not crisp anime shading. Add “masterpiece”, “cel shading”, “vibrant flat colors”, and “clean lineart”, and it sharpens into anime coloring. It’s not as committed as the dedicated Anima models, but it’s solid anime art on its own.

Anime with quality terms added. The watercolor look is gone, closer to cel shading

It looks rough on a lazy prompt and is easy to write off as “unusable,” but with a worked prompt both photoreal and anime are well within usable range.
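The lazy-versus-worked difference amounts to appending a fixed set of quality terms to the base prompt. A small sketch of that as a helper; the function name and the comma-joined format are illustrative choices, and the terms are the ones quoted in the two sections above:

```python
# Quality terms taken from the photoreal and anime tests in this post.
QUALITY_TERMS = {
    "photoreal": ["professional portrait", "smooth clear skin",
                  "85mm f1.4", "soft window light"],
    "anime": ["masterpiece", "cel shading",
              "vibrant flat colors", "clean lineart"],
}


def with_quality_terms(base_prompt: str, style: str) -> str:
    """Append the style's quality terms to a base prompt, comma-separated."""
    return ", ".join([base_prompt.strip(), *QUALITY_TERMS[style]])


if __name__ == "__main__":
    print(with_quality_terms("portrait photo of a person", "photoreal"))
    print(with_quality_terms("anime illustration", "anime"))
```

Keeping the terms in one place makes it easy to run the same subject through both the lazy and the worked prompt and compare.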

Text rendering

I tested the bilingual text selling point. The bottom line: only a single large headline word is reliable. COFFEE in English and おはよう in Japanese come out legible as poster titles. But drop to small body text like menu items and it collapses into nonsense strings like CPASCPA. Even the price digits are shaky; the top 0 is unreadable as a zero.

English poster. The COFFEE headline is legible, but the body items collapse into nonsense

Japanese poster. The おはよう headline is legible

The official notes even list “text unstable with dense typography” as a weakness, and that holds on a Mac with 4-step Turbo bf16. A single hero word works; reading body text doesn’t.

Where NSFW stops

A share of readers who search by model name come to check this, so I’ll be direct: Boogu won’t produce usable adult images.

Swimwear is fine. An adult woman in a swimsuit on a beach comes out SFW with no refusal or blur. But thanks to the non-idealizing default, you again get a realistic middle-aged woman.

Ask for “nude” and you do get a topless result, but the anatomy collapses into something like a melted oil painting and never resolves into a real image. Push harder with “explicit” or “fully naked” and it actually sanitizes back to a clothed swimsuit. The harder you push, the more it cleans up.

Result from an explicit prompt. The more explicit the request, the more it sanitizes to swimwear

It feels less like a hard filter and more like NSFW simply isn’t in the training data, so the model isn’t suited to it. It reads as a general photoreal model with safe-leaning alignment.

References

boogu-project/Boogu-Image — official repo
Comfy-Org/Boogu-Image — ComfyUI-repackaged weights
boogu-project/ComfyUI-Boogu — official custom node
Why a ComfyUI update made Qwen Image Edit take 10 minutes on MPS — the Mac MPS and dtype landmine