Z-Image-Distilled - a Z-Image derivative that keeps diversity while speeding up inference
what Z-Image-Distilled is
It is a derivative of Z-Image that uses distillation to speed up inference.
- official page: GuangyuanSD/Z-Image-Distilled - Hugging Face
- Civitai: Z-Image-Distilled
This is a “pure” distilled model. Unlike Z-Image-Turbo, it does not include Turbo weights or style.
basic specs
| item | Z-Image (original) | Z-Image-Distilled |
|---|---|---|
| recommended steps | 28-50 | 10-20 |
| CFG | 3.0-7.0 | 1.0-2.5 |
| diversity | high | medium, but better than Turbo |
| LoRA compatibility | high | high |
| license | Apache-2.0 | Apache-2.0 |
Because good results show up in 10 to 20 steps, it can generate in less than half the time of the original.
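The speedup comes from two places: fewer steps, and the fact that at CFG 1.0 there is no need for the second (unconditional) forward pass that classifier-free guidance normally adds. A back-of-the-envelope cost model (the "CFG > 1 doubles the work per step" rule is a standard simplification, not a measurement of these specific models):

```python
def nfe(steps: int, cfg: float) -> int:
    """Number of model forward passes (NFE) for one image.

    Assumes classifier-free guidance runs a second (unconditional)
    pass per step whenever cfg > 1.0 -- a first-order cost model,
    ignoring text encoding and VAE decode overhead.
    """
    return steps * (2 if cfg > 1.0 else 1)

original = nfe(steps=28, cfg=4.0)   # low end of 28-50 steps, CFG 3.0-7.0
distilled = nfe(steps=15, cfg=1.0)  # middle of 10-20 steps, CFG 1.0

print(original, distilled)  # 56 15 -- roughly 3.7x fewer forward passes
```

By this estimate the "less than half the time" claim is conservative when you can run at CFG 1.0; at CFG above 1.0 the gain shrinks back toward the pure step-count ratio.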
recommended settings
- CFG: 1.0-1.8 (higher values improve prompt adherence)
- Steps: 10 (preview), 15-20 (stable quality)
- Sampler: Euler, simple, res_m
- LoRA weight: 0.6-1.0
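Written out as a small Python fragment, the settings above might look like this. The dict layout and the `pick_steps` helper are purely illustrative, not any particular UI's or library's API:

```python
# Recommended settings for Z-Image-Distilled, as listed above.
# Illustrative only -- adapt the keys to whatever pipeline or UI you use.
SETTINGS = {
    "cfg": (1.0, 1.8),        # higher end improves prompt adherence
    "steps_preview": 10,
    "steps_final": (15, 20),  # stable quality
    "samplers": ["euler", "simple", "res_m"],
    "lora_weight": (0.6, 1.0),
}

def pick_steps(mode: str) -> int:
    """Fast preview vs. stable final quality."""
    if mode == "preview":
        return SETTINGS["steps_preview"]
    lo, hi = SETTINGS["steps_final"]
    return hi  # default to the safe upper end for final renders

print(pick_steps("preview"), pick_steps("final"))  # 10 20
```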
distilled model comparison: Schnell vs Distilled
I wrote in the FLUX.2 Klein article that distillation often reduces diversity. FLUX.1 Schnell is the classic example.
| model | approach | diversity | speed |
|---|---|---|---|
| FLUX.2 Klein | parameter reduction without distillation | high | somewhat slow |
| FLUX.1 Schnell | distilled for speed | low | fast |
| Z-Image-Distilled | distilled for speed | medium | fast |
The claim is that Z-Image-Distilled keeps diversity even after distillation. In practice, its good LoRA training compatibility suggests it still has plenty of flexibility as a base model.
It is a little slower than Turbo, but if you care about diversity and LoRA compatibility, it is a solid option.
can it run on an M1 Max 64GB machine?
Short answer: yes. Easily.
requirements
- Z-Image Turbo (bf16): 12-16GB VRAM
- Z-Image-Distilled is assumed to be similar
on an M1 Max 64GB
| item | status |
|---|---|
| unified memory | 64GB |
| GPU available | about 48GB, with a 75% limit |
| model requirement | 12-16GB |
| headroom | plenty |
It is much lighter than FLUX.2 Klein, which needs 29GB.
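The headroom claim is easy to sanity-check. A rough calculation, using the 75% GPU allocation limit and the upper end of the 12-16GB model estimate from the text above (both ballpark figures):

```python
# Rough memory-headroom check for an M1 Max with 64GB unified memory.
unified_gb = 64
gpu_budget_gb = unified_gb * 0.75  # ~48GB usable by the GPU
model_gb = 16                      # upper end of the bf16 estimate

headroom_gb = gpu_budget_gb - model_gb
print(headroom_gb)  # 32.0 -- plenty of room left for activations and caches
```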
if you still need to reduce memory
If memory is still tight:
- GGUF quantization can run on 6GB VRAM
- stable-diffusion.cpp, a pure C++ implementation, can run on 4GB VRAM
On an M1 Max 64GB machine, the full model should be fine without quantization.
known limitations
weaker text rendering
Distillation hurts text rendering quality, especially for small text. It is not a good fit if you want to generate logos or signs.
color cast
Some samplers can make the output skew bluish. Changing the sampler or adjusting the prompt helps.
using it in ComfyUI
It is ComfyUI-compatible. The layer prefix is `model.diffusion_model.`
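If you are handling the checkpoint yourself, you can verify or strip that prefix with a quick state-dict scan. A minimal sketch using plain dicts (a real checkpoint would be loaded via safetensors or torch; the key name below is invented for illustration):

```python
PREFIX = "model.diffusion_model."

def strip_prefix(state_dict: dict) -> dict:
    """Return a copy with the ComfyUI-style prefix removed from each key."""
    return {
        (k[len(PREFIX):] if k.startswith(PREFIX) else k): v
        for k, v in state_dict.items()
    }

# Toy stand-in for a real checkpoint; the key name is made up.
sd = {"model.diffusion_model.input_blocks.0.weight": None}
print(list(strip_prefix(sd)))  # ['input_blocks.0.weight']
```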
It supports both Chinese and English prompts.