Z-Image-Distilled - a Z-Image derivative that keeps diversity while speeding up inference
what Z-Image-Distilled is
It is a derivative of Z-Image that uses distillation to speed up inference.
- official page: GuangyuanSD/Z-Image-Distilled - Hugging Face
- Civitai: Z-Image-Distilled
This is a “pure” distilled model. Unlike Z-Image-Turbo, it does not include Turbo weights or style.
basic specs
| item | Z-Image (original) | Z-Image-Distilled |
|---|---|---|
| recommended steps | 28-50 | 10-20 |
| CFG | 3.0-7.0 | 1.0-2.5 |
| diversity | high | medium, but better than Turbo |
| LoRA compatibility | high | high |
| license | Apache-2.0 | Apache-2.0 |
Because good results show up in 10 to 20 steps, it can generate in less than half the time of the original.
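The speedup comes from two places: fewer steps, and the fact that at CFG 1.0 there is no need for the second (unconditional) forward pass that classifier-free guidance normally adds. A back-of-the-envelope cost model (the "CFG > 1 doubles the work per step" rule is a standard simplification, not a measurement of these specific models):

```python
def nfe(steps: int, cfg: float) -> int:
    """Number of model forward passes (NFE) for one image.

    Assumes classifier-free guidance runs a second (unconditional)
    pass per step whenever cfg > 1.0 -- a first-order cost model,
    ignoring text encoding and VAE decode overhead.
    """
    return steps * (2 if cfg > 1.0 else 1)

original = nfe(steps=28, cfg=4.0)   # low end of 28-50 steps, CFG 3.0-7.0
distilled = nfe(steps=15, cfg=1.0)  # middle of 10-20 steps, CFG 1.0

print(original, distilled)  # 56 15 -- roughly 3.7x fewer forward passes
```

By this estimate the "less than half the time" claim is conservative when you can run at CFG 1.0; at CFG above 1.0 the gain shrinks back toward the pure step-count ratio.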
recommended settings
- CFG: 1.0-1.8 (higher values improve prompt adherence)
- Steps: 10 (preview), 15-20 (stable quality)
- Sampler: Euler, simple, res_m
- LoRA weight: 0.6-1.0
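Written out as a small Python fragment, the settings above might look like this. The dict layout and the `pick_steps` helper are purely illustrative, not any particular UI's or library's API:

```python
# Recommended settings for Z-Image-Distilled, as listed above.
# Illustrative only -- adapt the keys to whatever pipeline or UI you use.
SETTINGS = {
    "cfg": (1.0, 1.8),        # higher end improves prompt adherence
    "steps_preview": 10,
    "steps_final": (15, 20),  # stable quality
    "samplers": ["euler", "simple", "res_m"],
    "lora_weight": (0.6, 1.0),
}

def pick_steps(mode: str) -> int:
    """Fast preview vs. stable final quality."""
    if mode == "preview":
        return SETTINGS["steps_preview"]
    lo, hi = SETTINGS["steps_final"]
    return hi  # default to the safe upper end for final renders

print(pick_steps("preview"), pick_steps("final"))  # 10 20
```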
distilled model comparison: Schnell vs Distilled
I wrote in the FLUX.2 Klein article that distillation often reduces diversity. FLUX.1 Schnell is the classic example.
| model | approach | diversity | speed |
|---|---|---|---|
| FLUX.2 Klein | parameter reduction without distillation | high | somewhat slow |
| FLUX.1 Schnell | distilled for speed | low | fast |
| Z-Image-Distilled | distilled for speed | medium | fast |
The claim is that Z-Image-Distilled keeps diversity even after distillation. In practice, its good LoRA training compatibility suggests it still has plenty of flexibility as a base model.
It is a little slower than Turbo, but if you care about diversity and LoRA compatibility, it is a solid option.
can it run on an M1 Max 64GB machine?
Short answer: yes. Easily.
requirements
- Z-Image Turbo (bf16): 12-16GB VRAM
- Z-Image-Distilled is assumed to be similar
on an M1 Max 64GB
| item | status |
|---|---|
| unified memory | 64GB |
| GPU available | about 48GB, with a 75% limit |
| model requirement | 12-16GB |
| headroom | plenty |
It is much lighter than FLUX.2 Klein, which needs 29GB.
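The headroom claim is easy to sanity-check. A rough calculation, using the 75% GPU allocation limit and the upper end of the 12-16GB model estimate from the text above (both ballpark figures):

```python
# Rough memory-headroom check for an M1 Max with 64GB unified memory.
unified_gb = 64
gpu_budget_gb = unified_gb * 0.75  # ~48GB usable by the GPU
model_gb = 16                      # upper end of the bf16 estimate

headroom_gb = gpu_budget_gb - model_gb
print(headroom_gb)  # 32.0 -- plenty of room left for activations and caches
```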
if you still need to reduce memory
If memory is still tight:
- GGUF quantization can run on 6GB VRAM
- stable-diffusion.cpp, a pure C++ implementation, can run on 4GB VRAM
On an M1 Max 64GB machine, the full model should be fine without quantization.
known limitations
weaker text rendering
Distillation hurts text rendering quality, especially for small text. It is not a good fit if you want to generate logos or signs.
color cast
Some samplers can make the output skew bluish. Changing the sampler or adjusting the prompt helps.
using it in ComfyUI
It is ComfyUI-compatible. The layer prefix is `model.diffusion_model.`
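If you are handling the checkpoint yourself, you can verify or strip that prefix with a quick state-dict scan. A minimal sketch using plain dicts (a real checkpoint would be loaded via safetensors or torch; the key name below is invented for illustration):

```python
PREFIX = "model.diffusion_model."

def strip_prefix(state_dict: dict) -> dict:
    """Return a copy with the ComfyUI-style prefix removed from each key."""
    return {
        (k[len(PREFIX):] if k.startswith(PREFIX) else k): v
        for k, v in state_dict.items()
    }

# Toy stand-in for a real checkpoint; the key name is made up.
sd = {"model.diffusion_model.input_blocks.0.weight": None}
print(list(strip_prefix(sd)))  # ['input_blocks.0.weight']
```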
It supports both Chinese and English prompts.