# Z-Image — Alibaba’s image generator that reportedly surpasses FLUX
On January 28, 2026, Alibaba’s Tongyi-MAI team released the foundation model for image generation, Z-Image. It is the base model behind Z-Image-Turbo, which had been released earlier in November 2025. The weights are available on Hugging Face under the Apache 2.0 license.
Comments like “it knocked FLUX off the throne” have been circulating in overseas communities, so I looked into it.
## Core specs of Z-Image
- Architecture: Single-Stream Diffusion Transformer (S3-DiT)
- Parameter count: 6 billion (6B)
- Supported resolution: 512×512 to 2048×2048 (arbitrary aspect ratio)
- Inference steps: 28–50
- Guidance scale: 3.0–5.0
- Minimum VRAM: 6 GB with quantization (runs even on RTX 2060-class GPUs)
- License: Apache 2.0 (commercial use permitted)
Whereas FLUX adopts a Hybrid-Stream DiT (processing text and image separately before fusing), Z-Image uses a Single-Stream approach that processes text embeddings and the noisy image together from the start. This contributes to its strong parameter efficiency.
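To make the architectural difference concrete, here is a toy sketch (not the actual S3-DiT; dimensions and layer choices are illustrative assumptions) of what "single-stream" means: text embeddings and noisy image patch tokens are concatenated into one sequence and processed by shared transformer blocks from the very first layer, rather than running through separate text and image streams that fuse later.

```python
import torch
import torch.nn as nn

# Toy illustration of a single-stream block (NOT the real S3-DiT).
# All sizes below are made up for demonstration.
dim = 64
block = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)

text_tokens = torch.randn(1, 77, dim)    # e.g. 77 text embeddings
image_tokens = torch.randn(1, 256, dim)  # e.g. 16x16 grid of noisy latent patches

# Single-stream: one joint sequence through shared weights from layer one.
single_stream = torch.cat([text_tokens, image_tokens], dim=1)
out = block(single_stream)
print(out.shape)  # torch.Size([1, 333, 64])
```

Because every layer attends over text and image tokens jointly with one set of weights, no parameters are spent on a separate text stream, which is one plausible reason a 6B single-stream model can compete with larger hybrid designs.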
## The Z-Image lineup
Z-Image comes in four models:
| Model | Use case |
|---|---|
| Z-Image | Base model. Suitable for fine-tuning and LoRA training. |
| Z-Image-Turbo | Z-Image distilled + RLHF for speed. Can generate in 8 steps. |
| Z-Image-Omni-Base | Foundation model with multimodal support. |
| Z-Image-Edit | Instruction-based image editing model. |
## Z-Image vs Z-Image-Turbo
They are in the same family but quite different in character.
| | Z-Image | Z-Image-Turbo |
|---|---|---|
| Inference steps | 28–50 | 8 |
| Negative prompts | Supported | Not supported |
| Diversity | High | Slightly lower |
| Fine-tuning suitability | High (LoRA, ControlNet) | Low |
| Image quality | High | Very high |
| Primary use | Custom model development, research | Fast image generation |
Turbo is a distilled model, so it’s fast, but customizability is sacrificed. If you want to train LoRAs or use ControlNet, the base Z-Image is the clear choice.
## Z-Image vs FLUX vs Stable Diffusion 3.5
Here’s the main comparison: open-source image generators as of January 2026.
| | Z-Image | FLUX.1 | SD 3.5 |
|---|---|---|---|
| Developer | Alibaba (Tongyi-MAI) | Black Forest Labs | Stability AI |
| Parameters | 6B | 12B | 8B |
| Architecture | Single-Stream DiT | Hybrid-Stream DiT | MM-DiT |
| Minimum VRAM | 6 GB (quantized) | 24 GB+ | 12 GB+ |
| License | Apache 2.0 | Dev: noncommercial / Pro: commercial | Stability AI Community |
| CFG | Fully supported | Dev: not supported | Supported |
| Negative prompts | Supported | Dev: not supported / Pro: supported | Supported |
| Elo ranking (AI Arena) | #1 among open source | Lower | Not listed |
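The CFG and negative-prompt rows in the table refer to classifier-free guidance. As a quick refresher, CFG extrapolates from the model's unconditional noise prediction toward its prompt-conditioned prediction; a negative prompt works by replacing the empty prompt in the unconditional branch, so the extrapolation steers *away* from it. A minimal sketch of the update rule (scalar values for clarity; in practice it is applied elementwise to noise tensors):

```python
def cfg(noise_uncond: float, noise_cond: float, guidance_scale: float) -> float:
    # Classifier-free guidance: extrapolate from the unconditional
    # prediction toward the prompt-conditioned prediction.
    return noise_uncond + guidance_scale * (noise_cond - noise_uncond)

print(cfg(0.0, 1.0, 1.0))  # 1.0 -> scale 1 reproduces the conditional prediction
print(cfg(0.0, 1.0, 4.0))  # 4.0 -> higher scale pushes harder toward the prompt
```

This is why Z-Image's recommended guidance scale of 3.0–5.0 matters: distilled models like FLUX.1 Dev bake guidance in and cannot run this extrapolation, which is also why they cannot honor negative prompts.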
### Dramatically lighter VRAM requirements
This is Z-Image’s biggest strength. FLUX needs 24 GB or more for the full model—and around 12 GB even when quantized—so it’s tough on an RTX 3060 or 4060. Z-Image can be quantized down to about 6 GB, enabling image generation in roughly 30 seconds even on RTX 2060-class GPUs.
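A back-of-envelope calculation shows why the ~6 GB figure is plausible for a 6B-parameter model (weights only; activations and the text encoder add overhead on top):

```python
params = 6e9  # 6 billion parameters

def weight_gb(bytes_per_param: float) -> float:
    # Memory for the weights alone, in GiB.
    return params * bytes_per_param / 1024**3

print(f"bf16 : {weight_gb(2):.1f} GB")    # ~11.2 GB
print(f"8-bit: {weight_gb(1):.1f} GB")    # ~5.6 GB
print(f"4-bit: {weight_gb(0.5):.1f} GB")  # ~2.8 GB
```

At 8-bit the weights alone come to about 5.6 GB, consistent with the quoted ~6 GB minimum; the same arithmetic on FLUX's 12B parameters gives roughly double at every precision, which is why it struggles on mid-range cards.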
### Benchmarks
In Alibaba’s AI Arena Elo-based evaluation, Z-Image-Turbo ranks first among open-source models. It outperforms GPT Image 1 (OpenAI), FLUX.1 Kontext Pro, and Ideogram 3.0, placing fourth globally behind Google Imagen 4 and ByteDance Seedream.
The base Z-Image is also claimed by the authors to offer “performance comparable to models 10× larger,” and it does appear outstanding in parameter efficiency.
### Ecosystem still immature
There are weaknesses too. Compared with Stable Diffusion and FLUX, there are far fewer third-party tools, community models, and tutorials. That said, there have been reports that the pace of LoRA creation has surpassed that of FLUX since release, so this gap may close quickly.
## Getting started
### diffusers (Python)
```python
import torch
from diffusers import ZImagePipeline

# Load the base model in bfloat16 and move it to the GPU
pipe = ZImagePipeline.from_pretrained(
    "Tongyi-MAI/Z-Image",
    torch_dtype=torch.bfloat16,
)
pipe.to("cuda")

image = pipe(
    prompt="a cat sitting on a windowsill at sunset",
    height=1024,
    width=1024,
    num_inference_steps=50,  # base model: 28-50 steps
    guidance_scale=4.0,      # recommended range: 3.0-5.0
    generator=torch.Generator("cuda").manual_seed(42),  # fixed seed for reproducibility
).images[0]
image.save("output.png")
```
### ComfyUI
ComfyUI has supported Z-Image natively from day one; the required nodes can be installed via ComfyUI Manager.