Can Z-Image run on RunPod? I checked it for character consistency
Why I looked into this
I had just run Qwen-Image-Edit on RunPod when, on January 28, 2026, Alibaba's Tongyi lab (the same lab behind Qwen-Image-Edit) released the full version of Z-Image.
What stood out in the samples was how stable the character shapes looked. When I generate illustrations with Qwen-Image-Edit, faces and body shapes sometimes drift. Z-Image is said to support negative prompts, a wider CFG range, and strong LoRA compatibility, so I wondered if it could produce stable manga-style art while keeping the character consistent.
So I checked whether it would run as-is in my current RunPod + ComfyUI setup.
What Z-Image is
Z-Image is a 6B parameter image-generation foundation model developed by Alibaba Tongyi-MAI. It is released under Apache 2.0, so it is commercially usable.
Model lineup
| Model | Purpose | Notes |
|---|---|---|
| Z-Image | Base model | High quality, high diversity, negative prompts, LoRA support |
| Z-Image-Turbo | Fast generation | 8 steps, sub-second inference |
| Z-Image-Omni-Base | Unified base | Generation + editing, fine-tuning target |
| Z-Image-Edit | Image editing | Editing-focused |
Architecture
- S3-DiT (Scalable Single-Stream Diffusion Transformer)
- Flow Matching-based
- Text encoder: Qwen 3 4B
- Compared with FLUX’s 12B parameters, Z-Image achieves similar quality with 6B
- Ranked #1 among open-source models on the Artificial Analysis leaderboard
Comparison with Qwen-Image-Edit
Here are the differences that matter for character consistency:
| | Z-Image | Qwen-Image-Edit |
|---|---|---|
| Parameters | 6B | 20B (text encoder: Qwen2.5-VL-7B) |
| Negative prompts | Supported | Supported |
| CFG | 3.0-5.0 | 1.0 |
| LoRA support | Officially supported | Possible, but the setup is more complex |
| ControlNet | Supported (Union version available) | Limited |
| Use case | txt2img / img2img | img2img / image editing |
| Style control | Switch with cfg_normalization | Prompt-driven |
CFG and negative prompt differences
Qwen-Image-Edit runs at a fixed CFG of 1.0, which weakens negative prompts. Z-Image lets you tune CFG between 3.0 and 5.0, and negative prompts work properly. Whether prompts like bad anatomy, deformed actually take effect matters a lot for character drift, so this is a big difference.
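To see why the CFG value determines how much the negative prompt matters, here is a toy sketch of classifier-free guidance (this is an illustration of the general CFG formula, not Z-Image's actual code):

```python
# Toy sketch of classifier-free guidance (CFG), not Z-Image's real pipeline.
# The model predicts noise twice: once conditioned on the positive prompt,
# once on the negative prompt. CFG scales the difference between them.

def cfg_combine(cond_pred: float, neg_pred: float, cfg: float) -> float:
    # guided = neg + cfg * (cond - neg)
    return neg_pred + cfg * (cond_pred - neg_pred)

# At CFG 1.0 (Qwen-Image-Edit's fixed value) the negative branch cancels out:
print(cfg_combine(0.8, 0.2, 1.0))  # 0.8 -> the negative prompt has no pull

# At CFG 4.0 (inside Z-Image's typical range) the prediction is actively
# pushed away from the negative prompt:
print(cfg_combine(0.8, 0.2, 4.0))  # 2.6 -> the gap to the negative is amplified
```

At CFG 1.0 the formula collapses to the conditional prediction alone, which is exactly why negative prompts barely register on Qwen-Image-Edit.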
cfg_normalization
Z-Image has a model-specific setting, cfg_normalization: False gives a more stylized, illustration/manga-like output, while True trends more realistic. If you want manga-style art, False is the way to go.
LoRA support for character training
Z-Image is said to have strong LoRA compatibility. If you train your own character as a LoRA, you can keep the face and body stable across many poses and compositions. Qwen-Image-Edit can do it too, but the setup is more complex and Z-Image’s ecosystem looks better organized.
Can it run on RunPod + ComfyUI?
Short answer: yes, with the same workflow as the Qwen NSFW setup. In fact, Z-Image is simpler.
Hardware requirements
| GPU | Z-Image | Qwen NSFW (Phr00t AIO) |
|---|---|---|
| RTX 4090 (24GB) | Works (about 12GB in bf16) | Does not work (needs 28GB) |
| RTX 5090 (32GB) | Plenty of room | Barely works |
Because Z-Image is only 6B parameters, it runs comfortably on an RTX 4090. I had to move Qwen NSFW over to a 5090, but Z-Image should be fine on a 4090. That also means the cost stays around $0.59/hour.
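The "about 12GB in bf16" figure in the table follows directly from the parameter count. A quick back-of-the-envelope calculation (weights only; activations, the text encoder, and the VAE add more on top):

```python
# Rough VRAM estimate for model weights alone, assuming bf16 (2 bytes/param).
# This ignores activations, the text encoder, and the VAE.

def weight_gib(params_billion: float, bytes_per_param: int = 2) -> float:
    return params_billion * 1e9 * bytes_per_param / 1024**3

print(round(weight_gib(6), 1))   # Z-Image at 6B: ~11.2 GiB of weights
print(round(weight_gib(20), 1))  # a 20B model: ~37.3 GiB, beyond a 4090
```

This is why the 6B size is the practical win here: the weights fit on a 24GB card with headroom for activations.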
ComfyUI support
ComfyUI has supported Z-Image natively since day one. Unlike the Qwen NSFW setup, there is no need to install custom nodes like TextEncodeQwenImageEditPlus or swap in Phr00t’s modified nodes_qwen.py.
Model file layout
ComfyUI/models/
├── text_encoders/
│ └── qwen_3_4b.safetensors
├── diffusion_models/
│ └── z_image_bf16.safetensors
└── vae/
└── ae.safetensors
Qwen NSFW’s Phr00t AIO was a single 28GB file, while Z-Image is split into separate files. Still, it is clean and simple with only three files.
Expected setup flow
The flow is basically the same as the Qwen NSFW setup.
1. Create a Pod
- GPU: RTX 4090 is enough
- Template: runpod/comfyui:latest is fine
Unlike the Qwen NSFW version, which needed a 5090+ specific template, this runs on the standard setup.
2. Download the models
pip install huggingface_hub
cd /workspace/ComfyUI/models
# Diffusion model
python3 -c "
from huggingface_hub import hf_hub_download
hf_hub_download('Tongyi-MAI/Z-Image', 'z_image_bf16.safetensors', local_dir='./diffusion_models/')
"
# Text encoder (Qwen 3 4B)
python3 -c "
from huggingface_hub import hf_hub_download
hf_hub_download('Tongyi-MAI/Z-Image', 'qwen_3_4b.safetensors', local_dir='./text_encoders/')
"
# VAE
python3 -c "
from huggingface_hub import hf_hub_download
hf_hub_download('Tongyi-MAI/Z-Image', 'ae.safetensors', local_dir='./vae/')
"
3. Configure ComfyUI
- Sampler: euler / dpmpp_2m
- Scheduler: AuraFlow
- Steps: 28 to 50
- CFG: 3.0 to 5.0
- Resolution: 1024x1024 recommended, but 512x512 to 2048x2048 is supported
No custom nodes are needed. The standard ComfyUI nodes are enough.
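The settings above can be captured as a small dict using the standard KSampler parameter names, handy if you drive ComfyUI through its API-format workflows. The values are this section's recommendations, not official defaults:

```python
# The recommended settings as a KSampler-style dict (standard ComfyUI
# parameter names). Values are this article's recommendations.

zimage_sampler = {
    "sampler_name": "euler",  # or "dpmpp_2m"
    "scheduler": "AuraFlow",
    "steps": 28,              # 28 to 50
    "cfg": 4.0,               # 3.0 to 5.0
    "denoise": 1.0,           # txt2img; lower this for img2img
}

def in_recommended_range(s: dict) -> bool:
    """Check steps and CFG against the ranges given in this section."""
    return 28 <= s["steps"] <= 50 and 3.0 <= s["cfg"] <= 5.0

print(in_recommended_range(zimage_sampler))
```

Keeping the settings in one place like this makes it easy to sweep CFG between 3.0 and 5.0 when hunting for the best style balance.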
How the img2img approach differs
Both can do img2img, but the mechanism is different.
- Z-Image: a normal diffusion img2img pipeline. It adds noise to the input image and denoises it; the denoise strength controls how much of the original image is kept
- Qwen-Image-Edit: an image-editing model. It recognizes the character in the input image and edits it according to the prompt
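A toy numeric sketch of how the denoise strength works in a diffusion img2img pipeline (a simplified linear mix for illustration; real schedulers use a proper noise schedule):

```python
import random

# Toy sketch of diffusion-style img2img (not Z-Image's real pipeline).
# `denoise` sets how much noise is mixed into the input latent before
# denoising starts; lower values keep more of the original image.

def noisy_latent(latent: float, denoise: float, seed: int = 0) -> float:
    rng = random.Random(seed)
    noise = rng.gauss(0.0, 1.0)
    # Linear mix for illustration only; real schedulers follow a schedule.
    return (1.0 - denoise) * latent + denoise * noise

original = 1.0
print(noisy_latent(original, 0.3))  # stays close to the original latent
print(noisy_latent(original, 0.9))  # mostly noise: large changes allowed
```

For character consistency, this is the knob to keep low: around 0.3 the model can only reinterpret details, while near 1.0 it is effectively doing txt2img and the character can drift freely.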
If you want stable manga-style art while keeping the character consistent, the Z-Image img2img + LoRA combination looks more stable. Once the character is trained as a LoRA, both txt2img and img2img should drift less.
Qwen-Image-Edit’s editing approach is intuitive for “change this image into X”, but if you want many consistent variations, it tends to drift more.
There is also an image-editing model in the Z-Image family, Z-Image-Edit, so that is another option when you specifically want editing.
ControlNet can lock the composition
Z-Image-Turbo-Fun-Controlnet-Union is already out. With ControlNet, you can specify a pose and still keep the character stable, which is strong if you want the same character in many poses.
Related articles
Other image-generation articles on this blog:
- Ran Qwen-Image-Edit’s NSFW version on RunPod
- Run Qwen-Image-Edit-2511 on RunPod
- Specs needed to run Qwen-Image-Edit-2511 locally
- Checked how to run the LoRA for Qwen-Image-Layered on RunPod
- Building a LoRA training environment on an RTX 3060 Laptop (6GB VRAM)
- SeaArt LoRA creation, practical edition