Testing Z-Image i2i for Pixel Art Conversion

In Testing Pixel Art Conversion with Qwen Image Edit, I compared 5 patterns for photo-to-pixel-art conversion. The winner was Illustrious i2i + pixel-art-xl LoRA (Pattern E) at 1:30 processing time and 6.5GB memory.

But there was one option I hadn’t tried: Z-Image i2i.

I’ve written several articles about Z-Image on this blog already: a comparison with FLUX and SD, RunPod feasibility, derivative models, and Z-Image-Distilled. But I never considered it for pixel art conversion. When I found out that multiple pixel art LoRAs exist for Z-Image, I decided to investigate.

pixel-art-xl LoRA Doesn’t Work with Z-Image

The pixel-art-xl v1.1 used previously is an SDXL LoRA. Illustrious is SDXL-based, so it works directly.

Z-Image uses S3-DiT (Single-Stream Diffusion Transformer), a completely different architecture. It processes text and image as a single sequence from the start, unlike SDXL’s U-Net with its downsample/upsample structure. Since LoRA inserts low-rank matrices into specific model layers (attention weight matrices, etc.), different architectures mean no compatibility.
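To make the incompatibility concrete, here is a minimal sketch of how a LoRA update is merged into a layer and why it has nothing to attach to on a different architecture. The layer names and shapes are hypothetical and chosen only for illustration; they are not the real SDXL or Z-Image parameter names.

```python
import numpy as np

def apply_lora(weights, lora, scale=1.0):
    """Merge LoRA deltas into matching layers; return (merged, unmatched keys)."""
    merged = dict(weights)
    unmatched = []
    for name, (down, up) in lora.items():
        expected_shape = (up.shape[0], down.shape[1])
        if name not in weights or weights[name].shape != expected_shape:
            unmatched.append(name)  # nothing to attach this delta to
            continue
        # W' = W + scale * (up @ down), a rank-r update of the original weight
        merged[name] = weights[name] + scale * (up @ down)
    return merged, unmatched

rng = np.random.default_rng(0)
rank = 4
# Hypothetical layer names, for illustration only
unet_like = {"unet.attn1.to_q": rng.normal(size=(64, 64))}
dit_like = {"dit.layers.0.attention.qkv": rng.normal(size=(192, 64))}
lora = {"unet.attn1.to_q": (rng.normal(size=(rank, 64)), rng.normal(size=(64, rank)))}

_, missing_on_unet = apply_lora(unet_like, lora)
_, missing_on_dit = apply_lora(dit_like, lora)
print(missing_on_unet)  # []
print(missing_on_dit)   # ['unet.attn1.to_q']
```

Every key that ends up in the unmatched list is a LoRA weight that silently does nothing on the target model.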

graph LR
    subgraph SDXL
        A[pixel-art-xl LoRA] --> B[Illustrious<br/>WAI-SDXL]
        A --> C[NoobAI-XL]
    end
    subgraph Z-Image
        D[Z-Image<br/>pixel art LoRA] --> E[Z-Image Turbo]
        D --> F[Z-Image Base]
    end
    A -..->|Incompatible| E

To do pixel art conversion with Z-Image, you need Z-Image-specific pixel art LoRAs.

Pixel Art LoRAs for Z-Image

Found several.

LoRA	Platform	Features
Pixel Art Style LoRA	HuggingFace	SNES-style retro pixel art. Strength 0.6 for detailed, 1.0 for classic
PixelArt Perfect	Civitai	Pixel shader-like conversion. Trigger word `pixelart`
Elusarca’s Detailed Pixel Art	Civitai	Higher detail
ZIT - PixelSIN	Civitai	Z-Image Turbo specific

All of them target Z-Image Turbo. Surprisingly, that is already more pixel art LoRAs than I had for SDXL, where pixel-art-xl was the only one I used. The Z-Image ecosystem is growing fast.

The Fused QKV Attention Trap

Z-Image Turbo fuses its QKV attention mechanism for faster inference. Standard LoRAs handle Q/K/V projections separately, but fused QKV combines them into a single matrix. ComfyUI’s standard LoRA loader can’t handle this.

Comfyui-ZiT-Lora-loader is a custom node that auto-converts LoRAs for fused QKV. Required for any LoRA on Z-Image Turbo.
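As an illustration of what such a conversion could look like, the sketch below folds three separate Q/K/V LoRA pairs into a single pair for a fused weight. It assumes the fused matrix stacks the Q, K, and V rows in that order; this is not the custom node's actual implementation, and `fuse_qkv_lora` is a hypothetical helper.

```python
import numpy as np

def fuse_qkv_lora(q, k, v):
    """Combine separate (down, up) LoRA pairs for Q, K, V into one pair for a fused QKV weight."""
    downs = [q[0], k[0], v[0]]
    ups = [q[1], k[1], v[1]]
    # Stack the down projections: (3r, d_in)
    down_fused = np.concatenate(downs, axis=0)
    # Place the up projections on a block diagonal: (3 * d_out, 3r)
    rows = sum(u.shape[0] for u in ups)
    cols = sum(u.shape[1] for u in ups)
    up_fused = np.zeros((rows, cols))
    r0 = c0 = 0
    for u in ups:
        up_fused[r0:r0 + u.shape[0], c0:c0 + u.shape[1]] = u
        r0 += u.shape[0]
        c0 += u.shape[1]
    return down_fused, up_fused

rng = np.random.default_rng(0)
d, r = 8, 2
q, k, v = [(rng.normal(size=(r, d)), rng.normal(size=(d, r))) for _ in range(3)]
down_f, up_f = fuse_qkv_lora(q, k, v)
# The fused delta should equal the three separate deltas stacked row-wise
separate = np.concatenate([up @ down for down, up in (q, k, v)], axis=0)
print(np.allclose(up_f @ down_f, separate))  # True
```

The fused pair has rank 3r instead of r, which is the price of expressing three independent updates as one.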

Z-Image base (non-Turbo) doesn’t use fused QKV, so the standard loader works. But it needs 28-50 steps, which is too slow for a pixel art conversion pipeline.

Amazing Z-Image Workflow

The Amazing Z-Image Workflow v4.0 ComfyUI workflow includes style presets.

Workflow	Content
amazing-z-image-a	General styles (18 types)
amazing-z-image-b	Unique styles
amazing-z-comics	Comics, anime, pixel art
amazing-z-photo	Photography

amazing-z-comics has a pixel art preset. That would be convenient if it works without a LoRA, but a preset is unlikely to apply the style as strongly. It is primarily a txt2i workflow, so its i2i conversion quality is unknown.

Z-Image-Edit: The Editing-Specialized Model

The Z-Image series includes Z-Image-Edit, a model specialized for image editing.

Model	Approach	Use Case
Z-Image Turbo	i2i + LoRA	Style transfer via pixel art LoRA
Z-Image-Edit	Instruction-based editing	“Convert to pixel art” via prompt

This is the same concept as Qwen-Image-Edit: you give editing instructions via a prompt on an input image. It is the same approach as Pattern A in Testing Pixel Art Conversion with Qwen Image Edit.

Supports img2img and inpainting. According to the official docs, it can process facial expression changes, background replacement, and caption addition simultaneously in a single prompt.

Weights Not Released (as of April 2026)

However, Z-Image-Edit weights are not yet publicly available. The GitHub model zoo still shows “To be released”, and there is nothing on HuggingFace or ModelScope either.

Model	Status
Z-Image (base)	Released (Jan 2026)
Z-Image-Turbo	Released (Jan 2026)
Z-Image-Omni-Base	Unreleased
Z-Image-Edit	Unreleased

Web demos (z-image-edit.com etc.) offer API access, but no local execution. Since a pixel art conversion pipeline requires local execution, Z-Image-Edit isn’t an option right now.

When it’s released, I want to compare it against the Qwen test under the same conditions. Qwen only managed “fake pixel art” at best.

Pipeline Comparison (Pre-Test Estimates)

graph TD
    A[Source Image]
    A --> B1[Illustrious i2i<br/>+ pixel-art-xl LoRA]
    A --> B2[Z-Image Turbo i2i<br/>+ pixel art LoRA]
    A --> B3["Z-Image-Edit (unreleased)<br/>Prompt instruction"]
    B1 --> C1[Pixel Art<br/>1:30 / 6.5GB]
    B2 --> C2[Pixel Art<br/>Unknown / 20GB]
    B3 --> C3[Pixel Art<br/>Untested]

Illustrious i2i (Current)

Item	Value
Architecture	U-Net (SDXL)
Parameters	~3.5B (SDXL base; the 6.6B figure includes the refiner)
Memory	~6.5GB
Inference Steps	25
LoRA Loading	Standard node
Processing Time (M1 Max)	1:30

Z-Image Turbo i2i (Candidate)

Item	Value
Architecture	S3-DiT
Parameters	6B
Memory (BF16)	~20GB (model 12GB + encoder 7GB + VAE)
Memory (Quantized)	~6GB (Q4_K_M)
Inference Steps	8
LoRA Loading	Custom loader required
Processing Time (M1 Max)	Unknown

Z-Image-Edit (Unreleased)

Item	Value
Architecture	S3-DiT
Parameters	6B
Memory	Unknown
LoRA	Not needed (prompt instruction)
Status	Unreleased (To be released)

Z-Image Turbo’s 8 steps look great on paper. But per-step compute for S3-DiT may differ from U-Net, and the Qwen 3 4B text encoder (7GB) adds overhead on the first encode.

Memory is 3x heavier at BF16. Quantization drops it to 6GB, but quantization might soften pixel edges. Pixel art lives and dies by sharp pixel boundaries, so this needs real output to judge.
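The weight-size part of these numbers is simple arithmetic: parameters times bits per weight. A rough sketch, counting only the diffusion model's weights and ignoring activations, the text encoder, and the VAE; the flat 4-bit case is an assumption for illustration, since Q4_K_M is a mixed format that averages somewhat more than 4 bits per weight.

```python
def weight_gb(params_billion, bits_per_weight):
    """Approximate weight storage in GB (10^9 bytes), ignoring activations and overhead."""
    return params_billion * bits_per_weight / 8

print(weight_gb(6, 16))  # 12.0  (6B parameters at BF16)
print(weight_gb(6, 4))   # 3.0   (6B parameters at a flat 4 bits, a lower bound)
```

The BF16 result matches the 12GB model figure in the table; the quantized total in the table comes from the source and is not derived here.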

Setup

Environment: M1 Max 64GB + ComfyUI (Metal-enabled). Z-Image Turbo running on ComfyUI + Metal was already confirmed in the BEYOND_REALITY_Z_IMAGE article. The ~20GB BF16 footprint is well within 64GB.

Model Files

Download single-file ComfyUI versions from Comfy-Org/z_image_turbo. The official repo (Tongyi-MAI/Z-Image-Turbo) uses diffusers format with sharded files, so use the Comfy-Org version for ComfyUI.

ComfyUI/models/
├── diffusion_models/
│   └── z_image_turbo_bf16.safetensors
├── text_encoders/
│   └── qwen_3_4b.safetensors
└── vae/
    └── ae.safetensors
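Before launching ComfyUI, it can save time to confirm the files landed where the layout above expects them. A small sketch; the file names come from the tree above, while `missing_model_files` is a hypothetical helper and the `ComfyUI/models` root is an assumption about where ComfyUI is installed.

```python
from pathlib import Path

# Relative paths taken from the layout above
EXPECTED = [
    "diffusion_models/z_image_turbo_bf16.safetensors",
    "text_encoders/qwen_3_4b.safetensors",
    "vae/ae.safetensors",
]

def missing_model_files(models_dir):
    """Return the expected model files that are not present under models_dir."""
    root = Path(models_dir)
    return [rel for rel in EXPECTED if not (root / rel).is_file()]

if __name__ == "__main__":
    # Adjust the path to your own ComfyUI install
    for rel in missing_model_files("ComfyUI/models"):
        print(f"missing: {rel}")
```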

LoRA

Download Pixel Art Style LoRA to ComfyUI/models/loras/. This is the default LoRA in ComfyUI’s official Z-Image workflow template.

Custom Node

Install Comfyui-ZiT-Lora-loader. Required for fused QKV support on Z-Image Turbo.

Test 1: txt2i (Basic Generation)

First, I verified that Z-Image Turbo can generate images at all, using the official “Text to Image (Z-Image-Turbo)” ComfyUI template.

Settings

Item	Value
Steps	8
CFG	1.0
Sampler	euler
Scheduler	simple
Resolution	1024x1024

Note: --fp16-vae Breaks Z-Image

Starting ComfyUI with --fp16-vae causes Z-Image’s VAE (ae.safetensors) to output solid black. Z-Image’s VAE requires BF16. Launch ComfyUI without --fp16-vae.

Result

Z-Image Turbo txt2i

Works fine. Blue-haired character generated as prompted.

Processing time: 87s (including initial model load)
~20GB BF16 memory usage

First run is slow due to model loading. Subsequent runs are ~75s with cached models.

Test 2: i2i (Image Input)

Next, I tested img2img, using the same character illustration from Testing Pixel Art Conversion with Qwen Image Edit.

Input image

Settings

Item	Value
Input	kana-il-source.webp
Denoise	0.6
Prompt	1girl, full body, pixel art style
Other	Same as Test 1

Denoise 0.6 matches the Illustrious i2i test in Testing Pixel Art Conversion with Qwen Image Edit. Strong enough to change style while preserving composition.
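The only sampler change from Test 1 is the denoise value. A minimal sketch of the two configurations, with keys named after the inputs of ComfyUI's KSampler node; `sampler_settings` is a hypothetical helper, and the model, prompt, and latent connections are left out.

```python
def sampler_settings(denoise=1.0):
    """KSampler-style settings used in these tests; denoise < 1.0 keeps part of the input image."""
    if not 0.0 < denoise <= 1.0:
        raise ValueError("denoise must be in (0, 1]")
    return {
        "steps": 8,
        "cfg": 1.0,
        "sampler_name": "euler",
        "scheduler": "simple",
        "denoise": denoise,
    }

txt2i = sampler_settings()            # Test 1: start from pure noise
i2i = sampler_settings(denoise=0.6)   # Test 2: start from the encoded input image
changed = {key for key in txt2i if txt2i[key] != i2i[key]}
print(changed)  # {'denoise'}
```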

Result

Figures: input image, and Z-Image Turbo i2i output.

i2i works. Preserves the original composition and character feel while applying Z-Image Turbo’s style.

Processing time: 75s

Test 3: i2i + Pixel Art LoRA

The main event. Can a pixel art LoRA on i2i produce actual pixel art?

Settings

Item	Value
Input	kana-il-source.webp
LoRA	pixel_art_style_z_image_turbo
LoRA Strength	1.0
Trigger Word	Pixel art style
LoRA Loader	ZImageTurboLoraLoader (auto_convert_qkv enabled)

You must use Comfyui-ZiT-Lora-loader’s ZImageTurboLoraLoader instead of the standard LoraLoaderModelOnly. The auto_convert_qkv option handles the fused QKV conversion automatically.

The standard LoRA loader has zero effect: I tested it, and the output was identical to i2i without the LoRA. This is the “fused QKV attention trap” described earlier in this article, confirmed in practice.

Denoise Comparison

Figures: output at denoise 0.8, and output at denoise 1.0.

At denoise 0.8, no pixel art. The source image dominates and the LoRA style gets crushed. At denoise 1.0, pixel art finally appears, but the source image is essentially gone.

Z-Image Turbo only runs 8 steps. At low denoise, there’s not enough “room” for the LoRA to apply style transformation. Illustrious has 25 steps, so LoRA works even at denoise 0.6. Fewer steps is great for speed but works against i2i style transfer.
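One rough way to picture the “room” argument is to count how many sampling steps actually act on the image. The sketch below assumes the img2img convention where denoise scales the number of executed steps, as diffusers-style pipelines do. That is an assumption here: ComfyUI’s KSampler may schedule denoise differently, so treat the numbers as intuition rather than a measurement.

```python
def effective_steps(total_steps, denoise):
    """Steps executed when denoise scales the step count (assumed convention)."""
    return min(int(total_steps * denoise), total_steps)

for name, steps in [("Z-Image Turbo", 8), ("Illustrious", 25)]:
    print(name, effective_steps(steps, 0.6))
# Z-Image Turbo 4
# Illustrious 15
```

Under that assumption, the LoRA gets 4 steps to restyle the image on Turbo versus 15 on Illustrious at the same denoise value.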

txt2i + LoRA Works Fine

For reference, txt2i (no input image) with the LoRA produces proper pixel art.

txt2i + pixel art LoRA

The LoRA itself works correctly. It’s specifically i2i on Turbo where it fails.

Test 4: Z-Image Base Model i2i + LoRA

If Turbo’s 8 steps aren’t enough, maybe the base model (28-50 steps) would work. The base model doesn’t use fused QKV, so the standard LoRA loader is fine.

Settings

Item	Value
Model	z_image_bf16.safetensors (base model)
Steps	30
CFG	4.0
Denoise	0.6
LoRA	pixel_art_style_z_image_turbo (strength 1.0)
LoRA Loader	Standard LoraLoaderModelOnly

The base model was downloaded from Comfy-Org/z_image. The text encoder and VAE are shared with Turbo.

Result

Figures: input image, Z-Image Base i2i + LoRA output, and Z-Image Turbo i2i + LoRA output.

The LoRA has some effect, but this isn’t pixel art. It’s just a grid-like noise pattern over the whole image. Pixel edges aren’t sharp, colors aren’t reduced. Just degradation. The LoRA was built for Turbo and doesn’t transfer well to the base model.
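The two criteria used here, reduced colors and sharp pixel boundaries, can be turned into crude numbers. The sketch below is a hypothetical check, not something used in these tests: it counts distinct colors and measures what fraction of block-sized cells are a single flat color. Real pixel art should score low on the first and high on the second.

```python
def color_count(pixels):
    """Number of distinct colors in a 2D grid of (r, g, b) tuples."""
    return len({px for row in pixels for px in row})

def block_uniformity(pixels, block):
    """Fraction of block x block cells that contain a single color."""
    height, width = len(pixels), len(pixels[0])
    cells = uniform = 0
    for y in range(0, height - block + 1, block):
        for x in range(0, width - block + 1, block):
            colors = {pixels[y + dy][x + dx] for dy in range(block) for dx in range(block)}
            cells += 1
            uniform += len(colors) == 1
    return uniform / cells if cells else 0.0

palette = [(0, 0, 0), (255, 0, 0), (0, 255, 0), (0, 0, 255)]
# 4x4 image made of four flat 2x2 blocks, like upscaled pixel art
blocky = [[palette[(y // 2) * 2 + (x // 2)] for x in range(4)] for y in range(4)]
# 4x4 gradient where every pixel differs, like a soft or noisy render
smooth = [[(x * 10, y * 10, 0) for x in range(4)] for y in range(4)]

print(color_count(blocky), block_uniformity(blocky, 2))  # 4 1.0
print(color_count(smooth), block_uniformity(smooth, 2))  # 16 0.0
```

On real outputs the block size would have to match the intended pixel size, which this sketch takes as a given.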

Processing time was 520 seconds (8 min 40 sec). This quality after that wait is unacceptable.

Comparison

	Illustrious i2i	Z-Image Turbo i2i	Z-Image Base i2i
Processing Time	90s (1:30)	75s	520s
Memory	6.5GB	~20GB	~20GB
Inference Steps	25	8	30
LoRA at denoise 0.6	Works	No effect	Responds but only degradation
LoRA Loader	Standard node	Custom loader required	Standard node
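To compare the processing times directly, it helps to put them in the same unit and express them relative to the Illustrious baseline. A small sketch using the figures reported in this article; `to_seconds` is a hypothetical helper.

```python
def to_seconds(text):
    """Parse 'M:SS' or 'Ns' into seconds."""
    text = text.strip()
    if ":" in text:
        minutes, seconds = text.split(":")
        return int(minutes) * 60 + int(seconds)
    return int(text.rstrip("s"))

times = {
    "Illustrious i2i": "1:30",
    "Z-Image Turbo i2i": "75s",
    "Z-Image Base i2i": "520s",
}
baseline = to_seconds(times["Illustrious i2i"])
for name, raw in times.items():
    secs = to_seconds(raw)
    print(f"{name}: {secs}s ({secs / baseline:.2f}x)")
# Illustrious i2i: 90s (1.00x)
# Z-Image Turbo i2i: 75s (0.83x)
# Z-Image Base i2i: 520s (5.78x)
```

Turbo is slightly faster per image than Illustrious, while the base model takes close to six times as long.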

Every route failed.

Route	Result
Turbo i2i + LoRA	LoRA has no effect at low denoise. Works at denoise 1.0 but source image is lost
Base model i2i + LoRA	LoRA responds but output is degradation, not pixel art. 520s for this
Turbo txt2i + LoRA	Produces pixel art, but it is generation rather than i2i (image conversion), so it serves a different use case

Z-Image i2i pixel art conversion isn’t practical right now. The conclusion from Testing Pixel Art Conversion with Qwen Image Edit stands: Illustrious i2i + pixel-art-xl LoRA (denoise 0.6, 1:30, 6.5GB) remains the best option.