RunPod setup for Qwen-Image-Layered LoRA

I wanted to automatically separate facial parts for Live2D, so I looked into a setup for running a LoRA for Qwen-Image-Layered on RunPod. It’s a different model from Qwen-Image-Edit; this one is specialized for “layer separation.”

What is Qwen-Image-Layered?

It’s an image generation model released by Alibaba, notable for being able to generate transparent layers directly. Regular image generation produces a single flat image, but this model outputs separate layers for the background, foreground, and individual parts.

Here we use tori29umai’s LoRA. It’s trained to separate facial parts (eyes, nose, mouth, etc.) into distinct layers. There are reports that Live2D part separation went from 45 minutes to under 2 minutes.

Why RunPod?

Qwen-Image-Layered consumes a lot of VRAM.

| Model format | Size | Required VRAM |
| --- | --- | --- |
| BF16 (highest quality) | ~40GB | 48GB+ |
| FP8 (lightweight) | ~20GB | 24GB+ |

A local RTX 4090 (24GB) can’t run the highest-quality variant. Google Colab is also capped at 24GB, which is tight. With RunPod you can rent GPUs with 48GB+ VRAM by the hour.
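The sizes in the table follow from simple arithmetic. A minimal sketch, assuming Qwen-Image-Layered has roughly 20B parameters (my assumption, not stated in this post):

```shell
# Back-of-envelope VRAM math (assumption: ~20B parameters).
# BF16 stores 2 bytes per parameter; FP8 stores roughly 1 byte.
params=20   # billions of parameters
echo "BF16: $(( params * 2 )) GB"   # matches the ~40GB in the table
echo "FP8:  $(( params * 1 )) GB"   # matches the ~20GB in the table
```

Actual VRAM use is higher than the raw weights, since activations, the text encoder, and the VAE also need room, which is why 48GB+ is the practical floor for BF16.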

GPU choice: RTX 6000 Ada vs RTX PRO 6000

Comparison of 48GB-and-up GPUs available on RunPod.

| GPU | VRAM | Architecture | Price guide | Highlights |
| --- | --- | --- | --- | --- |
| RTX 6000 Ada | 48GB | Ada Lovelace | $0.8–1.2/hour | Minimum to run the BF16 model; good value |
| RTX PRO 6000 | 96GB | Blackwell | $1.5–2.0/hour | Latest generation; fastest; ample headroom |

※ Prices are approximate for Community Cloud as of January 2026.

Which should you choose?

Recommendation: RTX 6000 Ada (48GB)

  • Easily fits the Qwen-Image-Layered BF16 (~40GB) plus the LoRA
  • Cheaper
  • Speed is more than practical

Pick RTX PRO 6000 (96GB) when

  • You want to finish a lot in a short time
  • Speed is the top priority (Blackwell features higher compute)
  • You want complex workflows without worrying about VRAM

Required files

Base model (Qwen-Image-Layered)

Use the files split for ComfyUI.

| File | Path | Size |
| --- | --- | --- |
| qwen_image_layered_bf16.safetensors | models/diffusion_models/ | ~40GB |
| qwen_image_layered_vae.safetensors | models/vae/ | - |
| qwen_2.5_vl_7b_fp8_scaled.safetensors | models/text_encoders/ | - |

Download source: Comfy-Org/Qwen-Image-Layered_ComfyUI

LoRA (for facial part separation)

| File | Path |
| --- | --- |
| QIL_face_parts_V3_dim16_1e-3-000056.safetensors | models/loras/ |

Download source: tori29umai/Qwen-Image-Layered

Setup steps on RunPod

1. Create Pod

  1. Log in to RunPod → Pods → + Deploy
  2. GPU: RTX 6000 Ada or RTX PRO 6000
  3. Template: runpod/comfyui:latest
  4. Volume Disk: 100GB (the models are large)
  5. Deploy

2. Download models

Open the Web Terminal and run:

```shell
cd /workspace/ComfyUI/models

# Diffusion model (~40GB, takes a while)
cd diffusion_models
wget https://huggingface.co/Comfy-Org/Qwen-Image-Layered_ComfyUI/resolve/main/split_files/diffusion_models/qwen_image_layered_bf16.safetensors

# VAE
cd ../vae
wget https://huggingface.co/Comfy-Org/Qwen-Image-Layered_ComfyUI/resolve/main/split_files/vae/qwen_image_layered_vae.safetensors

# Text encoder
cd ../text_encoders
wget https://huggingface.co/Comfy-Org/Qwen-Image_ComfyUI/resolve/main/split_files/text_encoders/qwen_2.5_vl_7b_fp8_scaled.safetensors

# LoRA (tori29umai version)
cd ../loras
wget https://huggingface.co/tori29umai/Qwen-Image-Layered/resolve/main/QIL_face_parts_V3_dim16_1e-3-000056.safetensors
```
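The downloads are large and easy to misplace, so a small check script can confirm everything landed where ComfyUI expects it. This is my own helper, not part of the original setup; `MODELS_DIR` defaults to the path used above and can be overridden:

```shell
# Verify that each model file is in the folder ComfyUI will scan.
MODELS_DIR="${MODELS_DIR:-/workspace/ComfyUI/models}"

check_model() {
  # $1 = subdirectory, $2 = filename
  if [ -f "$MODELS_DIR/$1/$2" ]; then
    echo "OK      $1/$2"
  else
    echo "MISSING $1/$2"
  fi
}

check_model diffusion_models qwen_image_layered_bf16.safetensors
check_model vae              qwen_image_layered_vae.safetensors
check_model text_encoders    qwen_2.5_vl_7b_fp8_scaled.safetensors
check_model loras            QIL_face_parts_V3_dim16_1e-3-000056.safetensors
```

If anything shows up as MISSING, re-run the matching wget command; partially downloaded files are also worth checking with `ls -lh` against the sizes in the table above.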

3. Load the workflow

If the note article by tori29umai includes a workflow image (PNG with embedded JSON), the most reliable method is to drag-and-drop it into ComfyUI.

Basic graph if building manually:

[Load Diffusion Model] qwen_image_layered_bf16
[Load LoRA] QIL_face_parts_V3...
[Load VAE] qwen_image_layered_vae
[Load CLIP] qwen_2.5_vl_7b_fp8_scaled
[Load Image] input image
[CLIP Text Encode] prompt (specify the parts to separate)
[Sampler]
[Save Image]
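Once the graph works, you don’t have to click through the UI for every run: ComfyUI also accepts jobs over its HTTP API. A hedged sketch, assuming you’ve exported the graph in API format as `workflow_api.json` (a hypothetical filename) and that ComfyUI is listening on its default port 8188:

```shell
# Wrap an exported API-format workflow JSON into the payload shape
# that ComfyUI's POST /prompt endpoint expects.
build_payload() {
  # $1 = path to the exported API-format workflow JSON
  printf '{"prompt": %s}' "$(cat "$1")"
}

# Queue the job (uncomment once the pod's port 8188 is reachable):
# curl -s -X POST http://127.0.0.1:8188/prompt \
#      -H "Content-Type: application/json" \
#      -d "$(build_payload workflow_api.json)"
```

Note that the API format (exported via “Save (API Format)” in ComfyUI’s settings) differs from the regular saved workflow JSON; only the API format is accepted by `/prompt`.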

4. Prompt example

To separate facial parts:

split eyes, split mouth, split nose, face parts separated

Check the distribution page for concrete trigger words.

Differences between Qwen-Image-Edit and Layered

| | Qwen-Image-Edit | Qwen-Image-Layered |
| --- | --- | --- |
| Use case | Image editing (inpainting, etc.) | Layer-separated generation |
| Output | Single image | Multiple transparent layers |
| Model size | ~20GB (FP8) | ~40GB (BF16) |
| Required VRAM | 24GB+ | 48GB+ |

They serve different purposes, so it isn’t about which one is better. Still, can anyone run this locally? Maybe on a Mac Studio?
