RunPod setup for Qwen-Image-Layered LoRA
I wanted to automatically separate facial parts for Live2D, so I looked into a setup for running a LoRA for Qwen-Image-Layered on RunPod. It’s a different model from Qwen-Image-Edit; this one is specialized for “layer separation.”
What is Qwen-Image-Layered?
It’s an image generation model released by Alibaba, notable for being able to directly generate transparent layers. Regular image generation produces a single flat image, but this model outputs layers separated by background, foreground, and parts.
Here we use tori29umai’s LoRA. It’s trained to separate facial parts (eyes, nose, mouth, etc.) into distinct layers. There are reports that Live2D part separation went from 45 minutes to under 2 minutes.
References:
- Distribution of the Qwen-Image-Layered LoRA by tori29umai
- Experiment: facial part separation with Qwen-Image-Layered (by ゆうさく)
Why RunPod?
Qwen-Image-Layered consumes a lot of VRAM.
| Model format | Size | Required VRAM |
|---|---|---|
| BF16 (highest quality) | ~40GB | 48GB+ |
| FP8 (lightweight) | ~20GB | 24GB+ |
A local RTX 4090 (24GB) can’t run the highest-quality variant. Google Colab’s GPUs top out at 40GB (A100), which still falls short of the 48GB the BF16 build needs. With RunPod you can rent GPUs with 48GB+ VRAM by the hour.
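The table’s numbers follow from simple bytes-per-parameter arithmetic. A back-of-the-envelope sketch (the ~20B parameter count is an assumption inferred from the ~40GB BF16 file size, and this counts weights only, ignoring activations, the text encoder, and the VAE):

```python
# Back-of-the-envelope VRAM estimate: raw weight bytes only.
# Assumes a ~20B-parameter diffusion model (consistent with the
# ~40GB BF16 file size); activations, text encoder, and VAE are extra,
# which is why the "Required VRAM" column is higher than the file size.
PARAMS = 20e9

def weight_gb(bytes_per_param: float, params: float = PARAMS) -> float:
    """Size of the raw weights in GB at a given precision."""
    return params * bytes_per_param / 1e9

bf16 = weight_gb(2.0)  # BF16: 2 bytes per parameter
fp8 = weight_gb(1.0)   # FP8: 1 byte per parameter
print(f"BF16 weights: ~{bf16:.0f} GB, FP8 weights: ~{fp8:.0f} GB")
```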
GPU choice: RTX 6000 Ada vs RTX PRO 6000
Comparison of 48GB-and-up GPUs available on RunPod.
| GPU | VRAM | Architecture | Price guide | Highlights |
|---|---|---|---|---|
| RTX 6000 Ada | 48GB | Ada Lovelace | $0.8–1.2/hour | Minimum to run BF16 build; good value |
| RTX PRO 6000 | 96GB | Blackwell | $1.5–2.0/hour | Latest generation; fastest; ample headroom |
※ Prices are approximate for Community Cloud as of January 2026.
Which should you choose?
Recommendation: RTX 6000 Ada (48GB)
- Easily fits the Qwen-Image-Layered BF16 (~40GB) plus the LoRA
- Cheaper
- Speed is more than practical
Pick RTX PRO 6000 (96GB) when
- You want to finish a lot in a short time
- Speed is the top priority (Blackwell features higher compute)
- You want complex workflows without worrying about VRAM
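To make the trade-off concrete, here is a rough session-cost sketch using the price ranges from the table above. The 3-hour session length is a hypothetical example (download time plus a batch of separations), not a benchmark:

```python
# Rough session-cost comparison using the Community Cloud price
# ranges above. The 3-hour job length is a hypothetical example.
def session_cost(hours: float, price_per_hour: float) -> float:
    """Total rental cost in dollars, rounded to cents."""
    return round(hours * price_per_hour, 2)

ada = session_cost(3, 1.0)    # RTX 6000 Ada at ~$1.0/h
pro = session_cost(3, 1.75)   # RTX PRO 6000 at ~$1.75/h
print(f"Ada: ${ada}, PRO: ${pro}")  # the PRO only pays off if it finishes much faster
```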
Required files
Base model (Qwen-Image-Layered)
Use the split files prepared for ComfyUI.
| File | Path | Size |
|---|---|---|
| qwen_image_layered_bf16.safetensors | models/diffusion_models/ | ~40GB |
| qwen_image_layered_vae.safetensors | models/vae/ | - |
| qwen_2.5_vl_7b_fp8_scaled.safetensors | models/text_encoders/ | - |
Download source: Comfy-Org/Qwen-Image-Layered_ComfyUI
LoRA (for facial part separation)
| File | Path |
|---|---|
| QIL_face_parts_V3_dim16_1e-3-000056.safetensors | models/loras/ |
Download source: tori29umai/Qwen-Image-Layered
Setup steps on RunPod
1. Create Pod
- Log in to RunPod → Pods → + Deploy
- GPU: RTX 6000 Ada or RTX PRO 6000
- Template: runpod/comfyui:latest
- Volume Disk: 100GB (the models are large)
- Deploy
2. Download models
Open the Web Terminal and run:
pip install huggingface_hub  # optional: only needed if you prefer huggingface-cli over wget
cd /workspace/ComfyUI/models
# Diffusion model (~40GB — this takes a while)
cd diffusion_models
wget https://huggingface.co/Comfy-Org/Qwen-Image-Layered_ComfyUI/resolve/main/split_files/diffusion_models/qwen_image_layered_bf16.safetensors
# VAE
cd ../vae
wget https://huggingface.co/Comfy-Org/Qwen-Image-Layered_ComfyUI/resolve/main/split_files/vae/qwen_image_layered_vae.safetensors
# Text Encoder
cd ../text_encoders
wget https://huggingface.co/Comfy-Org/Qwen-Image_ComfyUI/resolve/main/split_files/text_encoders/qwen_2.5_vl_7b_fp8_scaled.safetensors
# LoRA (tori29umai's version)
cd ../loras
wget https://huggingface.co/tori29umai/Qwen-Image-Layered/resolve/main/QIL_face_parts_V3_dim16_1e-3-000056.safetensors
3. Load the workflow
If tori29umai's note article provides a workflow image (a PNG with embedded JSON), the most reliable method is to drag and drop it into ComfyUI.
Basic graph if building manually:
[Load Diffusion Model] qwen_image_layered_bf16
↓
[Load LoRA] QIL_face_parts_V3...
↓
[Load VAE] qwen_image_layered_vae
↓
[Load CLIP] qwen_2.5_vl_7b_fp8_scaled
↓
[Load Image] input image
↓
[CLIP Text Encode] prompt (specify which parts to separate)
↓
[Sampler]
↓
[Save Image]
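Once the graph works in the browser, you can also queue it headlessly through ComfyUI’s HTTP API, which is handy on a rented Pod. A sketch assuming the workflow was exported via ComfyUI’s “Save (API Format)” option and the server runs on its default port 8188 (the file name is a placeholder):

```python
# Queue an API-format ComfyUI workflow via POST /prompt instead of
# clicking in the browser. 8188 is ComfyUI's default port.
import json
import urllib.request

def build_payload(workflow_path: str, client_id: str = "runpod-script") -> bytes:
    """Wrap an API-format workflow JSON in the body that POST /prompt expects."""
    with open(workflow_path, "r", encoding="utf-8") as f:
        graph = json.load(f)
    return json.dumps({"prompt": graph, "client_id": client_id}).encode("utf-8")

def queue_prompt(workflow_path: str, host: str = "127.0.0.1:8188") -> dict:
    """Submit the workflow; the response includes the queued prompt_id."""
    req = urllib.request.Request(
        f"http://{host}/prompt",
        data=build_payload(workflow_path),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# queue_prompt("qwen_layered_face_parts.json")  # placeholder file name
```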
4. Prompt example
To separate facial parts:
split eyes, split mouth, split nose, face parts separated
Check the distribution page for concrete trigger words.
Differences between Qwen-Image-Edit and Layered
| | Qwen-Image-Edit | Qwen-Image-Layered |
|---|---|---|
| Use case | Image editing (inpainting, etc.) | Layer-separated generation |
| Output | Single image | Multiple transparent layers |
| Model size | ~20GB (FP8) | ~40GB (BF16) |
| Required VRAM | 24GB+ | 48GB+ |
They serve different purposes, so it isn’t about which one is better. Still, can anyone run this locally? Maybe on a Mac Studio?