RunPod setup for Qwen-Image-Layered LoRA

I wanted to automatically separate facial parts for Live2D, so I looked into a setup for running a LoRA for Qwen-Image-Layered on RunPod. It’s a different model from Qwen-Image-Edit; this one is specialized for “layer separation.”

What is Qwen-Image-Layered?

It’s an image generation model released by Alibaba, notable for being able to generate transparent layers directly. Regular image generation produces a single flat image, but this model outputs separate layers for the background, foreground, and individual parts.

Here we use tori29umai’s LoRA. It’s trained to separate facial parts (eyes, nose, mouth, etc.) into distinct layers. There are reports that Live2D part separation went from 45 minutes to under 2 minutes.

Why RunPod?

Qwen-Image-Layered consumes a lot of VRAM.

| Model format | Size | Required VRAM |
| --- | --- | --- |
| BF16 (highest quality) | ~40GB | 48GB+ |
| FP8 (lightweight) | ~20GB | 24GB+ |

A local RTX 4090 (24GB) can’t run the highest-quality variant. Google Colab is also capped at 24GB, which is tight. With RunPod you can rent GPUs with 48GB+ VRAM by the hour.
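The sizes in the table follow from simple arithmetic. A minimal sketch, assuming Qwen-Image-Layered has roughly 20B parameters (my assumption, not stated in this post):

```shell
# Back-of-envelope VRAM math (assumption: ~20B parameters).
# BF16 stores 2 bytes per parameter; FP8 stores roughly 1 byte.
params=20   # billions of parameters
echo "BF16: $(( params * 2 )) GB"   # matches the ~40GB in the table
echo "FP8:  $(( params * 1 )) GB"   # matches the ~20GB in the table
```

Actual VRAM use is higher than the raw weights, since activations, the text encoder, and the VAE also need room, which is why 48GB+ is the practical floor for BF16.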

GPU choice: RTX 6000 Ada vs RTX PRO 6000

Comparison of 48GB-and-up GPUs available on RunPod.

| GPU | VRAM | Architecture | Price guide | Highlights |
| --- | --- | --- | --- | --- |
| RTX 6000 Ada | 48GB | Ada Lovelace | $0.8–1.2/hour | Minimum to run the BF16 model; good value |
| RTX PRO 6000 | 96GB | Blackwell | $1.5–2.0/hour | Latest generation; fastest; ample headroom |

※ Prices are approximate for Community Cloud as of January 2026.

Which should you choose?

Recommendation: RTX 6000 Ada (48GB)

  • Easily fits the Qwen-Image-Layered BF16 (~40GB) plus the LoRA
  • Cheaper
  • Speed is more than practical

Pick RTX PRO 6000 (96GB) when

  • You want to finish a lot in a short time
  • Speed is the top priority (Blackwell features higher compute)
  • You want complex workflows without worrying about VRAM

Required files

Base model (Qwen-Image-Layered)

Use the files split for ComfyUI.

| File | Path | Size |
| --- | --- | --- |
| qwen_image_layered_bf16.safetensors | models/diffusion_models/ | ~40GB |
| qwen_image_layered_vae.safetensors | models/vae/ | - |
| qwen_2.5_vl_7b_fp8_scaled.safetensors | models/text_encoders/ | - |

Download source: Comfy-Org/Qwen-Image-Layered_ComfyUI

LoRA (for facial part separation)

| File | Path |
| --- | --- |
| QIL_face_parts_V3_dim16_1e-3-000056.safetensors | models/loras/ |

Download source: tori29umai/Qwen-Image-Layered

Setup steps on RunPod

1. Create Pod

  1. Log in to RunPod → Pods → + Deploy
  2. GPU: RTX 6000 Ada or RTX PRO 6000
  3. Template: runpod/comfyui:latest
  4. Volume Disk: 100GB (the models are large)
  5. Deploy

2. Download models

Open the Web Terminal and run:

```shell
cd /workspace/ComfyUI/models

# Diffusion model (~40GB, takes a while)
cd diffusion_models
wget https://huggingface.co/Comfy-Org/Qwen-Image-Layered_ComfyUI/resolve/main/split_files/diffusion_models/qwen_image_layered_bf16.safetensors

# VAE
cd ../vae
wget https://huggingface.co/Comfy-Org/Qwen-Image-Layered_ComfyUI/resolve/main/split_files/vae/qwen_image_layered_vae.safetensors

# Text encoder
cd ../text_encoders
wget https://huggingface.co/Comfy-Org/Qwen-Image_ComfyUI/resolve/main/split_files/text_encoders/qwen_2.5_vl_7b_fp8_scaled.safetensors

# LoRA (tori29umai version)
cd ../loras
wget https://huggingface.co/tori29umai/Qwen-Image-Layered/resolve/main/QIL_face_parts_V3_dim16_1e-3-000056.safetensors
```
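The downloads are large and easy to misplace, so a small check script can confirm everything landed where ComfyUI expects it. This is my own helper, not part of the original setup; `MODELS_DIR` defaults to the path used above and can be overridden:

```shell
# Verify that each model file is in the folder ComfyUI will scan.
MODELS_DIR="${MODELS_DIR:-/workspace/ComfyUI/models}"

check_model() {
  # $1 = subdirectory, $2 = filename
  if [ -f "$MODELS_DIR/$1/$2" ]; then
    echo "OK      $1/$2"
  else
    echo "MISSING $1/$2"
  fi
}

check_model diffusion_models qwen_image_layered_bf16.safetensors
check_model vae              qwen_image_layered_vae.safetensors
check_model text_encoders    qwen_2.5_vl_7b_fp8_scaled.safetensors
check_model loras            QIL_face_parts_V3_dim16_1e-3-000056.safetensors
```

If anything shows up as MISSING, re-run the matching wget command; partially downloaded files are also worth checking with `ls -lh` against the sizes in the table above.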

3. Load the workflow

If the note article by tori29umai includes a workflow image (PNG with embedded JSON), the most reliable method is to drag-and-drop it into ComfyUI.

Basic graph if building manually:

[Load Diffusion Model] qwen_image_layered_bf16
[Load LoRA] QIL_face_parts_V3...
[Load VAE] qwen_image_layered_vae
[Load CLIP] qwen_2.5_vl_7b_fp8_scaled
[Load Image] input image
[CLIP Text Encode] prompt (specify the parts to separate)
[Sampler]
[Save Image]
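Once the graph works, you don’t have to click through the UI for every run: ComfyUI also accepts jobs over its HTTP API. A hedged sketch, assuming you’ve exported the graph in API format as `workflow_api.json` (a hypothetical filename) and that ComfyUI is listening on its default port 8188:

```shell
# Wrap an exported API-format workflow JSON into the payload shape
# that ComfyUI's POST /prompt endpoint expects.
build_payload() {
  # $1 = path to the exported API-format workflow JSON
  printf '{"prompt": %s}' "$(cat "$1")"
}

# Queue the job (uncomment once the pod's port 8188 is reachable):
# curl -s -X POST http://127.0.0.1:8188/prompt \
#      -H "Content-Type: application/json" \
#      -d "$(build_payload workflow_api.json)"
```

Note that the API format (exported via “Save (API Format)” in ComfyUI’s settings) differs from the regular saved workflow JSON; only the API format is accepted by `/prompt`.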

4. Prompt example

To separate facial parts:

split eyes, split mouth, split nose, face parts separated

Check the distribution page for concrete trigger words.

Differences between Qwen-Image-Edit and Layered

| | Qwen-Image-Edit | Qwen-Image-Layered |
| --- | --- | --- |
| Use case | Image editing (inpainting, etc.) | Layer-separated generation |
| Output | Single image | Multiple transparent layers |
| Model size | ~20GB (FP8) | ~40GB (BF16) |
| Required VRAM | 24GB+ | 48GB+ |

They serve different purposes, so it isn’t about which one is better. Still, can anyone run this locally? Maybe on a Mac Studio?
