
Running WAI-Anima v1 on an RTX 4060 Laptop via ComfyUI API

Ikesan

In Was Looking for a New WAI-Illustrious Version and Found WAI-Anima Instead, I tested WAI-Anima v1 on an M1 Max (64GB unified memory). This time I wanted to see whether it also runs on a Windows machine with an RTX 4060 Laptop GPU (8GB VRAM).

It runs. 55 seconds versus 275 on the M1 Max. CUDA is overwhelmingly faster for DiT inference.
However, trying to be clever and run ComfyUI headlessly via its API from Claude Code led to a spectacular tqdm vs LogInterceptor compatibility disaster. Documenting the whole ordeal here.

Environment

| Item | Spec |
| --- | --- |
| GPU | NVIDIA GeForce RTX 4060 Laptop GPU |
| VRAM | 8GB |
| RAM | 32GB |
| OS | Windows 11 Home |
| ComfyUI | 0.15.0 (Portable) |
| PyTorch | 2.10.0+cu128 |

WAI-Anima v1’s official VRAM requirement is 8GB. Right at the limit.

Downloading Models

Three files are needed. Unlike SDXL-based models where you drop a single checkpoint file, the DiT model, text encoder, and VAE are placed separately.

| File | Size | Location | Source |
| --- | --- | --- | --- |
| waiANIMA_v10.safetensors | 3.9GB | models/diffusion_models/ | CivitAI |
| qwen_3_06b_base.safetensors | 1.2GB | models/text_encoders/ | HuggingFace |
| qwen_image_vae.safetensors | 243MB | models/vae/ | HuggingFace |

On HuggingFace’s official Anima repo, the files are inside the split_files/ directory. Watch the URLs.

# TE
https://huggingface.co/circlestone-labs/Anima/resolve/main/split_files/text_encoders/qwen_3_06b_base.safetensors

# VAE
https://huggingface.co/circlestone-labs/Anima/resolve/main/split_files/vae/qwen_image_vae.safetensors

Pointing to the top-level text_encoder/ or vae/ paths either returns a 404 or hands you a 15-byte LFS pointer instead of the actual weights. Got bitten by this initially — “why is the TE only 15 bytes?”
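A quick guard against this is to check the file after downloading. The helper below is my own sketch (the name and 1KB threshold are assumptions, not part of any official tooling); it relies on the fact that Git LFS pointer files are tiny and begin with a fixed version line:

```python
import os

LFS_MAGIC = b"version https://git-lfs"

def is_lfs_pointer(path: str) -> bool:
    """True if `path` looks like a Git LFS pointer rather than real weights.

    Real safetensors files are hundreds of MB; pointer files are on the
    order of 100 bytes and start with a fixed "version https://git-lfs..."
    line, so either signal alone is a strong hint.
    """
    if os.path.getsize(path) > 1024:
        return False
    with open(path, "rb") as f:
        return f.read(len(LFS_MAGIC)) == LFS_MAGIC
```

After a download, `is_lfs_pointer("models/text_encoders/qwen_3_06b_base.safetensors")` returning True means you grabbed the pointer, not the model.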

The Headless ComfyUI API Disaster

The goal was simple: launch ComfyUI in the background, submit workflows via API, and collect results. Full automation from Claude Code.

CLIPLoader type parameter

The first error came from workflow validation. When loading the Anima text encoder with CLIPLoader, I set type to "anima" and it was rejected.

type: 'anima' not in ['stable_diffusion', 'stable_cascade', 'sd3', ...
'qwen_image', 'hunyuan_image', 'flux2', 'ovis']

The correct value is "qwen_image". Anima’s architecture is Qwen-based internally, so ComfyUI identifies it by that name.

tqdm’s [Errno 22] Invalid argument

The workflow passed validation but the KSampler node threw OSError: [Errno 22] Invalid argument.

File "comfy/k_diffusion/sampling.py", line 1524, in sample_er_sde
    for i in trange(len(sigmas) - 1, disable=disable):
File "tqdm/std.py", line 448, in status_printer
    getattr(sys.stderr, 'flush', lambda: None)()
File "ComfyUI/app/logger.py", line 35, in flush
    super().flush()

ComfyUI’s LogInterceptor inherits from io.TextIOWrapper and replaces sys.stderr. When tqdm initializes its progress bar it calls sys.stderr.flush(), and TextIOWrapper.flush() flushes the underlying buffer. In a background launch the process has no valid console handle, so the underlying file descriptor is invalid and the flush raises [Errno 22].
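The failure mode can be reproduced without ComfyUI. This is my own minimal stand-in, not ComfyUI’s code: wrap a raw stream in a TextIOWrapper, invalidate the underlying descriptor, then flush through the wrapper, which is roughly what happens when tqdm touches the replaced sys.stderr of a console-less process:

```python
import io
import os

# Build a TextIOWrapper over a pipe, mimicking a stderr replacement.
r, w = os.pipe()
wrapper = io.TextIOWrapper(io.BufferedWriter(io.FileIO(w, "w", closefd=False)))
wrapper.write("progress bar output\r")  # sits in the buffer, not yet written

# Invalidate the descriptor, as happens when the process has no console.
os.close(w)
os.close(r)

try:
    wrapper.flush()  # TextIOWrapper.flush() flushes the now-dead buffer
except OSError as e:
    print(f"flush raised OSError, errno={e.errno}")

# Swallow the second flush attempt that close() makes on the same buffer.
try:
    wrapper.close()
except OSError:
    pass
```

On this POSIX-style repro the errno differs (a bad descriptor gives EBADF); under a Windows background launch the same flush path surfaces as [Errno 22].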

Things I tried (all failed)

| # | Attempt | Result |
| --- | --- | --- |
| 1 | TQDM_DISABLE=1 env var to globally disable tqdm | No effect |
| 2 | cmd start /min for a minimized window with a console | Same error |
| 3 | PowerShell Start-Process for an independent console window | Same error |
| 4 | Added try/except to LogInterceptor.flush() | Exception caught, but ComfyUI’s execution engine still records it as an error |
| 5 | Rewrote LogInterceptor as duck-typed (dropped TextIOWrapper inheritance) | Same error |
| 6 | Patched tqdm/std.py status_printer directly | Exception caught, but the error doesn’t go away |
| 7 | Added disable=True fallback in comfy/utils.py model_trange | No change |

The key finding: the exception IS caught at the Python level, but ComfyUI’s execution.py keeps recording it as an error. The except Exception as ex at execution.py:602 grabs the traceback via sys.exc_info(), and once an OSError fires during tqdm initialization, its trace propagates through the execution engine.

Solution: just launch it normally

Double-click run_nvidia_gpu.bat to start ComfyUI the normal way. The console window has valid stdout/stderr, so the flush() OSError never occurs. Once it’s running, API workflow submission works fine.

# With ComfyUI already running
curl -X POST http://127.0.0.1:8188/prompt \
  -H "Content-Type: application/json" \
  -d '{"prompt": { ... }}'

Give up on “fully automated background execution” and go with “GUI launch + API control.” The initial launch is manual, but everything after that — workflow submission, result retrieval — runs through the API.
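With ComfyUI up in its own console window, the submit-and-wait loop fits in a few lines of stdlib Python. This is a sketch against ComfyUI’s /prompt and /history endpoints; the helper names, the 2-second poll interval, and the timeout are my own choices:

```python
import json
import time
import urllib.request

API = "http://127.0.0.1:8188"  # default ComfyUI address

def submit(workflow: dict) -> str:
    """POST a workflow to /prompt and return its prompt_id."""
    req = urllib.request.Request(
        f"{API}/prompt",
        data=json.dumps({"prompt": workflow}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["prompt_id"]

def wait_for(prompt_id: str, timeout: float = 300.0) -> dict:
    """Poll /history/{prompt_id} until the job appears, then return its entry."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        with urllib.request.urlopen(f"{API}/history/{prompt_id}") as resp:
            history = json.load(resp)
        if prompt_id in history:
            return history[prompt_id]
        time.sleep(2)
    raise TimeoutError(f"prompt {prompt_id} not finished after {timeout}s")

def with_prompt(workflow: dict, text: str, seed: int) -> dict:
    """Deep-copied workflow with a new positive prompt and seed.

    Node ids "4" (positive CLIPTextEncode) and "7" (KSampler) follow the
    workflow JSON shown later in this article.
    """
    wf = json.loads(json.dumps(workflow))
    wf["4"]["inputs"]["text"] = text
    wf["7"]["inputs"]["seed"] = seed
    return wf
```

A batch run then becomes `wait_for(submit(with_prompt(base, "1girl, solo, ...", 777)))` per prompt, with no browser involved.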

Generation Results

Test 1: Standing pose (white background)

Prompt: 1girl, solo, long blonde hair, blue eyes, white robe, gold embroidery, capelet, gold sash, long sleeves, long dress, standing, looking at viewer, full body, white background

Negative: lowres, bad anatomy, bad hands, text, error, worst quality, low quality

Settings: er_sde / simple / 30steps / CFG 4.0 / seed 42

[Image: WAI-Anima v1 standing pose, RTX 4060]

White robe with gold embroidery and sash. Quite faithful to the prompt. Same seed and settings as the M1 Max results, but different hardware (CUDA vs MPS) means the output isn’t identical.

Test 2: Dynamic scene (with background)

Same prompt as the previous article for comparison.

Prompt: 1girl, solo, long blonde hair, blue eyes, white robe, gold embroidery, capelet, gold sash, long sleeves, long dress, running, wind, hair blowing, dynamic pose, fantasy landscape, castle in background, sunset sky, dramatic clouds, grass field

Settings: er_sde / simple / 30steps / CFG 4.0 / seed 42

[Image: WAI-Anima v1 dynamic scene, RTX 4060]

Sunset sky, castle, grasslands. The “atmosphere and lighting” strength I noted as Anima’s advantage in the previous article is fully present here. Natural movement in the wind-blown hair and dress, with a sense of depth in the background that SDXL-based models struggle with.

Test 3: Goblin battle (multiple characters)

Testing whether multiple characters can appear in a single image. With WAI-Illustrious, getting multiple goblins to actually produce multiple figures is surprisingly difficult.

Prompt: 1girl, solo, long blonde hair, blue eyes, white robe, gold embroidery, capelet, gold sash, long sleeves, long dress, casting spell, magic circle, holy light, glowing hands, multiple goblins, green skin, small creatures, dark cave, battle scene, dramatic lighting, fantasy, dynamic pose, full body

Settings: er_sde / simple / 30steps / CFG 4.0 / seed 777

[Image: WAI-Anima v1 goblin battle, RTX 4060]

A priestess casting holy light atop a magic circle with multiple goblins actually present around her. Good contrast between the dark cave background and the blue magic light.
The main character is front and center while the goblins are placed as a background crowd, per the prompt. Anima’s text encoder (Qwen3 0.6B) is tiny, but it handles “multiple supporting characters” surprisingly well.

Test 4: i2i (standing pose to dynamic pose)

Can the same character be placed in a different composition without LoRA? Used the standing pose from Test 1 as the source image and applied a dynamic scene prompt via i2i.

[Image: source (t2i) standing pose]
[Image: i2i, denoise 0.5]
[Image: i2i, denoise 0.75]

The prompt specified running, wind, hair blowing, dynamic pose, fantasy landscape, castle in background, sunset sky but the composition stayed essentially standing. At denoise 0.5, only minor clothing details changed. At 0.75, the eyes got sharper and fists started clenching — less priestess, more Ragnarok Online monk.

Drastically changing the composition via i2i is difficult when the source composition is too strong. Maintaining character consistency while changing poses requires ControlNet or LoRA. Anima currently doesn’t support ControlNet, so LoRA training via AnimaLoraToolkit is effectively the only option.

In Testing Pixel Art Conversion with Qwen Image Edit I built a pixel art conversion pipeline using Illustrious i2i. Based on the results here, Anima’s i2i is weak at style transfer even before considering composition changes. Using Anima i2i for dramatic style transformations like pixel art conversion doesn’t look viable right now.

Generation Speed

| Condition | Time |
| --- | --- |
| Cold start (models not loaded) | ~55s |
| Warm (models loaded) | ~52s |
| M1 Max, same conditions | ~275s |

The RTX 4060 Laptop (8GB VRAM) was roughly 5x faster than the M1 Max (64GB unified memory). CUDA is overwhelmingly faster for DiT inference.

The near-zero difference between cold and warm runs is due to ComfyUI 0.15.0’s “async weight offloading with 2 streams.” Model VRAM loading and sampling overlap during execution, making load time nearly invisible.

No memory issues at 8GB VRAM. WAI-Anima v1 (3.9GB) + Qwen3 0.6B TE (1.2GB) + VAE (243MB) totals about 5.3GB, fitting within the 8GB budget.

Bonus: fp16_accumulation mode

ComfyUI Portable includes run_nvidia_gpu_fast_fp16_accumulation.bat. The --fast fp16_accumulation option uses FP16 accumulation for slightly less precision but potentially more speed. RTX 40-series cards are strong at FP16, so there might be gains.

Compared normal mode vs fp16_accumulation with the same seeds and settings.

[Image: normal mode (standing)]
[Image: fp16_accumulation (standing)]
[Image: normal mode (dynamic)]
[Image: fp16_accumulation (dynamic)]
[Image: normal mode (goblin)]
[Image: fp16_accumulation (goblin)]
| Mode | Standing | Dynamic | Goblin |
| --- | --- | --- | --- |
| Normal | 55s | 52s | 55s |
| fp16_accumulation | 51s | 47s | 48s |

5-8 second reduction, roughly 10% faster. Same seed produces the same composition, but FP16 rounding differences cause subtle detail variations. Quality degradation is imperceptible to the eye. The speed benefit on the RTX 4060 Laptop was marginal.

ComfyUI API Workflow

The workflow JSON for generating via API without the browser UI.

{
  "1": {
    "class_type": "UNETLoader",
    "inputs": {"unet_name": "waiANIMA_v10.safetensors", "weight_dtype": "default"}
  },
  "2": {
    "class_type": "CLIPLoader",
    "inputs": {"clip_name": "qwen_3_06b_base.safetensors", "type": "qwen_image"}
  },
  "3": {
    "class_type": "VAELoader",
    "inputs": {"vae_name": "qwen_image_vae.safetensors"}
  },
  "4": {
    "class_type": "CLIPTextEncode",
    "inputs": {"text": "your prompt here", "clip": ["2", 0]}
  },
  "5": {
    "class_type": "CLIPTextEncode",
    "inputs": {"text": "negative prompt", "clip": ["2", 0]}
  },
  "6": {
    "class_type": "EmptyLatentImage",
    "inputs": {"width": 832, "height": 1216, "batch_size": 1}
  },
  "7": {
    "class_type": "KSampler",
    "inputs": {
      "model": ["1", 0], "positive": ["4", 0], "negative": ["5", 0],
      "latent_image": ["6", 0], "seed": 42, "steps": 30, "cfg": 4.0,
      "sampler_name": "er_sde", "scheduler": "simple", "denoise": 1.0
    }
  },
  "8": {
    "class_type": "VAEDecode",
    "inputs": {"samples": ["7", 0], "vae": ["3", 0]}
  },
  "9": {
    "class_type": "SaveImage",
    "inputs": {"images": ["8", 0], "filename_prefix": "wai-anima"}
  }
}

The node setup is straightforward. Unlike SDXL’s CheckpointLoader, you load the model, text encoder, and VAE separately with UNETLoader + CLIPLoader + VAELoader. Set CLIPLoader’s type to "qwen_image".
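Once a job finishes, its /history entry lists the saved files under outputs, and each one can be pulled back through ComfyUI’s /view endpoint. A sketch (the helper names are mine; filename, subfolder, and type are the query parameters /view accepts):

```python
import urllib.parse
import urllib.request

API = "http://127.0.0.1:8188"

def view_url(img: dict) -> str:
    """URL for downloading one image record from a /history outputs entry."""
    qs = urllib.parse.urlencode({
        "filename": img["filename"],
        "subfolder": img.get("subfolder", ""),
        "type": img.get("type", "output"),
    })
    return f"{API}/view?{qs}"

def fetch_images(history_entry: dict) -> list[bytes]:
    """Download every image the workflow's SaveImage nodes produced."""
    blobs = []
    for node_output in history_entry.get("outputs", {}).values():
        for img in node_output.get("images", []):
            with urllib.request.urlopen(view_url(img)) as resp:
                blobs.append(resp.read())
    return blobs
```

For the workflow above, the SaveImage node ("9") yields records like {"filename": "wai-anima_00001_.png", "subfolder": "", "type": "output"}, which fetch_images turns into PNG bytes you can write wherever you like.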


Honestly, I burned over an hour on the tqdm OSError problem.
Seven patches tried and failed makes for a fun article, but launching the bat file normally would have worked on the first try.
Lesson learned: headless execution is not ComfyUI’s intended use case.