Running WAI-Anima v1 on an RTX 4060 Laptop via ComfyUI API
In *Was Looking for a New WAI-Illustrious Version and Found WAI-Anima Instead* I tested WAI-Anima v1 on an M1 Max (64GB unified memory). This time I wanted to see if it runs on a Windows machine with an RTX 4060 Laptop GPU (8GB VRAM).
It runs. 55 seconds versus 275 on the M1 Max. CUDA is overwhelmingly faster for DiT inference.
However, trying to be clever and run ComfyUI headlessly via its API from Claude Code led to a spectacular tqdm vs LogInterceptor compatibility disaster. Documenting the whole ordeal here.
Environment
| Item | Spec |
|---|---|
| GPU | NVIDIA GeForce RTX 4060 Laptop GPU |
| VRAM | 8GB |
| RAM | 32GB |
| OS | Windows 11 Home |
| ComfyUI | 0.15.0 (Portable) |
| PyTorch | 2.10.0+cu128 |
WAI-Anima v1’s official VRAM requirement is 8GB. Right at the limit.
Downloading Models
Three files are needed. Unlike SDXL-based models where you drop a single checkpoint file, the DiT model, text encoder, and VAE are placed separately.
| File | Size | Location | Source |
|---|---|---|---|
| waiANIMA_v10.safetensors | 3.9GB | models/diffusion_models/ | CivitAI |
| qwen_3_06b_base.safetensors | 1.2GB | models/text_encoders/ | HuggingFace |
| qwen_image_vae.safetensors | 243MB | models/vae/ | HuggingFace |
On HuggingFace’s official Anima repo, the files are inside the split_files/ directory. Watch the URLs.
# TE
https://huggingface.co/circlestone-labs/Anima/resolve/main/split_files/text_encoders/qwen_3_06b_base.safetensors
# VAE
https://huggingface.co/circlestone-labs/Anima/resolve/main/split_files/vae/qwen_image_vae.safetensors
Pointing to the top-level text_encoder/ or vae/ paths doesn't get you the weights: you either hit a 404 or download a 15-byte LFS pointer file instead of the model. Got bitten by this initially ("why is the TE only 15 bytes?").
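A quick way to catch this class of mistake before wasting a ComfyUI launch: check whether the downloaded file is a Git LFS pointer. A small Python sketch (the size threshold and file names are illustrative):

```python
from pathlib import Path

def looks_like_lfs_pointer(path: str) -> bool:
    """Heuristic: a Git LFS pointer is a tiny text file starting with
    'version https://git-lfs...' instead of gigabytes of safetensors data."""
    p = Path(path)
    if p.stat().st_size > 1024:        # real weights are GB-sized
        return False
    return p.read_bytes()[:64].startswith(b"version https://git-lfs")
```

Run it on anything you just pulled from HuggingFace; a `True` means you grabbed a pointer, not a model.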
The Headless ComfyUI API Disaster
The goal was simple: launch ComfyUI in the background, submit workflows via API, and collect results. Full automation from Claude Code.
CLIPLoader type parameter
First error was workflow validation. When loading the Anima text encoder with CLIPLoader, I set type to "anima" and it was rejected.
type: 'anima' not in ['stable_diffusion', 'stable_cascade', 'sd3', ...
'qwen_image', 'hunyuan_image', 'flux2', 'ovis']
The correct value is "qwen_image". Anima’s architecture is Qwen-based internally, so ComfyUI identifies it by that name.
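Rather than guessing, a running ComfyUI can list the accepted values itself. A sketch, assuming the `/object_info/<node>` endpoint and its usual enum encoding (a list whose first element is the list of options):

```python
import json
from urllib.request import urlopen

def extract_type_options(info: dict) -> list:
    """Pull the enum options for CLIPLoader's `type` input from an
    /object_info payload (enum inputs are encoded as a list whose
    first element is the list of allowed values)."""
    return info["CLIPLoader"]["input"]["required"]["type"][0]

def fetch_clip_loader_types(base: str = "http://127.0.0.1:8188") -> list:
    # Requires a running ComfyUI; endpoint path per its HTTP API.
    with urlopen(f"{base}/object_info/CLIPLoader") as r:
        return extract_type_options(json.load(r))
```

With the server up, `fetch_clip_loader_types()` returns the same list the validation error printed, "qwen_image" included.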
tqdm’s [Errno 22] Invalid argument
The workflow passed validation but the KSampler node threw OSError: [Errno 22] Invalid argument.
File "comfy/k_diffusion/sampling.py", line 1524, in sample_er_sde
for i in trange(len(sigmas) - 1, disable=disable):
File "tqdm/std.py", line 448, in status_printer
getattr(sys.stderr, 'flush', lambda: None)()
File "ComfyUI/app/logger.py", line 35, in flush
super().flush()
ComfyUI’s LogInterceptor inherits from io.TextIOWrapper and replaces sys.stderr. When tqdm initializes its progress bar it calls sys.stderr.flush(), and TextIOWrapper.flush() flushes the underlying buffer. When ComfyUI is launched in the background without a valid console, that underlying file descriptor is invalid, so the flush raises [Errno 22].
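The failure is easy to reproduce in miniature. A POSIX sketch (on Linux/macOS you get BrokenPipeError/EPIPE rather than Windows' Errno 22, but the mechanism is the same: a TextIOWrapper sitting on a file descriptor that can no longer be written):

```python
import io
import os

r, w = os.pipe()
os.close(r)                                   # the "console" goes away
raw = os.fdopen(w, "wb", buffering=0)
stderr_like = io.TextIOWrapper(raw, encoding="utf-8")

try:
    stderr_like.write("  0%|          | 0/30\r")  # roughly what tqdm prints
    stderr_like.flush()                           # tqdm: sys.stderr.flush()
    outcome = "no error"
except OSError as e:
    outcome = type(e).__name__

try:
    stderr_like.close()                       # close() also flushes; swallow it
except OSError:
    pass

print(outcome)
```

The OSError surfaces exactly where the ComfyUI traceback shows it: inside flush(), triggered by tqdm, raised by the wrapper's underlying stream.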
Things I tried (all failed)
| # | Attempt | Result |
|---|---|---|
| 1 | TQDM_DISABLE=1 env var to globally disable tqdm | No effect |
| 2 | cmd start /min for a minimized window with a console | Same error |
| 3 | PowerShell Start-Process for an independent console window | Same error |
| 4 | Added try/except to LogInterceptor.flush() | Exception caught but ComfyUI’s execution engine still records it as error |
| 5 | Rewrote LogInterceptor as duck-typed (dropped TextIOWrapper inheritance) | Same error |
| 6 | Patched tqdm/std.py status_printer directly | Exception caught but error doesn’t go away |
| 7 | Added disable=True fallback in comfy/utils.py model_trange | No change |
The key finding: the exception IS caught at the Python level, but ComfyUI’s execution.py keeps recording it as an error. The except Exception as ex at execution.py:602 grabs the traceback via sys.exc_info(), and once an OSError fires during tqdm initialization, its trace propagates through the execution engine.
Solution: just launch it normally
Double-click run_nvidia_gpu.bat to start ComfyUI the normal way. The console window has valid stdout/stderr, so the flush() OSError never occurs. Once it’s running, API workflow submission works fine.
# With ComfyUI already running
curl -X POST http://127.0.0.1:8188/prompt \
-H "Content-Type: application/json" \
-d '{"prompt": { ... }}'
Give up on “fully automated background execution” and go with “GUI launch + API control.” The initial launch is manual, but everything after that — workflow submission, result retrieval — runs through the API.
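Before submitting anything, it's worth probing whether the manually launched server is actually up. A small helper, assuming the default port and ComfyUI's /system_stats endpoint:

```python
from urllib.request import urlopen

def comfy_is_up(base: str = "http://127.0.0.1:8188", timeout: float = 2.0) -> bool:
    """Return True if a ComfyUI server answers on /system_stats.
    Assumes the default port; adjust if you launched with --port."""
    try:
        with urlopen(f"{base}/system_stats", timeout=timeout) as r:
            return r.status == 200
    except OSError:          # connection refused, timeout, DNS, ...
        return False
```

Gate every API script on this check, and the "did I forget to double-click the bat file?" failure mode becomes a clean error message instead of a hang.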
Generation Results
Test 1: Standing pose (white background)
Prompt: 1girl, solo, long blonde hair, blue eyes, white robe, gold embroidery, capelet, gold sash, long sleeves, long dress, standing, looking at viewer, full body, white background
Negative: lowres, bad anatomy, bad hands, text, error, worst quality, low quality
Settings: er_sde / simple / 30steps / CFG 4.0 / seed 42
White robe with gold embroidery and sash. Quite faithful to the prompt. Same seed and settings as the M1 Max results, but different hardware (CUDA vs MPS) means the output isn’t identical.
Test 2: Dynamic scene (with background)
Same prompt as the previous article for comparison.
Prompt: 1girl, solo, long blonde hair, blue eyes, white robe, gold embroidery, capelet, gold sash, long sleeves, long dress, running, wind, hair blowing, dynamic pose, fantasy landscape, castle in background, sunset sky, dramatic clouds, grass field
Settings: er_sde / simple / 30steps / CFG 4.0 / seed 42
Sunset sky, castle, grasslands. The “atmosphere and lighting” strength I noted as Anima’s advantage in the previous article is fully present here. Natural movement in the wind-blown hair and dress, with a sense of depth in the background that SDXL-based models struggle with.
Test 3: Goblin battle (multiple characters)
Testing whether multiple characters can appear in a single image. With WAI-Illustrious, getting a prompt like multiple goblins to actually produce multiple figures is surprisingly difficult.
Prompt: 1girl, solo, long blonde hair, blue eyes, white robe, gold embroidery, capelet, gold sash, long sleeves, long dress, casting spell, magic circle, holy light, glowing hands, multiple goblins, green skin, small creatures, dark cave, battle scene, dramatic lighting, fantasy, dynamic pose, full body
Settings: er_sde / simple / 30steps / CFG 4.0 / seed 777
A priestess casting holy light atop a magic circle with multiple goblins actually present around her. Good contrast between the dark cave background and the blue magic light.
The main character is front and center while the goblins are placed as a background crowd, per the prompt. Anima’s text encoder (Qwen3 0.6B) is small, but it handles “multiple supporting characters” surprisingly well.
Test 4: i2i (standing pose to dynamic pose)
Can the same character be placed in a different composition without LoRA? Used the standing pose from Test 1 as the source image and applied a dynamic scene prompt via i2i.



The prompt specified running, wind, hair blowing, dynamic pose, fantasy landscape, castle in background, sunset sky but the composition stayed essentially standing. At denoise 0.5, only minor clothing details changed. At 0.75, the eyes got sharper and fists started clenching — less priestess, more Ragnarok Online monk.
Drastically changing the composition via i2i is difficult when the source composition is too strong. Maintaining character consistency while changing poses requires ControlNet or LoRA. Anima currently doesn’t support ControlNet, so LoRA training via AnimaLoraToolkit is effectively the only option.
In *Testing Pixel Art Conversion with Qwen Image Edit* I built a pixel art conversion pipeline using Illustrious i2i. Based on the results here, Anima’s i2i is weak at style transfer even before considering composition changes. Using Anima i2i for dramatic style transformations like pixel art conversion doesn’t look viable right now.
Generation Speed
| Condition | Time |
|---|---|
| Cold start (models not loaded) | ~55s |
| Warm (models loaded) | ~52s |
| M1 Max same conditions | ~275s |
The RTX 4060 Laptop (8GB VRAM) was roughly 5x faster than the M1 Max (64GB unified memory). CUDA is overwhelmingly faster for DiT inference.
The near-zero difference between cold and warm runs is due to ComfyUI 0.15.0’s “async weight offloading with 2 streams.” Model VRAM loading and sampling overlap during execution, making load time nearly invisible.
No memory issues at 8GB VRAM. WAI-Anima v1 (3.9GB) + Qwen3 0.6B TE (1.2GB) + VAE (243MB) totals about 5.3GB, fitting within the 8GB budget.
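The budget math, as a quick sanity check (sizes from the download table):

```python
# Back-of-the-envelope VRAM budget for the three files.
parts_gb = {
    "waiANIMA_v10 (DiT)": 3.9,
    "qwen_3_06b_base (TE)": 1.2,
    "qwen_image_vae": 0.243,
}
total_gb = sum(parts_gb.values())
headroom_gb = 8.0 - total_gb
print(f"weights ~ {total_gb:.2f} GB, headroom ~ {headroom_gb:.2f} GB")
```

Roughly 2.7GB is left for activations and latents, which evidently suffices at 832x1216.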
Bonus: fp16_accumulation mode
ComfyUI Portable includes run_nvidia_gpu_fast_fp16_accumulation.bat. The --fast fp16_accumulation option uses FP16 accumulation for slightly less precision but potentially more speed. RTX 40-series cards are strong at FP16, so there might be gains.
Compared normal mode vs fp16_accumulation with the same seeds and settings.






| Mode | Standing | Dynamic | Goblin |
|---|---|---|---|
| Normal | 55s | 52s | 55s |
| fp16_accumulation | 51s | 47s | 48s |
A 4-7 second reduction, roughly 10% faster on average. Same seed produces the same composition, but FP16 rounding differences cause subtle detail variations. Quality degradation is imperceptible to the eye. The speed benefit on the RTX 4060 Laptop was marginal.
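The per-test speedup, computed from the table above:

```python
# Timings (seconds) from the normal vs fp16_accumulation table.
timings = {"standing": (55, 51), "dynamic": (52, 47), "goblin": (55, 48)}
speedups = {}
for name, (normal, fast) in timings.items():
    saved = normal - fast
    speedups[name] = round(100.0 * saved / normal, 1)
    print(f"{name}: -{saved}s ({speedups[name]}% faster)")
```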
ComfyUI API Workflow
The workflow JSON for generating via API without the browser UI.
{
"1": {
"class_type": "UNETLoader",
"inputs": {"unet_name": "waiANIMA_v10.safetensors", "weight_dtype": "default"}
},
"2": {
"class_type": "CLIPLoader",
"inputs": {"clip_name": "qwen_3_06b_base.safetensors", "type": "qwen_image"}
},
"3": {
"class_type": "VAELoader",
"inputs": {"vae_name": "qwen_image_vae.safetensors"}
},
"4": {
"class_type": "CLIPTextEncode",
"inputs": {"text": "your prompt here", "clip": ["2", 0]}
},
"5": {
"class_type": "CLIPTextEncode",
"inputs": {"text": "negative prompt", "clip": ["2", 0]}
},
"6": {
"class_type": "EmptyLatentImage",
"inputs": {"width": 832, "height": 1216, "batch_size": 1}
},
"7": {
"class_type": "KSampler",
"inputs": {
"model": ["1", 0], "positive": ["4", 0], "negative": ["5", 0],
"latent_image": ["6", 0], "seed": 42, "steps": 30, "cfg": 4.0,
"sampler_name": "er_sde", "scheduler": "simple", "denoise": 1.0
}
},
"8": {
"class_type": "VAEDecode",
"inputs": {"samples": ["7", 0], "vae": ["3", 0]}
},
"9": {
"class_type": "SaveImage",
"inputs": {"images": ["8", 0], "filename_prefix": "wai-anima"}
}
}
The node setup is straightforward. Unlike SDXL’s CheckpointLoader, you load the model, text encoder, and VAE separately with UNETLoader + CLIPLoader + VAELoader. Set CLIPLoader’s type to "qwen_image".
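Putting it together, a minimal client around this workflow might look like the following. A sketch assuming the default port and ComfyUI's /prompt and /history endpoints; polling interval and error handling are deliberately simplified:

```python
import json
import time
import uuid
from urllib.request import Request, urlopen

def patch_workflow(wf: dict, prompt: str, seed: int) -> dict:
    """Override the positive prompt (node "4") and seed (node "7") in the
    workflow JSON above without editing the file by hand."""
    wf = json.loads(json.dumps(wf))          # cheap deep copy of the template
    wf["4"]["inputs"]["text"] = prompt
    wf["7"]["inputs"]["seed"] = seed
    return wf

def submit_and_wait(wf: dict, base: str = "http://127.0.0.1:8188",
                    timeout: float = 300.0) -> dict:
    """POST the workflow to /prompt, then poll /history/<id> until the
    result appears. Requires ComfyUI to already be running (GUI launch)."""
    payload = json.dumps({"prompt": wf, "client_id": uuid.uuid4().hex}).encode()
    req = Request(f"{base}/prompt", data=payload,
                  headers={"Content-Type": "application/json"})
    with urlopen(req) as r:
        prompt_id = json.load(r)["prompt_id"]
    deadline = time.time() + timeout
    while time.time() < deadline:
        with urlopen(f"{base}/history/{prompt_id}") as r:
            hist = json.load(r)
        if prompt_id in hist:                # entry appears once execution ends
            return hist[prompt_id]
        time.sleep(2)
    raise TimeoutError(f"no result for {prompt_id} within {timeout}s")
```

patch_workflow deep-copies via a JSON round-trip, so the same template dict can be reused across submissions with different seeds and prompts.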
Honestly, I burned over an hour on the tqdm OSError problem.
Seven patches tried and failed makes for a fun article, but launching the bat file normally would have worked on the first try.
Lesson learned: headless execution is not ComfyUI’s intended use case.