
Was Looking for a New WAI-Illustrious Version and Found WAI-Anima Instead

Ikesan

When I wrote about Anima in February, my takeaway was “true to its ‘preview’ label, there’s almost nothing it can do better than SDXL-based models in practice.” Two months later, I was browsing CivitAI looking for a new version of WAI-Illustrious and noticed a link to something called “WAI-Anima.”

WAI0731, the creator of WAI-Illustrious, had released a derivative model based on Anima. And v1 was published on April 15, 2026 — yesterday.

Curious, I dug in and found the Anima landscape had changed quite a bit since February. I ran some hands-on comparisons too.

What is WAI-Anima?

A derivative model fine-tuned by WAI0731 on top of Anima preview3-base.

| Item | Details |
|---|---|
| Author | WAI0731 (same as WAI-Illustrious) |
| Base | Anima preview3-base |
| Size | 3.9 GB |
| Version | v1 (explicitly noted as exploration stage) |
| Release date | 2026-04-15 |
| Available at | CivitAI, Tensor.Art |

The author writes “still in the exploration stage,” so this is an early evaluation build, not a finished product.

Recommended settings:

| Setting | Value |
|---|---|
| Environment | ComfyUI or Forge Neo |
| Steps | 20–30 |
| CFG | 4–5 |
| Sampler | Euler A Normal or ER SDE BETA |
| Text Encoder | qwen_3_06b_base.safetensors |
| VAE | qwen_image_vae.safetensors |

Prompts use Danbooru tag style just like SDXL models, and like WAI-Illustrious, it aims for score-tag-based control.
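Setting this up in ComfyUI means wiring the checkpoint, text encoder, and VAE as separate loader nodes. As a rough sketch, this is what an API-format prompt graph with the recommended settings might look like; the checkpoint filename and the CLIPLoader `type` string are my assumptions, so adjust them to your install:

```python
import json

# Hedged sketch of a ComfyUI API-format prompt graph for WAI-Anima.
# Node class names are stock ComfyUI; the unet filename and the
# CLIPLoader "type" value are assumptions. A SaveImage node would be
# needed at the end to actually write the result to disk.
prompt = {
    "1": {"class_type": "UNETLoader",
          "inputs": {"unet_name": "wai-anima-v1.safetensors",  # hypothetical filename
                     "weight_dtype": "default"}},
    "2": {"class_type": "CLIPLoader",
          "inputs": {"clip_name": "qwen_3_06b_base.safetensors",
                     "type": "qwen_image"}},  # assumed type string
    "3": {"class_type": "VAELoader",
          "inputs": {"vae_name": "qwen_image_vae.safetensors"}},
    "4": {"class_type": "CLIPTextEncode",
          "inputs": {"text": "1girl, solo, white robe, gold embroidery",
                     "clip": ["2", 0]}},
    "5": {"class_type": "CLIPTextEncode",
          "inputs": {"text": "lowres, bad anatomy", "clip": ["2", 0]}},
    "6": {"class_type": "EmptyLatentImage",
          "inputs": {"width": 832, "height": 1216, "batch_size": 1}},
    "7": {"class_type": "KSampler",
          "inputs": {"model": ["1", 0], "seed": 42, "steps": 30, "cfg": 4.0,
                     "sampler_name": "er_sde", "scheduler": "beta",
                     "positive": ["4", 0], "negative": ["5", 0],
                     "latent_image": ["6", 0], "denoise": 1.0}},
    "8": {"class_type": "VAEDecode",
          "inputs": {"samples": ["7", 0], "vae": ["3", 0]}},
}

payload = json.dumps({"prompt": prompt})  # POST this to /prompt on a running instance
```

Steps 30 and CFG 4.0 sit inside the recommended 20–30 / 4–5 ranges, and er_sde with the beta scheduler follows the "ER SDE BETA" recommendation.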

Hands-on comparison

I compared Anima preview3-base, WAI-Anima v1, and WAI-Illustrious v160 on ComfyUI running on an M1 Max (64 GB unified memory), using the same prompt and same seed.

Generation settings

| Item | Anima (shared) | WAI-Illustrious |
|---|---|---|
| Resolution | 832x1216 | 832x1216 |
| Steps | 30 | 25 |
| CFG | 4.0 | 5.0 |
| Sampler | er_sde | euler_ancestral |
| Scheduler | simple | karras |
| Seed | 42 (fixed) | 42 (fixed) |
| Text Encoder | Qwen3 0.6B | CLIP (built-in) |

Settings follow each model’s recommended values.

Test 1: Standing pose (white background)

Prompt: 1girl, solo, long blonde hair, blue eyes, white robe, gold embroidery, capelet, gold sash, long sleeves, long dress, standing, looking at viewer, full body, white background

preview3-base
preview3-base standing
WAI-Anima v1
WAI-Anima standing
WAI-IL v160
WAI-IL standing

All three output an angled pose. Since I didn’t include straight-on in the prompt, not getting a front-facing shot is expected. preview3-base added blue fabric that wasn’t in the prompt at all. WAI-Anima v1 was more faithful to the white + gold color scheme.

WAI-Illustrious delivered its usual precision. It reproduced the white robe with gold sash almost exactly as prompted. The maturity gap in tag control is clear.

Test 2: Dynamic scene (with background)

Prompt: 1girl, solo, long blonde hair, blue eyes, white robe, gold embroidery, capelet, gold sash, long sleeves, long dress, running, wind, hair blowing, dynamic pose, fantasy landscape, castle in background, sunset sky, dramatic clouds, grass field

preview3-base
preview3-base dynamic
WAI-Anima v1
WAI-Anima dynamic
WAI-IL v160
WAI-IL dynamic

This is where the Anima models’ strengths show. Both preview3-base and WAI-Anima produce atmospheric, painterly lighting. The depth in the castle, sunset, and grass field is a step above WAI-Illustrious.

WAI-Illustrious has high costume accuracy but the background is flat and stays in its usual “anime cel shading” mode. The mouth is hidden, and both legs are unnaturally stuck together despite running — the physics break down in dynamic compositions. The Anima models produced a more natural running pose.

Test 3: i2i (WAI-IL to Anima)

What happens if you generate a character with LoRA on WAI-Illustrious and then run it through Anima via img2img? I wanted to see if I could lock down the character with IL’s tag control and layer Anima’s textures on top.

The source image was generated with WAI-Illustrious v160 + Kana-chan LoRA (kanachan-waiv16-05.safetensors, strength_model: 1.0, strength_clip: 0.8). i2i denoise was 0.5.
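As a mental model for that denoise value: samplers typically noise the source latent partway up the schedule and only run the remaining tail of it, so denoise 0.5 spends roughly half the configured steps on the image. This is an approximation, not necessarily ComfyUI's exact step allocation:

```python
# Rough model of img2img denoise: the source latent is noised to a point
# partway up the schedule and only the remaining portion is sampled.
# Approximation only; exact step allocation varies by implementation.
def effective_steps(total_steps: int, denoise: float) -> int:
    return round(total_steps * denoise)

print(effective_steps(30, 0.5))  # -> 15, about half the schedule actually runs
```

That limited tail of sampling is why identity mostly survives at 0.5 while the fine textures shift.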

WAI-IL (source)
Kana-chan source
to preview3-base
p3base i2i
to WAI-Anima v1
WAI-Anima i2i

Character identity is more or less preserved, but running through the Anima models makes her look younger overall. The body proportions haven’t actually changed, but the rounder face and different eye/mouth rendering give a different age impression. Anima’s default style bias kicks in even in i2i.

Looking at the details, there are quite a few differences. The sash gained fringe details and the robe has a more realistic sheen. Hair color shifted subtly — less red than the IL version. The side-ponytail scrunchie was already faint in IL, but in preview3-base it’s nearly gone, and in WAI-Anima it seems to have been re-drawn in a different color.

The background changed too. The IL source has faint circular decorative patterns; preview3-base stripped them to near-blank, while WAI-Anima added slightly more defined patterns. The WAI-Anima version also has a white haze-like effect near the mouth — unclear whether it’s meant to be breath or just an artifact.

At denoise 0.5, the result is less “Anima’s expressiveness layered on top” and more “textures shift subtly + body-type bias gets applied.” There’s room for tuning if you want a pipeline that preserves the character while changing only the texture.

NSFW output tests are in the appendix. I found that behavior changes depending on the nsfw tag, and character consistency varies between models.

Generation speed

| Model | Standing | Dynamic |
|---|---|---|
| preview3-base (Anima) | 275s | 277s |
| WAI-Anima v1 | 277s | 274s |
| WAI-Illustrious v160 | 217s | 337s |

Anima models are stable at around 275 seconds regardless of composition. WAI-Illustrious is faster on standing poses but took 337 seconds on the dynamic scene.
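Dividing by the step counts used (30 for the Anima models, 25 for WAI-Illustrious) gives a rough per-step cost:

```python
# Per-step cost derived from the wall-clock times above: (seconds, steps).
timings = {
    "preview3-base (standing)": (275, 30),
    "WAI-Anima v1 (standing)": (277, 30),
    "WAI-Illustrious (standing)": (217, 25),
    "WAI-Illustrious (dynamic)": (337, 25),
}
per_step = {name: round(secs / steps, 1) for name, (secs, steps) in timings.items()}
print(per_step)
# Anima models land around 9.2 s/step regardless of composition, while
# WAI-Illustrious swings from ~8.7 to ~13.5 s/step.
```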

What the comparison revealed

Tag control is where WAI-Illustrious dominates. It reproduces costumes faithfully, and the character’s face and body type stay consistent across compositions. The Anima models change costume colors on their own and the character can look like a different person from one composition to the next.

For atmosphere and lighting in dynamic scenes, the Anima models win. WAI-Illustrious dynamic output has flat backgrounds and physics breakdowns in the running pose. The Anima models render background depth and light wrapping naturally — they’re better suited for atmospheric single illustrations.

WAI-Anima has better tag fidelity and character consistency than preview3-base. The WAI tuning direction is right, but there’s still a big gap with WAI-Illustrious. It’s v1, so I’m curious to see where it goes.

Changes since February

preview3-base arrived

The official Anima project progressed from preview2 to preview3-base.

  • Significantly more training at 1024 resolution compared to preview2
  • Reinforced training on minor artists (50–100 posts)
  • Step efficiency improvements (RDBT derivative reported 40% step reduction)

There were reports of regressions from preview2 (backgrounds tending to go flat), but in this test the dynamic scene backgrounds were quite good.

Derivative models surged

In February, Anima derivatives basically didn’t exist. As of April, here’s what I could find:

| Model | Characteristics |
|---|---|
| WAI-Anima v1 | Tag control reinforcement by WAI0731 |
| CottonAnima | Fixed art style via merged style LoRAs |
| Kirazuri v1.0 | Trained on 15,420 manually curated images. Strong on 2025/07–2026/03 characters and styles |
| RDBT p3 v0.24f | DMD2 distillation (fast generation with fewer steps) |
| Cat Tower v0.5 | Anime-style-focused fine-tune |
| AnimaYume v0.4 | Derivative checkpoint |
| Anima Real Anime CAT | Cross-model merge of Anima and Illustrious |

Small compared to the SDXL ecosystem, but a clear shift from “zero ecosystem” two months ago.

LoRA toolkit appeared

In February’s article I mentioned LoRA training was confirmed working, but tooling was nonexistent. Now AnimaLoraToolkit is available.

  • YAML-configured LoRA/LoKr training
  • Direct output in ComfyUI format (no conversion needed)
  • sd-scripts also has Anima training scripts implemented

Official recommendation is rank 32, learning rate 2e-5 as a baseline. However, including the LLM adapter (the 6-layer Transformer bridging text encoder and DiT) in training tends to cause degradation.
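As a sketch of what a training config along those lines might contain: every key name below is illustrative (I have not confirmed AnimaLoraToolkit's actual YAML schema); only the rank 32 / 2e-5 baseline and the advice to leave the LLM adapter out of training come from the recommendations above.

```python
# Hypothetical config sketch; key names are illustrative guesses,
# not the toolkit's real schema.
config = {
    "network": {"type": "lora", "rank": 32},
    "optimizer": {"learning_rate": 2e-5},
    "train": {
        # Including the 6-layer LLM adapter in training reportedly
        # degrades results, so keep it frozen.
        "train_llm_adapter": False,
    },
}
```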

Text encoder upgrade

The “0.6B is fundamentally too limited in expressive power” issue I noted in February got an answer from the community.

A Qwen 3.5 4B text encoder has been released.

| Item | Qwen3 0.6B (standard) | Qwen 3.5 4B (community) |
|---|---|---|
| Parameters | 0.6B | 4B |
| Architecture | Transformer | SSM/Attention hybrid |
| Vocabulary | Standard | 248K tokens |
| Prompt understanding | Tag-focused | Handles long/complex instructions |

The 4B version has an unusual architecture: 24 of 32 layers are SSM (Selective State Space Model) and 8 are Self-Attention, with an Attention layer every 4 layers. Output converts 2560-dimensional hidden states to 1024 dimensions via a trained projection (Linear -> ExpRMSNorm -> SiLU -> Linear) before passing to Anima’s adapter.
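At the shape level, that output projection can be sketched in NumPy. I'm using plain RMSNorm as a stand-in for ExpRMSNorm (whose exact formula I haven't verified) and assuming the intermediate width after the first Linear is already 1024:

```python
import numpy as np

rng = np.random.default_rng(0)
w1 = rng.standard_normal((2560, 1024)) * 0.02  # Linear: 2560 -> 1024
w2 = rng.standard_normal((1024, 1024)) * 0.02  # Linear: 1024 -> 1024

def rms_norm(x, eps=1e-6):
    # Stand-in for ExpRMSNorm; the real variant's formula may differ.
    return x / np.sqrt(np.mean(x**2, axis=-1, keepdims=True) + eps)

def silu(x):
    return x / (1.0 + np.exp(-x))

def project(hidden):
    """Linear -> norm -> SiLU -> Linear, as described for the 4B TE."""
    return silu(rms_norm(hidden @ w1)) @ w2

tokens = rng.standard_normal((77, 2560))  # dummy token embeddings
out = project(tokens)
print(out.shape)  # (77, 1024) -- the width Anima's adapter expects
```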

Worth trying if the 0.6B limitations bother you, but note that it’s community-made and not officially supported by Anima.

Architecture overview

SDXL and Anima are fundamentally different in structure.

SDXL (Illustrious, NoobAI, etc.)

CLIP Text Encoder -> UNet (3.5B) -> SDXL VAE

UNet-based diffusion model. CLIP handles text embedding, UNet handles denoising. The Danbooru tag culture maps directly onto it.

Anima

Qwen3 TE -> 6-layer Adapter (Self-Attn + Cross-Attn + MLP) -> DiT (2B) -> Qwen Image VAE

DiT (Diffusion Transformer) based. Derived from Cosmos-Predict2. A 6-layer adapter sits between the text encoder and DiT, and artist styles are also interpreted through this adapter.

Parameter count is lower for Anima 2B than SDXL 3.5B, but DiT is ViT-based with more computation per step, so it’s not simply “lighter.” In practice, generation time on M1 Max was about 275 seconds at 832x1216.

Danbooru tag support

Explicitly stated in the official README:

> The model is trained on Danbooru-style tags, natural language captions, and combinations of tags and captions.

Tags are written space-separated (spaces, not underscores); only score tags keep their underscores.

That said, as confirmed in this test, the text encoder’s capacity constraint (0.6B) holds things back. Even precise costume color and composition instructions tend to get reinterpreted.
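For converting existing Danbooru-style prompts, the spacing rule is mechanical. A small helper, assuming score tags are the ones prefixed `score_`:

```python
# Convert Danbooru-style underscore tags to Anima's spacing convention:
# underscores become spaces, except in score tags (assumed here to be
# any tag starting with "score_"), which keep their underscores.
def to_anima_tags(tags):
    out = []
    for tag in tags:
        if tag.startswith("score_"):
            out.append(tag)  # score tags keep underscores
        else:
            out.append(tag.replace("_", " "))
    return ", ".join(out)

print(to_anima_tags(["score_9", "long_blonde_hair", "looking_at_viewer"]))
# -> score_9, long blonde hair, looking at viewer
```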

Does it run locally?

Yes. I ran it without issues on M1 Max (64 GB unified memory).

Requirements

| File | Size | Location |
|---|---|---|
| Checkpoint | 3.9 GB | ComfyUI/models/diffusion_models/ |
| Qwen3 0.6B Text Encoder | - | ComfyUI/models/text_encoders/ |
| Qwen Image VAE | - | ComfyUI/models/vae/ |

You need all three. Unlike SDXL where you can just drop in a single checkpoint, the setup is more involved.
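A quick way to check everything is in place, assuming the default ComfyUI folder layout from the table above; the checkpoint filename here is hypothetical:

```python
from pathlib import Path

REQUIRED = [
    "models/diffusion_models/wai-anima-v1.safetensors",  # hypothetical filename
    "models/text_encoders/qwen_3_06b_base.safetensors",
    "models/vae/qwen_image_vae.safetensors",
]

def missing_files(comfy_root: str) -> list[str]:
    """Return the required model files not found under a ComfyUI install."""
    root = Path(comfy_root).expanduser()
    return [p for p in REQUIRED if not (root / p).exists()]

# e.g. missing_files("~/ComfyUI") returns [] once all three are downloaded
```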

Measured performance (M1 Max)

| Item | Value |
|---|---|
| Generation time | ~275s/image (832x1216, 30 steps) |
| Memory usage | No issues (plenty of headroom within 64 GB) |
| VRAM requirement | Official: 8 GB. M1 Max unified memory handles it easily |

Slower than SDXL at the same resolution. The DiT architecture’s computation cost is the bottleneck. For low-VRAM setups, DiffSynth-Studio has a low-VRAM script for Anima.

When to use Illustrious vs Anima

| Use case | Recommended |
|---|---|
| Character locking / LoRA workflows | Illustrious |
| Precise tag control | Illustrious |
| Batch generation | Illustrious (speed difference is significant) |
| Atmospheric single illustrations | Anima (stronger dynamic scene expression) |
| Trying new art styles | Anima |

Illustrious is fully mature with an overwhelming accumulation of LoRAs, ControlNets, and merge models. There’s no reason to switch your main workflow to Anima right now.

But unlike February, “it’s now in a usable state.” This test confirmed that dynamic scene backgrounds and composition naturalness can hold their own against SDXL. WAI-Anima’s improved tag responsiveness is also heading in the right direction.

Status update from the previous article

| February observation | April status |
|---|---|
| Zero ecosystem | 7+ derivative models, LoRA toolkit, TE upgrade |
| Slow inference (10x SDXL on V100) | 275s/image on M1 Max. Slow but practically usable |
| Weak text encoder (0.6B) | 4B version from community |
| No ControlNet support | Still unresolved |
| Non-commercial license only | Unchanged (commercial licensing inquiry channel now open) |
| Hands collapse | No major collapse in this test |
| Blurry backgrounds | Improved in preview3. Dynamic scene backgrounds look good |

Structural positioning

```mermaid
graph TD
    A["Cosmos-Predict2<br/>NVIDIA"] --> B["Anima<br/>CircleStone Labs x Comfy Org"]
    B --> C["preview2"]
    C --> D["preview3-base"]
    D --> E["WAI-Anima v1<br/>WAI0731"]
    D --> F["CottonAnima"]
    D --> G["Kirazuri"]
    D --> H["RDBT p3"]
    C --> I["Cat Tower"]
    C --> J["AnimaYume"]
```

Appendix: NSFW output test


The images below are blurred since the test outputs may be sensitive. If you want to see the actual results, try them in your own environment.


Without nsfw tag

What happens when you give full-nude instructions without the nsfw tag.

Prompt: 1girl, solo, long blonde hair, blue eyes, completely nude, naked, bare skin, sitting on edge of spring, feet in water, forest, natural light, serene, full body, looking at viewer

preview3-base
preview3-base NSFW
WAI-Anima v1
WAI-Anima NSFW
WAI-IL v160
WAI-IL NSFW

All three models produce nudity, but every one of them defaults to compositions that hide the chest and groin. Without the nsfw tag, they lean toward “nude but not showing.”

With nsfw tag (straight-on)

What happens when you add the nsfw tag and request a front-facing full body.

Prompt: 1girl, solo, long blonde hair, blue eyes, completely nude, naked, bare skin, nsfw, straight-on, full body, standing, white background

WAI-IL v160
WAI-IL direct NSFW
preview3-base
preview3-base direct NSFW
WAI-Anima v1
WAI-Anima direct NSFW

With the nsfw tag, all three models produce front-facing full nudity. Since the same composition hid everything without the tag, the nsfw tag clearly controls the behavior.

The most interesting finding here was character consistency. WAI-Illustrious and WAI-Anima v1 look like “the same character in a different composition” whether clothed or nude. Lining up standing, dynamic, and NSFW outputs, you can recognize them as the same person.

preview3-base, on the other hand, shifts character impression when the composition changes. In the NSFW output especially, the face rendering and body type get pulled in an American comic direction — it feels different from Japanese anime style. This might be due to American comic-style images mixed into the training data.

WAI-Anima v1 tends toward large bust sizes and has a watermark (@Pagex.com) in the bottom right. Likely from the training data.