Rewriting WAI-Anima Character LoRA Training Captions with Natural Language and Hairstyle Tags
In the previous WAI-Anima character LoRA run I concluded that side ponytail position couldn’t be controlled because of an architectural bias in Anima base + Qwen3 TE. The captions for the 53 training images had been reused as-is from the dataset cleaned up for WAI-IL, and at the end of that previous article I’d already noticed they didn’t follow Anima’s recommended caption format. So this round started by fixing the captions first.
Rework strategy
The next-step plan listed at the end of the previous article:
- Reorder caption tags to match the Anima recommendation
- Add natural-language sample descriptions that include directional information
- Add rating and quality tags
The “directional info via natural language” part is the core. In test D from the previous article (pure natural language) it didn’t work at inference, but that only exercised inference; whether Qwen3 TE actually saw the directional concept during training is a separate question. The previous hypothesis was that Qwen3 TE never had any material in the captions from which to extract direction information in the first place. This time we eliminate that possibility from the dataset side.
Re-checking the Anima recommended format from primary sources
I summarized the current official guidance from the WAI-Anima v1 page and the Anima preview3-base page on Civitai.
Tag order:
[quality/rating/safety] [1girl/1boy] [character] [series] [artist] [general tags]
Order within each section is free.
The quality prefix is:
masterpiece, best quality, score_7, safe,
Use `safe` (not the SDXL-era `general`). Only the `score_7` family of score tags uses underscores; everything else is space-separated and lowercase.
Natural language is recommended but not required. Anima was trained on Danbooru tags, natural-language captions, and the combination of both, so since the training data includes natural-language versions, including them improves how faithfully the prompt is resolved.
The official recommended learning rate for a rank-32 LoRA starts at 2e-5. Last time I let AnimaLoraToolkit’s default of 1.0e-4 ride, about 5x higher, and the loss progression was fine, so it worked out, but 2e-5–5e-5 is closer to the actual recommendation.
Decided not to include the year tag
Initially I planned to keep year 2024 in the captions following the previous article’s style, but on re-reading the official documentation:
- year-family tags are optional (not required, not recommended)
- Use case: tags like `year 2024` or `year 2010` bias toward that era’s drawing style
- The model is trained with random tag dropout, so not all tags are required
Putting year 2024 into the training captions would tie kanachan to “art style of the 2024 era,” which risks degrading character reproduction at inference time when that tag isn’t included. If you want to keep character reproduction separable from era, you don’t put it on the training side.
→ Don’t include it in training captions. If style tweaking is wanted, just add it on the inference prompt side.
Hairstyle tag policy: abandoning the “absorbed into kanachan” approach
This is the biggest strategy shift this round.
Up until now in both IL and Anima training, I’d avoided putting hairstyle tags like side ponytail into the captions and let the single kanachan trigger absorb “hairstyle, hair color, eye color, all of it.” That worked well in IL — kanachan alone reliably reproduced the hairstyle (the side ponytail came out without explicitly specifying it).
Last round’s Anima training used the same strategy, but the side ponytail’s left/right position broke. Turning off flip_augment didn’t change anything, and additional probing revealed Anima base itself has a “side ponytail = viewer-left” bias (preview3-base + no LoRA shows the same position).
There are two stuck states:
- Don’t put hairstyle in captions → maintains kanachan absorption but breaks direction control
- Put hairstyle in captions → kanachan’s definition weakens and hair could go scattered, but `left side ponytail` might recover direction control
This time I pick option 2. Reasons:
- The IL “absorbed into kanachan” approach was strong, but had a side effect at inference: it was hard to change the hairstyle. Trying `kanachan, twin tails` for twintails got pulled back to side ponytail.
- Promoting hairstyle to independent tags opens the door to variations like `kanachan, semi-long, hair down`.
- For Anima, since strategy 1 doesn’t get direction working, strategy 2 is worth trying.
- Even if hair scatters, specifying `left side ponytail` every time should restore the original look.
Accepting the trade-off, I switched to learning the hairstyle as an independent tag.
Caption template
I rewrote all 53 captions in this final structure:
masterpiece, best quality, score_7, [safe/sensitive/explicit],
1girl, solo, kanachan,
[2+ sentences of natural language: scene description + bound hair position in image],
[character attribute tags: brown eyes, left side ponytail, ahoge, brown hair, double parted bangs, medium hair, blue scrunchie, (medium breasts)],
[outfit/pose/composition tags],
white background, simple background,
Natural language policy:
- 2+ sentences for every image
- Explicitly note the bound hair’s position in the image (right/left side of the image)
- Avoid danbooru-tag-style `side ponytail` phrasing on the natural-language side; describe it like `bound hair with a blue scrunchie` instead
Example (angry):
masterpiece, best quality, score_7, safe,
1girl, solo, kanachan,
A close-up portrait of a young girl with an angry expression and a slight frown looking directly at the viewer.
Her bound hair with a blue scrunchie is visible on the right side of the image,
and a small antenna of hair rises from the top of her head.
brown eyes, left side ponytail, ahoge, brown hair, double parted bangs, medium hair, blue scrunchie,
angry, frown, looking at viewer, portrait, bare shoulders,
white background, simple background
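For illustration, the template can be expressed as a small caption assembler. The function and field names here are hypothetical, invented for this sketch, and are not part of the actual rework scripts:

```python
# Hypothetical assembler for the caption template above.
# Section names and the function signature are illustrative only.
QUALITY = "masterpiece, best quality, score_7"

def build_caption(rating, natural_language, attribute_tags, scene_tags):
    """Join the template sections in the Anima-recommended order."""
    parts = [
        f"{QUALITY}, {rating}",            # quality / rating prefix
        "1girl, solo, kanachan",           # subject + trigger word
        natural_language.strip(),          # 2+ natural-language sentences
        ", ".join(attribute_tags),         # character attribute tags
        ", ".join(scene_tags),             # outfit / pose / composition tags
        "white background, simple background",
    ]
    return ",\n".join(parts)

caption = build_caption(
    rating="safe",
    natural_language=(
        "A close-up portrait of a young girl with an angry expression "
        "looking directly at the viewer. Her bound hair with a blue "
        "scrunchie is visible on the right side of the image."
    ),
    attribute_tags=["brown eyes", "left side ponytail", "ahoge", "brown hair"],
    scene_tags=["angry", "frown", "looking at viewer", "portrait"],
)
print(caption)
```

This keeps the section order fixed while letting the per-image pieces vary, which is exactly the property the template is trying to guarantee.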
Bound hair position description split by composition:
| Composition | Description |
|---|---|
| Front-facing standing / portrait | visible on the right side of the image |
| From behind | visible on the left side of the image |
| Left profile (facing viewer’s left) | extends from the back of her head |
| Right profile (facing viewer’s right) | visible behind her head |
All 53 training images have a side ponytail on the character’s left side, so the image-side left/right depends on the composition.
`brown eyes` is excluded from the 6 images where the face isn’t visible (back / bikini / sportswear_back / fullbody_left_2 / left_back / right_back). `medium breasts` is added only to `nude.txt`, where the chest is directly visible.
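Rules like these are easy to drift on across 53 files, so they can be checked mechanically. A minimal lint sketch, assuming the caption txt files sit in the current directory and checking only the rules stated above:

```python
# Minimal caption linter for the rules above; directory layout is assumed.
from pathlib import Path

PREFIX = "masterpiece, best quality, score_7"
REQUIRED = ["1girl", "solo", "kanachan", "white background", "simple background"]
# Images where the face isn't visible, so "brown eyes" must not appear.
NO_EYES = {"back", "bikini", "sportswear_back", "fullbody_left_2",
           "left_back", "right_back"}

def lint_caption(name: str, text: str) -> list[str]:
    problems = []
    if not text.startswith(PREFIX):
        problems.append("missing quality prefix")
    for tag in REQUIRED:
        if tag not in text:
            problems.append(f"missing tag: {tag}")
    stem = name.removeprefix("kanachan_")
    if stem in NO_EYES and "brown eyes" in text:
        problems.append("brown eyes on a no-face image")
    return problems

for path in sorted(Path(".").glob("kanachan_*.txt")):
    issues = lint_caption(path.stem, path.read_text(encoding="utf-8"))
    if issues:
        print(path.name, issues)
```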
Workflow for rewriting 53 captions
Doing 53 by hand guarantees missed spots, so I followed this flow:
- Combine into a CSV with 5 columns: `filename, image_path, old_caption, new_caption, notes`
- Convert to XLSX with a Python script, embedding thumbnails in the first column
- Open in Numbers and review images and new captions side by side
- Apply corrections in bulk via script
- Write back to the final txt files
# build_xlsx.py (excerpt, completed with the setup the excerpt assumed;
# file paths are placeholders)
import csv
from pathlib import Path

from openpyxl import Workbook
from openpyxl.drawing.image import Image as XLImage
from PIL import Image as PILImage

DIR = Path(".")                  # dataset root (placeholder)
THUMB_DIR = DIR / "thumbs"
THUMB_DIR.mkdir(exist_ok=True)

with open(DIR / "captions.csv", newline="", encoding="utf-8") as f:
    rows = list(csv.DictReader(f))  # filename, image_path, old_caption, new_caption, notes

wb = Workbook()
ws = wb.active
for i, row in enumerate(rows, start=2):  # row 1 is the header
    ws.cell(row=i, column=2, value=row["filename"])
    ws.cell(row=i, column=5, value=row["new_caption"])
    img_path = DIR / row["image_path"]
    thumb_path = THUMB_DIR / f"{img_path.stem}_thumb.png"
    with PILImage.open(img_path) as im:
        im.thumbnail((160, 320))     # in-place resize, preserves aspect ratio
        im.save(thumb_path, "PNG")
    xl_img = XLImage(str(thumb_path))
    xl_img.anchor = f"A{i}"          # pin the thumbnail to column A
    ws.add_image(xl_img)
    ws.row_dimensions[i].height = 130
wb.save(DIR / "captions.xlsx")
Exporting CSV without a UTF-8 BOM gets garbled in Excel (misdetected as SJIS), so going through XLSX is safer in the end. `pip install openpyxl Pillow` covers the dependencies.
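If a plain CSV for Excel is still wanted, Python’s `utf-8-sig` codec writes the BOM that Excel keys on for UTF-8 detection. This is general stdlib behavior, shown here with a throwaway filename, not part of the actual workflow:

```python
# Write a CSV with a UTF-8 BOM so Excel doesn't guess Shift_JIS.
import csv

with open("captions_preview.csv", "w", newline="", encoding="utf-8-sig") as f:
    writer = csv.writer(f)
    writer.writerow(["filename", "new_caption"])
    writer.writerow(["kanachan_angry.png", "1girl, solo, kanachan, angry"])

with open("captions_preview.csv", "rb") as f:
    head = f.read(3)   # the first three bytes are the BOM: EF BB BF
```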
Found 2 bugs in the old captions
Mid-rewrite I found 2 cases where the old captions didn’t match the actual images.
| File | Old caption | Actual image |
|---|---|---|
| kanachan_bikini.txt | nude | Pink bikini, from behind |
| kanachan_nude.txt | bikini, pink bikini, from behind | Fully nude, front-facing |
The captions on these two files were completely swapped. The most recent two training runs were therefore feeding these wrong captions. The bikini concept may have learned “nudity-like rendering” while the nude concept may have learned “pink bikini.” A landmine that got missed in the shadow of the side ponytail direction problem.
The new captions match the actual images. The filenames (bikini.png / nude.png) stay; only the captions are corrected to match reality.
Blazer color recognition drift
For the 5 files blazer_angry / pointing / run / stomach / left_side I had originally written brown blazer, but the actual images are pretty red. Visually reddish brown fits best — pure brown loses the red, red loses the brown — so I unified on reddish brown blazer taking the middle.
Reflected the same wording on the natural-language side (brown school blazer → reddish brown school blazer). Color-name drift is hard to nail down in AI-only conversations, but a human eyeballing the files side-by-side catches it instantly.
Caption diff
Old (angry):
kanachan, 1girl, solo, angry, portrait, front view, white background
New (angry):
masterpiece, best quality, score_7, safe, 1girl, solo, kanachan,
A close-up portrait of a young girl with an angry expression and a slight frown looking directly at the viewer.
Her bound hair with a blue scrunchie is visible on the right side of the image, and a small antenna of hair rises from the top of her head.
brown eyes, left side ponytail, ahoge, brown hair, double parted bangs, medium hair, blue scrunchie,
angry, frown, looking at viewer, portrait, bare shoulders,
white background, simple background
Almost 10x the length, but Anima now receives information through both the natural-language route and the Danbooru-tag route. The captions are now in a state where Qwen3 TE has material to extract direction information from during training.
Side benefit: hairstyle changes might work
Abandoning “absorbed into kanachan” and promoting hairstyle to independent tags opens up some inference-time freedom.
With the IL-trained LoRA, asking for a different hairstyle like kanachan, twin tails got pulled back to side ponytail and broke. Hairstyle had been baked into the kanachan concept, so even with a different hairstyle specified, the priority of base model + original LoRA dragged it back to side ponytail.
In the new captions, left side ponytail is explicitly an independent tag every time, so at inference:
- `kanachan, semi-long hair, hair down` → hair down
- `kanachan, twin tails` → twintails
- `kanachan, ponytail` → regular ponytail
variations like these might work. The flip side is the risk of kanachan alone not stabilizing on side ponytail. Adding left side ponytail every time should restore the original look, so the operational cost is acceptable.
Next verification points
The training-side prep is done. Remaining adjustments:
- `keep_tokens: 1` → `kanachan` was pinned at the front before, but in the Anima recommendation `[quality]` comes first, so the `keep_tokens` value needs to be reconsidered
- `learning_rate: 1.0e-4` → drop to 2e-5–5e-5 (closer to the official recommendation)
- `flip_augment: false` → continues (didn’t matter last time, but just in case)
- `repeats` / `epochs` → unchanged
With everything aligned, retrain and check whether side ponytail position becomes controllable via left side ponytail. The bikini/nude swapped-caption fix may also show up in the learning. That’s the main test.
v1 retrain results
YAML settings:
| Item | Value | Reason |
|---|---|---|
| shuffle_caption | false | Avoid breaking the natural-language block order |
| keep_tokens | 1 | Effectively meaningless with shuffle off |
| learning_rate | 5.0e-5 | Middle of the Anima official recommendation (2e-5–5e-5) |
| flip_augment | false | Continued |
| sample_every | 1 | Want to observe behavior every epoch |
| epochs | 12 | Unchanged |
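For reference, the same settings as a YAML fragment. The field names follow the table, but the toolkit’s actual schema is assumed, not verified:

```yaml
# v1 retrain settings per the table above (field names assumed from the toolkit)
shuffle_caption: false   # keep the natural-language block order intact
keep_tokens: 1           # effectively meaningless with shuffle off
learning_rate: 5.0e-5    # middle of the official 2e-5 to 5e-5 range
flip_augment: false
sample_every: 1          # sample every epoch
epochs: 12
```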
Training finished in about 40 minutes. The per-epoch sample prompt was `kanachan, 1girl, solo, left side ponytail, standing, looking at viewer, white background` (still danbooru-tag-leaning, no natural-language directional info).





The result: out of 12 epochs, only 1 epoch (ep6) had the side ponytail come out in the same direction as the training data (viewer-right = character-left). The other 11 epochs went viewer-left (character-right) like before. Even the apparently-correct ep6 is best read as a seed gacha that happened to land — direction control isn’t being held stably.
A couple of other things worth noting:
- Hair color comes out orange-leaning, not the calmer brown of the training material
- In the ep7 sample, the `ahoge` at the top of the head turns into something like animal ears (a strange black-outlined structure)
The likely cause for the hair color: by promoting brown hair to an independent tag, Anima base’s interpretation of “brown hair” (somewhat reddish brown to orange) gets pulled in and diverges from the training material’s actual color. In the previous IL training, kanachan absorbed the hair color too, so the trigger word was directly tied to the actual color of the training material. The trade-off of tag independence shows up here.
Caption re-fix: bring color information back into kanachan
Removed brown hair and brown eyes from the captions. A partial revert toward the previous strategy, but the structural hairstyle tags (side ponytail, ahoge, double parted bangs, medium hair) stay independent.
perl -pi -e 's/\bbrown hair, //g; s/\bbrown eyes, //g' *.txt
Policy:
| Category | Examples | Treatment |
|---|---|---|
| Character core | hair color, eye color | Absorb into kanachan (no need to vary at inference) |
| Variable accessories | outfit, expression, props, pose | Independent tags (vary at inference) |
| Structural tags | hair shape | Independent tags (leave room for twintails etc.) |
blue scrunchie also stayed independent, since the scrunchie might be swapped to a different color or to a ribbon.
The tag left side ponytail doesn’t exist on Danbooru
A major discovery here. Searching Danbooru for left-side_ponytail returns 0 hits. What actually exists:
- `side_ponytail` (parent tag)
- `high_side_ponytail` / `low_side_ponytail` (height sub-tags)
- `short_side_ponytail`, `side_drill` (variants)
left side ponytail is a non-existent tag. Anima base hasn’t learned this combination, so it either parses it as separate left + side + ponytail, or it picks up just the side ponytail part and effectively ignores left.
That means the entire “control via direction tag” strategy was built on a false premise. Both the previous IL training and this v1 — from base’s perspective, left side ponytail was just side ponytail + noise. No directional information was ever conveyed.
Fix:
- Tag side: `left side ponytail` → `side ponytail` (use only existing tags)
- Direction info: fully delegate to the natural-language side (“visible on the right side of the image”)
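This kind of mistake can be guarded against mechanically: caption-style tags map to Danbooru’s underscore form, and membership can be checked against a list of known tags. A sketch with a hand-entered subset of the tags named above (a real check would query Danbooru’s tag search instead of a hardcoded set):

```python
# Check caption tags against known Danbooru tags. KNOWN_TAGS is a
# hand-entered subset from the search above, not a live lookup.
KNOWN_TAGS = {
    "side_ponytail", "high_side_ponytail", "low_side_ponytail",
    "short_side_ponytail", "side_drill", "ahoge", "ponytail",
}

def to_danbooru(tag: str) -> str:
    """Caption tags are space-separated; Danbooru stores them with underscores."""
    return tag.strip().lower().replace(" ", "_")

def unknown_tags(tags):
    return [t for t in tags if to_danbooru(t) not in KNOWN_TAGS]

print(unknown_tags(["side ponytail", "left side ponytail"]))
# ['left side ponytail']: the direction-prefixed form doesn't exist
```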
v2 training: color tags removed + side ponytail + natural-language direction
Re-fixed all 53 captions:
- Removed `brown hair`, `brown eyes`
- `left side ponytail` → `side ponytail`
- Kept the natural-language direction description
Switched the sample prompt from tag-leaning to natural-language + side ponytail:
masterpiece, best quality, safe, 1girl, solo, kanachan,
Her bound hair with a blue scrunchie is visible on the right side of the image.
side ponytail, ahoge, standing, looking at viewer, white background, simple background
The YAML uses `output_dir: /workspace/output/rework-v2` and `output_name: kanachan-waianima-rework-v2` to keep it separate from the previous run. Everything else matches v1.
Observations





- baseline (no LoRA, with natural-language direction): the side ponytail came out viewer-right, matching the training material, confirming that Anima base + natural-language direction works. But as a side effect, the subject came out tied up with rope (covered later)
- epoch 1 onward (LoRA on): direction isn’t pinned to one side like in v1; left/right swap by epoch. No consistency, but it has escaped the “viewer-left locked” state
- Hair color clearly improved: the orange-leaning v1 has shifted to the brown of the training material. Removing `brown hair` and letting kanachan absorb the color was correct
The pattern of a correct baseline but wobbly direction with the LoRA on suggests the LoRA absorbs visual features but doesn’t strongly bind them to natural-language direction tokens. With only 53 samples this might be the capacity limit of a rank-32 LoRA, or possibly a directional-token resolution issue with the Qwen3 0.6B TE.
bound hair interpreted as “tied-up subject” by base model
In the v2 baseline sample, the girl came out tied with rope. Anima base likely interpreted the prompt’s Her bound hair not as “bound = tied” + “hair” but as “tied-up (state) subject + hair.”
The rope disappears in epochs with LoRA on (the training material has zero rope, so LoRA cancels it), but it can resurface in the baseline state or when LoRA strength is dropped at inference.
Also bound hair is redundant. ponytail already means “tied hair,” so I’ll replace bound hair with side ponytail to simplify.
perl -pi -e 's/\bHer bound hair\b/Her side ponytail/g; s/\bbound hair\b/side ponytail/g' *.txt
Side benefit: the natural-language side now also contains the Danbooru-tag word side ponytail, strengthening correspondence with the tag list.
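A Python equivalent of the perl one-liner that previews changes before writing anything is a safer way to run this kind of bulk edit. The file layout is assumed (caption txt files in the current directory); the replacement pairs are the same as in the perl command:

```python
# Dry-run version of the perl replacement: report which files would change,
# write only after uncommenting the write line.
import re
from pathlib import Path

REPLACEMENTS = [
    (re.compile(r"\bHer bound hair\b"), "Her side ponytail"),
    (re.compile(r"\bbound hair\b"), "side ponytail"),
]

def apply_replacements(text: str) -> str:
    for pattern, repl in REPLACEMENTS:
        text = pattern.sub(repl, text)
    return text

for path in sorted(Path(".").glob("*.txt")):
    old = path.read_text(encoding="utf-8")
    new = apply_replacements(old)
    if new != old:
        print(f"{path.name}: would change")
        # path.write_text(new, encoding="utf-8")  # uncomment to apply
```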
v3 training: removing the bound expression
Removed all bound hair references from the v2 captions. Same for the sample prompt:
masterpiece, best quality, safe, 1girl, solo, kanachan,
Her side ponytail with a blue scrunchie is visible on the right side of the image.
side ponytail, ahoge, standing, looking at viewer, white background, simple background
`output_dir: /workspace/output/rework-v3` and `output_name: kanachan-waianima-rework-v3` keep it separate from v2. `save_every: 1` so every epoch’s LoRA is preserved (v1/v2 saved every 4 epochs, which made comparison harder).
v3 results






The rope binding is completely gone, baseline included, across every epoch. Removing the bound expression had a clear effect. Hair color stayed at the training-material brown that v2 already achieved — it doesn’t drift. The animal-ear-like artifact at the top of the head from v1 ep7 (the strange black-outlined structure) doesn’t appear in v3 either.
Side ponytail direction still varies between epochs, though. Same as v2 — not pinned in one direction, but not stable either.
Bust-up verification on local ComfyUI
The trainer’s built-in sampler with a minimal prompt isn’t a strong test, so I ran ep7 / ep8 / ep9 through a local ComfyUI on M1 Max with a real prompt. Settings: 832×1024, er_sde + simple, 30 steps, CFG 4.0, LoRA strength model=1.0 / clip=0.8, seed 42 fixed.
Positive is the sample prompt above + white collared shirt, red necktie, upper body, looking at viewer, front view. Negative is the standard set rejecting twintails / nsfw / anatomy breaks.



Only ep8 came out with side ponytail on the viewer-right (matching the natural-language spec). ep7 and ep9 fell to the opposite direction (viewer-left). Looking at ep8 alone, character form / hair color / hairstyle / outfit are all reproduced consistently — the headline success criterion is met.
But it’s unclear whether this is a structural advantage of ep8 or a seed-42 gacha hit.
Additional verification on ep8: full body + motion + LoRA strength
Treating ep8 as the candidate, I checked direction reproducibility with standing pose, running pose, and a LoRA-strength sweep. Same ep8 LoRA, seed 42 fixed.
Standing pose (same prompt + same seed, varying only LoRA strength):



| LoRA strength | Direction |
|---|---|
| 0.5 | Viewer-right (matches NL) ✅ |
| 0.7 | Viewer-left ❌ |
| 1.0 | Viewer-left ❌ |
Running pose (added running, dynamic pose, motion blur, action shot):

Running + strength 1.0 — somehow the direction came out viewer-right (correct). With the same strength 1.0, the standing pose fell to viewer-left, but the running pose lands on the correct side. Switching the prompt from standing to running, dynamic pose flips direction without changing seed. So the variables — LoRA strength, seed, prompt — all interact, and you can’t isolate any one of them cleanly.
Hairstyle isn’t baked into kanachan
To delimit what the LoRA actually absorbs, I removed side ponytail and asked for hair down, semi-long hair (with side ponytail, ponytail, blue scrunchie, scrunchie added to negative).

Hair came down cleanly. Ponytail and scrunchie are gone. So:
- ✅ Hairstyle isn’t baked into the kanachan trigger; it fires from the `side ponytail` tag + natural language
- ✅ Character core (face, hair color, eye color, body) is absorbed into kanachan
- ✅ Hairstyle independence (one of the rework goals) is functioning
The boundary of what the LoRA learned is now clear. kanachan = character’s face, color, basic form. side ponytail and similar tags = hairstyle structure. The former is strongly baked, the latter is switchable via tags.
Anima architecture-specific constraints
While digging in I learned this isn’t an AnimaLoraToolkit or rank-32 problem — it’s a known structural issue across the entire Anima architecture.
The Anima official repository discussion reports it as “LoRA causes strong style dilution / override on Anima”:
| LoRA Weight | Result |
|---|---|
| 0.4–0.7 | Base knowledge (artist tags etc.) is heavily diluted |
| 0.8–1.0 | Base knowledge is essentially zeroed out |
The cause is Anima’s “CLIP-less” design. Instead of SDXL-era CLIP, it uses Qwen3 0.6B TE, and this powerful text encoding overpowers LoRA adaptation. The moment you apply a LoRA, the diverse visual knowledge base had (including directional rendering) starts to fade through “catastrophic forgetting.”
Official recommended mitigations:
- Don’t train the LLM Adapter (the layer between TE and DiT). AnimaLoraToolkit’s defaults already exclude this.
- Lower the learning rate to 1e-5–2e-5 (mine was 5e-5, higher than recommended)
- Push step count to 12,000+ (mine was 636 steps, about 5% of recommended)
- Build in the Differential Output Preservation patch (in development)
This run misses the recommended conditions on learning rate and step count by a large margin. The likely picture: catastrophic forgetting prevented direction information from baking in adequately, the LoRA’s directional bias overrode the NL spec, but that override was inaccurate so it landed on viewer-left.
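The step-count gap is easy to quantify. Assuming batch size 1 and repeats 1, which reproduces the observed 636 steps, the arithmetic looks like this:

```python
# Step budget arithmetic for the mitigation list above
# (assumes batch size 1 and repeats 1, matching the observed 636 steps).
import math

images, epochs = 53, 12
steps = images * epochs          # 53 * 12 = 636, about 5% of the target
print(steps)

target = 12_000
epochs_needed = math.ceil(target / images)
print(epochs_needed)             # epochs required to clear 12,000 steps
```

In other words, hitting the recommended schedule means roughly a 19x longer run than this one, which is where the “about 5 hours and an extra $4” estimate in the conclusion comes from.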
Conclusion
What was achieved:
- ✅ `bound hair` → `side ponytail`: just changing the natural-language phrasing eliminated the “bound = restrained” misinterpretation
- ✅ Removed `brown hair` / `brown eyes`: color information goes back into kanachan, eliminating the divergence with Anima base’s interpretation that tag independence had created
- ✅ `left side ponytail` → `side ponytail`: got rid of a non-existent Danbooru tag
- ✅ Hairstyle independence: confirmed that kanachan + `side ponytail` etc. allows hairstyle changes (validated with hair down)
What wasn’t achieved:
- ❌ Side ponytail direction control: All 53 training images are in the same direction, but at LoRA strength 1.0 inference, gacha across epoch / seed / prompt remains
- The realistic operational fallback is strength 0.5, leaning on base + NL for direction (though LoRA features get diluted)
- The proper fix is dropping learning rate to 2e-5 and retraining at 12,000+ steps. About 5 hours and an extra $4
The main lesson: per-character LoRAs for Anima cost a lot more to train than SDXL-family ones. The kanachan trigger can land character likeness on its own, but baking in finer control like direction requires a big budget jump from the SDXL-era reflex (a few hundred steps, rank 32).
For now I’ll operate with ep8 + strength 0.5, and wait for longer training runs and the Differential Output Preservation release.