Two-character LoRA without bleed or fusion: rank128 + 20 dual images on Anima
Contents
In the previous combined LoRA (v1), I baked Kei-chan and Kana-chan into a single LoRA and had the two of them interact. Side-by-side and solo worked, and hugs, piggyback rides, and princess carries came out too on a lucky seed. But two problems remained.
- When the two touch, attributes bleed across. Kana’s ahoge lands on Kei (blonde), Kei’s long hair gets pulled onto Kana, and on close contact their bodies fuse
- Depth and face fidelity hit a ceiling under rank64’s capacity plus the interference of carrying two identities
At the end of v1 I wrote that the fix was to raise rank and untangle the interference. This time I actually do it. I raised rank from 64 to 128 and added 20 two-character training images that v1 didn’t have, then re-baked. The goal is to eliminate the attribute bleed and fusion.
Two changes from v1
| Item | v1 | v2 |
|---|---|---|
| rank / alpha | 64 / 64 | 128 / 128 |
| Data | 96 solo only (Kei 43 + Kana 53) | 152 (Kei 79 + Kana 53 + 20 two-char) |
| Recipe | lr 2e-5 / repeats 2 / ep150 | Same (lr 2e-5 / repeats 2 / ep150 / save_every 4) |
| Base | Anima-Base v1.0 | Same |
| Generation | Turbo 8 steps | Same (er_sde / simple / cfg1.0) |
The real changes are just two—rank and the two-character data—and everything else follows the v1 recipe. The solo v1 setup was already proven, so I wanted to isolate which factor did the work.
How I built the two-character data
A big part of the attribute bleed is that the model never saw images with both characters in the same frame during training. So I prepared 20 two-character images.
| Type | Count | How |
|---|---|---|
| Composite | 8 | Crop matching-pose solo images with a bbox, place them side by side, normalize heights, and merge into one image |
| AI-generated | 12 | Generate images of the two standing together or touching with Gemini / GPT |
I also included left/right swaps. The two-character images are kept to 13% of the set. If the ratio is too high, even a solo trigger starts spitting out both characters on its own, so I made the split solo-primary, two-character-secondary. Every image is captioned.
Training ran on RunPod (torch 2.5.1+cu124 / AnimaLoraToolkit). 304 steps/epoch × 150 ep = 11,400 steps; about 4 minutes per epoch, roughly 10 hours total. The procedure is the same flow as the AnimaLoraToolkit + RunPod article.
The ahoge bleed and fusion are gone
The most visible failure in v1 was the attribute bleed where Kana’s ahoge landed on Kei’s (blonde) head. In v2, even when I generate both characters, the ahoge basically does not land on the top of Kei’s head (very rarely a small tuft appears, but the clear bleed seen in v1 is gone). Kana keeps her ahoge and it doesn’t carry over to Kei. The long-hair pull no longer happens either, and even in close contact the two bodies stay separate.
Solo triggers are healthy too: generating keichan / kanachan separately shows no contamination. Kei doesn’t pick up Kana’s features, nor the reverse, and there’s no face breakdown. The solo characters haven’t degraded at this epoch count, so separation and solo quality hold together.
Contact poses (hug, standing together, piggyback, princess carry) come out as in v1, but the “fusion at the contact point” that lingered in v1 is visibly reduced. The hard part of contact was solved by rank128 plus the two-character data.
Lap-sit comes out with Danbooru tags on a lucky seed, not from natural language
The one thing clearly broken in v1 was lap-sit (a seated pose where the legs overlap in front). In v1 I wrote that “the model partly reads kanachan sitting on keichan's lap as ‘both are seated’ and fuses the two seated figures.” Digging into this in v2 confirmed the reading.
- Natural language (
kanachan sitting on keichan's lap) doesn’t produce a lap-sit. The two either sit side by side or fuse - The same natural language fails the same way without the LoRA (Anima-Base alone)
- Adding Danbooru tags (
sitting on lap+girl on top) makes seeds that produce a lap-sit start to appear
So lap-sit not coming out from natural language isn’t a LoRA problem—it’s a limitation where the Qwen3 text encoder can’t fully parse the spatial relation of “A sits on B’s lap.” Since it fails on the base alone, there’s nothing the LoRA can fix. This backs up the v1 read that it was a prompt-interpretation failure with room to be worked around.
That said, even with the tags it doesn’t come out on every seed. Lining up four seeds of the ep140 Danbooru-tag lap-sit, two seeds gave a real lap-sit with Kana actually sitting on Kei’s thigh, while the rest dropped to just sitting side by side on chairs. The tags raise the hit rate for lap-sit but don’t guarantee it—it’s seed-gacha territory. Whether a lap-sit comes out is a separate question from whether the two in the resulting image are separated, and the latter holds whether it’s a lap-sit or a side-by-side sit (no fusion, no ahoge bleed).
Epoch sweep
With save_every 4 I saved ep4–148 plus the final ep150, then compared the candidate band (ep120 and up) side by side on the same prompt. What I want is the ep where robust two-character separation and solo quality both hold.
Faces and separation don’t break through ep150
Generating Kei / Kana solo, across ep120 through ep150, in every ep I looked at the face stays on-model and doesn’t break. The differences between epochs are about costume drift (blazer or vest, tights or knee-highs), and signs of overfitting (faces drifting off the source, ghosting, fried faces) don’t show up even at the final ep150. The two-character standing shot (front) is also clean in both separation and faces at ep148 and ep150, with no ghosting or cross-contamination.
This is a clear difference from v1. v1, with rank64 and solo data only, drifted faces off the source and mixed in ghosting when baked to ep150 with the two side by side (overfitting). v2, with rank128 plus two-character data, doesn’t show that v1-style flashy overfitting (face breakdown, ghosting) even at the final ep150.
If anything, the hairstyle gets more stable the higher the ep. Below ep140 the hairstyle wobbles by seed, but at ep140 and above it’s nearly locked. The character’s hairstyle is exactly the thing being baked into the trigger, so locking it is the correct behavior. The low-ep state where the hairstyle wobbles is closer to under-baking the character.
Two-character separation is also stable at ep120–140
Lining up lap-sit (Danbooru tags) on a hard seed at ep120, 132, and 140, regardless of whether it became a lap-sit or dropped to a side-by-side sit, all three keep the two separated—no ahoge on Kei and no body fusion. I lined them up expecting separation to get more robust with more epochs, but across this band every ep is equivalent, and separation quality doesn’t clearly rank one ep over another. In other words, separation is already there by ep120 and doesn’t break afterward.
Artifacts are low-probability and not the fault of a specific ep
What surprised me across the sweep was how few artifacts appear. The ahoge carrying over to Kei is also almost nonexistent. Occasionally there are images with leftover ghosts, extra legs, or odd shadows, but these are the kind of breakage that shows up now and then even without overfitting—less that the ep is bad and more that the seed and various settings happened not to line up.
The yardstick for telling them apart is frequency. If it shows up every time, the LoRA or the ep is bad; if it’s low-probability, it’s a matter of things not lining up. Here, breakage is low-probability at every ep, which is the latter. So there’s no ep in the candidate band you could call “broken,” and generating a few and picking the good one is enough.
Real overfitting is judged by “it won’t change even when you specify it”
A locked hairstyle is not overfitting. It’s just coming out as baked. Rigidity becomes a problem when you explicitly ask to change the hairstyle or outfit and it doesn’t change. If you instruct keichan to take a different hairstyle or casual clothes and the low ep follows but the high ep stays in the original form and won’t change, then the high ep is over-baked. I haven’t nailed down this follow-through test itself yet.
The piggyback hand holds through ep140 and starts breaking from ep144
The one breakage on the overfitting side that clearly showed ep dependence was the hand where Kei grips Kana’s thigh in the piggyback. I took the hardest contact (where the hand crosses the leg), ran three seeds, lined up ep140, 144, 148, and 150, and cropped and zoomed into the hand region.
| seed | ep140 | ep144 | ep148 | ep150 |
|---|---|---|---|---|
| 7 | holds | holds | slightly off | slightly off |
| 13 | holds | lost | lost | lost |
| 99 | holds | holds | holds | slightly off |
There are two modes of breakage: fusion where the fingers stick together, and a loss where the hand itself disappears. The loss is the worse one—visually harsher than fusion.
Here’s how it comes out by ep. At ep140 the hand holds on every seed. At ep144 some seeds drop the hand, but no fusion yet. At ep148 images with a vanishing hand clearly appear, and the heavier breakage starts mixing in from here. At ep150 the frequency of vanishing and fusion is highest. Not every seed breaks every time, but both the probability and the severity of breakage rise as the ep goes up. What breaks is always the hardest crossing pose; it doesn’t spread to standing shots, hugs, or solo.
Cropping the piggyback hand region on the same seed (13) makes the difference easy to see. On the left, ep140 has Kei’s hand gripping Kana’s thigh and holding; on the right, ep150 has that same hand gone.
The sweet spot is around ep140
At ep140 the hairstyle (character identity) locks, and the piggyback hand still holds on every seed. The hand-vanishing breakage starts mixing in from ep144, and ep148/150 raise its frequency and severity. ep150 is the most suspect. Detail rendering feels better the higher the ep, but the hand in hard contact poses worsens in exchange. So the balance point is safest around ep140, and if you use a lot of hard interactions, don’t go above it. If you only use standing shots, hugs, and solo, ep148/150 cause no real harm.
The final ep150 isn’t saved with a _epoch150 suffix; keikana-animabase-v2.safetensors (no suffix) is the weight after the full 150 epochs complete. Because save_every doesn’t divide 150 evenly, the last suffixed epoch is ep148.
Detail holds up close and breaks at a distance
On a white-background turnaround the softness of the detail is hard to see, so at ep140 I generated situations loaded with backgrounds and objects at a larger resolution. A classroom, dipping feet in the water at the poolside, grabbing food on the way home, a water fight cleaning the pool, a rooftop, a summer festival. What came into view was a composition problem. Up close, even the background renders cleanly; pull back (move the camera away to take in the full bodies and a wide background) and the detail goes soft. The two characters’ identity and separation hold at any distance, so this is about the base model’s drawing power, not the LoRA.
Close shots come out stably
Close-to-mid-distance compositions like the classroom, poolside, and pool cleaning render the background with depth and the detail isn’t soft. Maybe thanks to raising the training-data resolution, generating large brings the fine details in. The two characters’ separation (ahoge, face, body) also holds within the scene.
The only flaws show up where hands or mouths get into complex interaction with objects. In the grabbing-food scene, depending on the seed, the crepe Kana holds doubles, fuses with her chin (not eating but melting), or her mouth breaks down. The image below is a lucky seed where the crepe and mouth are relatively intact, but the signboard still breaks into fake lettering. Food-mouth interaction breaking is a difficulty common to models including the Illustrious line, and like natural-language lap-sit not working, it’s closer to a limit on the base model and prompt interpretation than overfitting specific to keikana.
Pull back and the rendering goes soft
With the same characters and the same ep140, pulling the camera back to fit the full bodies and take in a wide background drops the detail. As faces and hands shrink and the pixels allotted to them decrease, the model rounds the forms toward an average. What breaks is consistently the high-frequency detail of the background.
Shooting the pool cleaning from a distance, the characters themselves come out, but the routing of the hose becomes illogical.
At the summer festival, the stall signboards break into fake lettering, and the crowd in the back goes flat with faces and bodies melting.
In the rooftop distance shot, the chain-link mesh in the background breaks as if cracked.
What makes the difference is composition (camera distance), not the LoRA. The two characters’ identity, separation, and the ahoge split hold even in distance shots, and what breaks is concentrated on the high-frequency detail—backgrounds, small props, crowds—that the model is inherently weak at. The more elements you cram into one image, the fewer pixels each gets and the less the rendering can keep up, is the closest read. If you use keikana, close-to-mid distance is stable, and when you go for a wide distance shot, plan on raising the background detail separately.