
Two-character LoRA without bleed or fusion: rank128 + 20 dual images on Anima

Ikesan

In the previous combined LoRA (v1), I baked Kei-chan and Kana-chan into a single LoRA and had the two of them interact. Side-by-side and solo worked, and hugs, piggyback rides, and princess carries came out too on a lucky seed. But two problems remained.

  • When the two touch, attributes bleed across. Kana’s ahoge lands on Kei (blonde), Kei’s long hair gets pulled onto Kana, and on close contact their bodies fuse
  • Depth and face fidelity hit a ceiling: rank64 has limited capacity, and carrying two identities in it adds interference on top

At the end of v1 I wrote that the fix was to raise rank and untangle the interference. This time I actually do it. I raised rank from 64 to 128 and added 20 two-character training images that v1 didn’t have, then re-baked. The goal is to eliminate the attribute bleed and fusion.

Two changes from v1

Item | v1 | v2
rank / alpha | 64 / 64 | 128 / 128
Data | 96 solo only (Kei 43 + Kana 53) | 152 (Kei 79 + Kana 53 + 20 two-char)
Recipe | lr 2e-5 / repeats 2 / ep150 | Same (lr 2e-5 / repeats 2 / ep150 / save_every 4)
Base | Anima-Base v1.0 | Same
Generation | Turbo 8 steps | Same (er_sde / simple / cfg1.0)

The real changes are just two—rank and the two-character data—and everything else follows the v1 recipe. The solo v1 setup was already proven, so I wanted to isolate which factor did the work.
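As a rough sketch, the comparison table can be written down as two dictionaries and diffed. The values are taken from the table; the key names are hypothetical and are not AnimaLoraToolkit's actual config schema.

```python
# Values come from the v1/v2 comparison table.
# Key names are illustrative only, not a real toolkit config format.
V1 = {
    "rank": 64,
    "alpha": 64,
    "kei_solo": 43,
    "kana_solo": 53,
    "two_char": 0,
    "lr": 2e-5,
    "repeats": 2,
    "epochs": 150,
    "base": "Anima-Base v1.0",
}
V2 = {**V1, "rank": 128, "alpha": 128, "kei_solo": 79, "two_char": 20}


def changed_keys(old, new):
    """Return {key: (old_value, new_value)} for every setting that differs."""
    return {k: (old[k], new[k]) for k in old if old[k] != new[k]}


diff = changed_keys(V1, V2)
for key, (before, after) in diff.items():
    print(f"{key}: {before} -> {after}")
```

The learning rate, repeats, epoch count, and base model drop out of the diff, which is the point of reusing the v1 recipe. Going by the table, Kei's solo count also differs between the two runs.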

How I built the two-character data

A big part of the attribute bleed is that the model never saw images with both characters in the same frame during training. So I prepared 20 two-character images.

Type | Count | How
Composite | 8 | Crop matching-pose solo images with a bbox, place them side by side, normalize heights, and merge into one image
AI-generated | 12 | Generate images of the two standing together or touching with Gemini / GPT
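The composite step (crop by bbox, normalize heights, place side by side) can be sketched as a small layout calculation. This is only the geometry, in plain Python; the function name, the target height, and the gap are assumptions for illustration, and the actual cropping and pasting would be done with whatever image library you use.

```python
def side_by_side_layout(box_a, box_b, target_h=1024, gap=32):
    """Plan a two-figure composite from two (left, top, right, bottom) bboxes.

    Each crop is scaled to the same height so the figures stand level,
    then placed left to right with a fixed gap between them.
    """
    placements = []
    x = 0
    for left, top, right, bottom in (box_a, box_b):
        w, h = right - left, bottom - top
        scale = target_h / h              # height normalization factor
        new_w = round(w * scale)
        placements.append(
            {"scale": scale, "size": (new_w, target_h), "offset": (x, 0)}
        )
        x += new_w + gap
    canvas = (x - gap, target_h)          # drop the trailing gap
    return canvas, placements


# Hypothetical bboxes for one Kei crop and one Kana crop.
canvas, (kei, kana) = side_by_side_layout((100, 50, 500, 1250), (80, 40, 380, 640))
print(canvas, kei["size"], kana["offset"])
```

Swapping the two arguments gives the left/right mirrored version of the same pair.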

I also included left/right swaps. The two-character images are kept to 13% of the set. If the ratio is too high, even a solo trigger starts spitting out both characters on its own, so I made the split solo-primary, two-character-secondary. Every image is captioned.
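The 13% figure follows directly from the image counts, and the same arithmetic can be turned around to ask how many two-character images fit under a given cap. The 15% cap in the example is a made-up number for illustration, not a threshold I measured.

```python
import math


def two_char_share(solo, two_char):
    """Fraction of the training set that shows both characters."""
    return two_char / (solo + two_char)


def max_two_char(solo, cap):
    """Largest two-character count t such that t / (solo + t) <= cap."""
    return math.floor(cap * solo / (1 - cap))


share = two_char_share(solo=79 + 53, two_char=20)
print(f"{share:.1%}")                 # 20 of 152 images

# Hypothetical cap of 15%, purely for illustration.
print(max_two_char(solo=132, cap=0.15))
```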

Training ran on RunPod (torch 2.5.1+cu124 / AnimaLoraToolkit). 304 steps/epoch × 150 ep = 11,400 steps; about 4 minutes per epoch, roughly 10 hours total. The procedure is the same flow as the AnimaLoraToolkit + RunPod article.
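The step and time totals can be sanity-checked with a few lines. Note that 304 × 150 is 45,600, so the 11,400 figure only lines up if 304 is counted in samples per epoch (152 images × repeats 2) and an effective batch size of 4 is in play. That batch size is an assumption for this sketch; it is not listed in the recipe above.

```python
images, repeats, epochs = 152, 2, 150

samples_per_epoch = images * repeats            # 304
total_samples = samples_per_epoch * epochs      # 45,600

# ASSUMPTION: effective batch size of 4 (not stated in the recipe).
effective_batch = 4
steps_per_epoch = samples_per_epoch // effective_batch
total_steps = steps_per_epoch * epochs

minutes_per_epoch = 4
total_hours = minutes_per_epoch * epochs / 60

print(samples_per_epoch, total_samples, steps_per_epoch, total_steps, total_hours)
```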

The ahoge bleed and fusion are gone

The most visible failure in v1 was the attribute bleed where Kana’s ahoge landed on Kei’s (blonde) head. In v2, even when I generate both characters, the ahoge basically does not land on the top of Kei’s head (very rarely a small tuft appears, but the clear bleed seen in v1 is gone). Kana keeps her ahoge and it doesn’t carry over to Kei. The long-hair pull no longer happens either, and even in close contact the two bodies stay separate.

Hug (cheek-to-cheek) ep140. Even in close contact, no ahoge lands on Kei's head and the bodies don't fuse

Solo triggers are healthy too: generating keichan / kanachan separately shows no contamination. Kei doesn’t pick up Kana’s features, nor the reverse, and there’s no face breakdown. The solo characters haven’t degraded at this epoch count, so separation and solo quality hold together.

keichan solo trigger ep140 (blonde, blunt bangs, blue ribbon)

kanachan solo trigger ep140 (brown side-ponytail, ahoge)

Contact poses (hug, standing together, piggyback, princess carry) come out as in v1, but the “fusion at the contact point” that lingered in v1 is visibly reduced. The hard part of contact was solved by rank128 plus the two-character data.

Two characters standing together ep140 (left keichan / right kanachan, separation held)

Piggyback ep140 (keichan carrying kanachan; no fusion even in close contact)

Lap-sit comes out with Danbooru tags on a lucky seed, not from natural language

The one thing clearly broken in v1 was lap-sit (a seated pose where the legs overlap in front). In v1 I wrote that “the model partly reads kanachan sitting on keichan's lap as ‘both are seated’ and fuses the two seated figures.” Digging into this in v2 confirmed the reading.

  • Natural language (kanachan sitting on keichan's lap) doesn’t produce a lap-sit. The two either sit side by side or fuse
  • The same natural language fails the same way without the LoRA (Anima-Base alone)
  • Adding Danbooru tags (sitting on lap + girl on top) makes seeds that produce a lap-sit start to appear

So lap-sit not coming out from natural language isn’t a LoRA problem—it’s a limitation where the Qwen3 text encoder can’t fully parse the spatial relation of “A sits on B’s lap.” Since it fails on the base alone, there’s nothing the LoRA can fix. This backs up the v1 read that it was a prompt-interpretation failure with room to be worked around.
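The reasoning above is a small with/without comparison, sketched here. The outcomes are the ones reported in the three bullets; the way the prompt is joined (natural language first, then comma-separated tags) is an assumption about formatting, not the exact prompt I ran.

```python
NATURAL = "kanachan sitting on keichan's lap"
TAGS = ["sitting on lap", "girl on top"]

# Outcomes as reported above; lap_sit=True means at least one seed produced one.
OBSERVED = [
    {"lora": True, "tags": False, "lap_sit": False},
    {"lora": False, "tags": False, "lap_sit": False},
    {"lora": True, "tags": True, "lap_sit": True},
]


def build_prompt(use_tags):
    """Natural-language phrase, optionally followed by the Danbooru tags."""
    parts = [NATURAL] + (TAGS if use_tags else [])
    return ", ".join(parts)


def failure_source(observed):
    """Attribute the natural-language failure to the LoRA or to the base."""
    natural = {row["lora"]: row["lap_sit"] for row in observed if not row["tags"]}
    if not natural[True] and not natural[False]:
        return "base model / prompt interpretation"
    if not natural[True] and natural[False]:
        return "LoRA"
    return "no failure to explain"


print(build_prompt(use_tags=True))
print(failure_source(OBSERVED))
```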

That said, even with the tags it doesn’t come out on every seed. Lining up four seeds of the ep140 Danbooru-tag lap-sit, two seeds gave a real lap-sit with Kana actually sitting on Kei’s thigh, while the rest dropped to just sitting side by side on chairs. The tags raise the hit rate for lap-sit but don’t guarantee it—it’s seed-gacha territory. Whether a lap-sit comes out is a separate question from whether the two in the resulting image are separated, and the latter holds whether it’s a lap-sit or a side-by-side sit (no fusion, no ahoge bleed).

Lap-sit ep140 (Danbooru tags, lucky seed; kanachan sits on keichan's lap, separation held)

Epoch sweep

With save_every 4 I saved ep4–148 plus the final ep150, then compared the candidate band (ep120 and up) side by side on the same prompt. What I want is the ep where robust two-character separation and solo quality both hold.
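For reference, the set of checkpoints this produces can be enumerated as follows. It is a minimal sketch of the "every 4 epochs plus the final one" rule as described here, not the toolkit's own saving logic.

```python
def saved_epochs(total_epochs, save_every):
    """Epochs that get a checkpoint: every `save_every`, plus the final epoch."""
    epochs = list(range(save_every, total_epochs + 1, save_every))
    if not epochs or epochs[-1] != total_epochs:
        epochs.append(total_epochs)
    return epochs


all_eps = saved_epochs(total_epochs=150, save_every=4)
candidates = [ep for ep in all_eps if ep >= 120]   # the candidate band

print(len(all_eps), all_eps[:3], all_eps[-3:])
print(candidates)
```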

Faces and separation don’t break through ep150

Generating Kei and Kana solo across ep120 through ep150, the face stays on-model and doesn’t break at any ep I looked at. The differences between epochs come down to costume drift (blazer or vest, tights or knee-highs), and signs of overfitting (faces drifting off the source, ghosting, fried faces) don’t show up even at the final ep150. The two-character standing shot (front) is also clean in both separation and faces at ep148 and ep150, with no ghosting or cross-contamination.

This is a clear difference from v1. With rank64 and solo data only, v1 overfit when baked to ep150: with the two side by side, faces drifted off the source and ghosting crept in. v2, with rank128 plus two-character data, shows none of that obvious overfitting (face breakdown, ghosting) even at the final ep150.

If anything, the hairstyle gets more stable the higher the ep. Below ep140 the hairstyle wobbles by seed, but at ep140 and above it’s nearly locked. The character’s hairstyle is exactly the thing being baked into the trigger, so locking it is the correct behavior. The low-ep state where the hairstyle wobbles is closer to under-baking the character.

Two-character separation is also stable at ep120–140

I lined up lap-sit (Danbooru tags) on a hard seed at ep120, 132, and 140. Regardless of whether the result became a lap-sit or dropped to a side-by-side sit, all three keep the two separated, with no ahoge on Kei and no body fusion. I expected separation to get more robust with more epochs, but across this band every ep is equivalent, and separation quality doesn’t clearly rank one ep over another. In other words, separation is already there by ep120 and doesn’t break afterward.

Artifacts are low-probability and not the fault of a specific ep

What surprised me across the sweep was how few artifacts appear. The ahoge carrying over to Kei is also almost nonexistent. Occasionally there are images with leftover ghosts, extra legs, or odd shadows, but these are the kind of breakage that shows up now and then even without overfitting—less that the ep is bad and more that the seed and various settings happened not to line up.

The yardstick for telling them apart is frequency. If it shows up every time, the LoRA or the ep is bad; if it’s low-probability, it’s a matter of things not lining up. Here, breakage is low-probability at every ep, which is the latter. So there’s no ep in the candidate band you could call “broken,” and generating a few and picking the good one is enough.
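The frequency yardstick can be written as a tiny classifier. The two thresholds are placeholders I picked for the sketch; the text above only distinguishes "every time" from "low-probability", so treat the numbers as assumptions to tune.

```python
def classify_breakage(broken, total, systematic_at=0.8, chance_at_most=0.25):
    """Classify an artifact by how often it appears across generated images.

    systematic_at and chance_at_most are illustrative thresholds, not
    measured values.
    """
    if total <= 0:
        raise ValueError("need at least one generated image")
    rate = broken / total
    if rate >= systematic_at:
        return "systematic: suspect the LoRA or the ep"
    if rate <= chance_at_most:
        return "chance: reroll the seed"
    return "inconclusive: generate more samples"


print(classify_breakage(broken=1, total=8))
print(classify_breakage(broken=8, total=8))
```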

Real overfitting is judged by “it won’t change even when you specify it”

A locked hairstyle is not overfitting. It’s just coming out as baked. Rigidity becomes a problem when you explicitly ask to change the hairstyle or outfit and it doesn’t change. If you instruct keichan to take a different hairstyle or casual clothes and the low ep follows but the high ep stays in the original form and won’t change, then the high ep is over-baked. I haven’t nailed down this follow-through test itself yet.

The piggyback hand holds through ep140 and starts breaking from ep144

The one breakage on the overfitting side that clearly showed ep dependence was the hand where Kei grips Kana’s thigh in the piggyback. I took the hardest contact (where the hand crosses the leg), ran three seeds, lined up ep140, 144, 148, and 150, and cropped and zoomed into the hand region.

seed | ep140 | ep144 | ep148 | ep150
7 | holds | holds | slightly off | slightly off
13 | holds | lost | lost | lost
99 | holds | holds | holds | slightly off

There are two modes of breakage: fusion where the fingers stick together, and a loss where the hand itself disappears. The loss is the worse one—visually harsher than fusion.

Here’s how it comes out by ep. At ep140 the hand holds on every seed. At ep144 some seeds drop the hand, but no fusion yet. At ep148 images with a vanishing hand clearly appear, and the heavier breakage starts mixing in from here. At ep150 the frequency of vanishing and fusion is highest. Not every seed breaks every time, but both the probability and the severity of breakage rise as the ep goes up. What breaks is always the hardest crossing pose; it doesn’t spread to standing shots, hugs, or solo.
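The seed-by-epoch grid is small enough to tally directly. The sketch below encodes the table as-is and counts, per ep, how many of the three seeds show any breakage ("slightly off" or "lost").

```python
# Piggyback hand results, copied from the seed/epoch table.
RESULTS = {
    7: {140: "holds", 144: "holds", 148: "slightly off", 150: "slightly off"},
    13: {140: "holds", 144: "lost", 148: "lost", 150: "lost"},
    99: {140: "holds", 144: "holds", 148: "holds", 150: "slightly off"},
}
EPOCHS = (140, 144, 148, 150)


def broken_count(ep):
    """Number of seeds whose hand does not fully hold at this epoch."""
    return sum(1 for per_seed in RESULTS.values() if per_seed[ep] != "holds")


counts = {ep: broken_count(ep) for ep in EPOCHS}
last_clean = max(ep for ep, n in counts.items() if n == 0)

print(counts)
print("last ep where every seed holds:", last_clean)
```

With only three seeds this is a trend rather than a measurement, but the count rises monotonically with the ep, which matches the reading above.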

Cropping the piggyback hand region on the same seed (13) makes the difference easy to see. On the left, ep140 has Kei’s hand gripping Kana’s thigh and holding; on the right, ep150 has that same hand gone.

Piggyback hand crop ep140 (seed13, hand grips the thigh and holds)

Piggyback hand crop ep150 (seed13, the same hand has vanished = loss)

The sweet spot is around ep140

At ep140 the hairstyle (character identity) locks, and the piggyback hand still holds on every seed. The hand-vanishing breakage starts mixing in from ep144, and ep148/150 raise its frequency and severity. ep150 is the most suspect. Detail rendering feels better the higher the ep, but the hand in hard contact poses worsens in exchange. So the balance point is safest around ep140, and if you use a lot of hard interactions, don’t go above it. If you only use standing shots, hugs, and solo, ep148/150 cause no real harm.

The final ep150 isn’t saved with a _epoch150 suffix; keikana-animabase-v2.safetensors (no suffix) is the weight after the full 150 epochs complete. Because save_every doesn’t divide 150 evenly, the last suffixed epoch is ep148.
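When scripting a sweep, that naming quirk is easy to trip over, so a small lookup helps. The stem and the `_epoch` suffix come from the file names mentioned above; the exact form of intermediate file names (for example, whether the number is zero-padded) is an assumption here, so check it against the files actually written.

```python
def checkpoint_name(ep, final_epoch=150, stem="keikana-animabase-v2"):
    """File name for a given epoch.

    ASSUMPTION: intermediate checkpoints are named '<stem>_epoch<N>' with no
    zero-padding. The final epoch is saved without any suffix.
    """
    if ep == final_epoch:
        return f"{stem}.safetensors"
    return f"{stem}_epoch{ep}.safetensors"


for ep in (140, 148, 150):
    print(ep, checkpoint_name(ep))
```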

Detail holds up close and breaks at a distance

On a white-background turnaround the softness of the detail is hard to see, so at ep140 I generated situations loaded with backgrounds and objects at a larger resolution. A classroom, dipping feet in the water at the poolside, grabbing food on the way home, a water fight cleaning the pool, a rooftop, a summer festival. What came into view was a composition problem. Up close, even the background renders cleanly; pull back (move the camera away to take in the full bodies and a wide background) and the detail goes soft. The two characters’ identity and separation hold at any distance, so this is about the base model’s drawing power, not the LoRA.

Close shots come out stably

Close-to-mid-distance compositions like the classroom, poolside, and pool cleaning render the background with depth, and the detail isn’t soft. Possibly because I raised the training-data resolution, generating at a large size brings out the fine details. The two characters’ separation (ahoge, face, body) also holds within the scene.

Sitting in a classroom ep140 (window light, blackboard, desk depth; separation held)

Dipping feet in the water at the poolside ep140 (water surface and wet rendering don't break; feet are drawn too)

Pool-cleaning water fight ep140 (close. Splashes, sunlight, and the hose grip all hold; the asymmetry of Kei spraying and Kana defending comes through)

The only flaws show up where hands or mouths get into complex interaction with objects. In the grabbing-food scene, depending on the seed, the crepe Kana holds doubles, fuses with her chin (not eating but melting), or her mouth breaks down. The image below is a lucky seed where the crepe and mouth are relatively intact, but the signboard still breaks into fake lettering. Food-mouth interaction breaking is a difficulty common to models including the Illustrious line, and like natural-language lap-sit not working, it’s closer to a limit on the base model and prompt interpretation than overfitting specific to keikana.

Grabbing-food scene ep140 (lucky seed. Background has depth but the signboard text breaks. Crepe-mouth interaction breaks easily by seed)

Pull back and the rendering goes soft

With the same characters and the same ep140, pulling the camera back to fit the full bodies and take in a wide background drops the detail. As faces and hands shrink and the pixels allotted to them decrease, the model rounds the forms toward an average. What breaks is consistently the high-frequency detail of the background.
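The pixel budget shrinks faster than it looks, because area scales with the square of the on-screen size. A rough worked example, where the frame height and the face fractions are invented numbers for illustration and not measurements from these images:

```python
def face_pixels(image_h, face_fraction):
    """Approximate pixel area of a face that spans `face_fraction` of the frame height."""
    side = image_h * face_fraction
    return round(side * side)


IMAGE_H = 1536                            # hypothetical output height
close = face_pixels(IMAGE_H, 0.30)        # close shot: face ~30% of frame height
wide = face_pixels(IMAGE_H, 0.06)         # pulled back: face ~6% of frame height

ratio = (0.30 / 0.06) ** 2                # area ratio depends only on the fractions
print(close, wide, round(ratio))
```

Under these assumed numbers, pulling back so the face is one fifth as tall leaves it roughly one twenty-fifth of the pixels.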

Shooting the pool cleaning from a distance, the characters themselves come out, but the routing of the hose becomes illogical.

Pool-cleaning composition shot from a distance ep140 (characters come out, but the hose routing breaks illogically)

At the summer festival, the stall signboards break into fake lettering, and the crowd in the back goes flat with faces and bodies melting.

Summer-festival distance shot ep140 (stall signboards in fake lettering, the crowd in back flat and melting)

In the rooftop distance shot, the chain-link mesh in the background breaks as if cracked.

Rooftop distance shot ep140 (the chain-link mesh in the background cracks and breaks)

What makes the difference is composition (camera distance), not the LoRA. The two characters’ identity, separation, and the ahoge split hold even in distance shots, and what breaks is concentrated in the high-frequency detail (backgrounds, small props, crowds) that the model is inherently weak at. The closest read is that the more elements you cram into one image, the fewer pixels each one gets and the less the rendering can keep up. If you use keikana, close-to-mid distance is stable, and when you go for a wide distance shot, plan on raising the background detail separately.