Two characters, one Anima LoRA: hugs hold, lap-sit breaks (stack vs interleave)

Update 2026-06-08: A follow-up where I raised rank to 128 and added 20 two-character training images, clearing the ahoge bleed and body fusion left open here → Two-character LoRA without bleed or fusion: rank128 + 20 dual images on Anima

In the previous post (v3→v4), I pulled the posed full-body shots out of keichan’s LoRA and baked only her identity into a lean dataset — and standing finally produced a clean upright pose. kanachan was already built the same way: a lean dataset, mostly upright turnaround shots. Both were now lean. So I tried it: can these two characters be mixed and baked into a single LoRA?

Side-by-side is easy; the real goal is interaction

If all you want is two characters standing side by side, you don’t really need to merge the LoRAs. Write two triggers and they show up separately; worst case, you inpaint them together afterward.

The real reason to cram two characters into one LoRA is to make them interact. Hugging, carrying, piggyback — images where the two bodies touch. Try this with two separate LoRAs and their models/triggers interfere: arms and faces blend, and it falls apart. So if you want two characters touching in one frame, you have to bake them into a single LoRA from the start.

So the thing I want to confirm: can keichan and kanachan embrace in one image without mixing their features, each staying a distinct person?

Clean side-by-side separation is a given. A merged LoRA only earns its keep once interaction works.

Most multi-concept LoRAs out there are style/franchise LoRAs; two distinct characters plus contact is the hard part. And from what I can see, the vast majority are Illustrious (SDXL) based — I couldn’t find anyone doing multi-character + interaction on Anima (Qwen-Image DiT), at least not that I came across.

Method

Item	Detail
Data	keichan v4 (43 images) + kanachan v2 (53 images) = 96 images. Each image carries only its own trigger (keichan images → `keichan`, kanachan images → `kanachan`, zero trigger contamination)
rank	32 → 64 (capacity for two identities; bleed countermeasure)
Recipe	lr 2e-5, repeats 2, bf16, Anima-Base v1.0, ep150 (swept on even epochs)
What separation relies on	Hair and face only. Both characters wear the same uniform (white shirt + navy pleated skirt + knee-highs + loafers), so the only differences are keichan = blonde, blunt bangs, intake, blue ribbon, blue eyes; kanachan = brown hair, side ponytail, ahoge, brown eyes. I deliberately test a setup where the outfit can’t separate them = bleed-prone
Generation	Always Turbo (speed lora, 8 steps). At full steps (25+), Anima tends to over-render and break, so Turbo actually behaves better. I used Turbo for both real use and evaluation

First: solo separation and side-by-side both pass

Solo triggers: no bleed

Generate keichan and kanachan separately and there’s no cross-feature bleed. keichan doesn’t pick up the ahoge/side ponytail, and kanachan doesn’t pick up the intake/blue ribbon. Even the small accessories separate (red ribbon vs. red necktie, white vs. black knee-highs). Despite identical outfits, the triggers correctly carry the hair/face/accessory differences.

keikana solo triggers (keichan / kanachan)

2girls side-by-side works too

Use both triggers plus a natural-language position cue (“keichan on the left, kanachan on the right”) and the two stand side by side as distinct people. No merging, no swapping. But as noted, this is just the baseline — not where the value is.

keikana 2girls (left: keichan / right: kanachan)

Outfit accessories (bow-tie color, blazer, etc.) do show attribute bleed with a bare prompt, but you can override it on the prompt side by describing each character’s outfit in detail (training won’t fix it, but it’s the kind of problem you can brute-force at generation time).

How far does interaction hold up?

Write the action into the prompt (hugging / carrying / holding hands …) to make the two touch.

Interaction is prompt-driven, but pose-dependent

keikana interaction successes (hug, princess carry, back hug, piggyback)

Hug / holding hands. Front-facing, symmetric contact. Stable (both stay distinct; only the contact point fuses slightly).
Princess carry / piggyback / back hug. The hardest cases, with the most contact points. Pass on a lucky seed. One character carries or holds the other, so the bodies separate and stack, and they stay drawn as distinct people.
Lap-sitting. Seated, the two characters’ legs overlap in front. Legs fuse and it collapses.
Arm over shoulder. One arm around the other’s shoulder. The crossing arms tend to break it.

What decides success: stacking vs. interleaving

Testing it, the line between success and failure came out cleanly. If the limbs separate and stack, it works; if they cross and interleave, it breaks.

Interaction	Structure	Result
Hug / holding hands	Front-facing, symmetric; limbs don’t cross	○ Stable
Piggyback / princess carry / back hug	One carries the other = separated stack	○ With a lucky seed
Arm over shoulder	Arm around the other = arms cross	△–×
Lap-sitting	Seated, legs interleave	× Legs fuse

keikana interaction failures (lap-sit = leg fusion, arm-over-shoulder = arm breakage)

This is the same difficulty every multi-character setup runs into — resolving the limb overlap between two bodies — showing up along the same line for keikana too. The harder the contact (the more the limbs cross), the more it breaks.

Two caveats

Cheek-to-cheek = face fusion. Press two faces together (cheek to cheek) and they tend to fuse at the contact point. But this is mostly low-resolution blur — at higher resolution the two faces separate and read distinctly again.
Breakage ≠ the LoRA’s limit. The lap-sit failure, for example, is partly the model reading the prompt “kanachan sitting on keichan’s lap” as “both are sitting” and fusing two seated bodies. That prompt-interpretation failure is a different thing from the capability bleed in piggyback (hair stretching — the back overlaps the other character’s long-hair zone and gets pulled long), and the former can be rescued with wording.

Best epoch: ep100 (don’t over-bake)

After sweeping, the sweet spot is ep100–120.

ep150 is over-baked. Put the two side by side and the faces drift off the reference, with afterimages and noise creeping in (overfitting).
Face fidelity to the reference peaks at ep100–104; past that it drifts away from the reference and shifts toward the base model’s style.
ep120 is the practical ceiling; the core is around ep100.

“repeats × ep = exposure,” so ep150’s overfitting is simply too much exposure. Adding epochs plateaus on depth — that’s a capacity (rank) and interference problem (more below).

Practical rules

Always Turbo. Full steps over-render and make things worse, and they’re brutally slow.
Both positive and negative + quality tags. masterpiece, best quality, very aesthetic … plus explicit colors (crisp pure white shirt) plus a strong negative (bad hands, fused limbs, grey shirt …) visibly improves shirt whiteness and hand breakage. Clearly better than a bare prompt.
Complex interactions are a seed lottery. Hit and it passes, miss and it breaks. Plan on rolling a few and picking the keeper.

Limits and next steps

The face never becomes the reference itself. What the LoRA bakes in are identity markers (hair color, ribbon, side ponytail, ahoge); the face itself leans toward the base model’s (Anima’s) style. This locks in early at ep30 and stays that way — a base attractor that shows up even in a single LoRA (rank32, zero interference). In keikana, two identities fight over rank64, so the interference makes it stronger than in a single LoRA.
The fix is raising rank. rank96/128 should recover the interference loss and bring it back to single-LoRA level (though the base-derived drift away from the reference itself won’t disappear with rank). This isn’t something exposure (ep/repeats) fixes.
Accessory color bleed. kanachan’s scrunchie looking yellow in a face close-up was a full-step over-render artifact; on Turbo it comes out blue. Shared-attribute bleed can be overridden with a detailed prompt.

Qwen-DiT is strong at multi-subject, but capacity is the bottleneck

Anima = Qwen-Image DiT + Qwen3 text encoder. Its grasp of language structure is strong, so not just “A on the left, B on the right” but action/position cues like hugging and piggyback get through. That’s a big part of why interaction is prompt-drivable. Anima may well be better at multi-character + interaction than SDXL (CLIP) based models.
On the other hand, the bottleneck on depth and face fidelity is rank64’s capacity and the interference between two identities. Increasing exposure (repeats×ep) keeps the plateau shallow; to go deeper you have to raise rank and resolve the interference.