
Qwen Max vs Claude vs Codex writing Anima prompts, tested with a 3-character LoRA: 60 images, no same-family advantage

Ikesan

Anima (Anima-Base) is a Qwen-DiT model and its text encoder is Qwen3. If an LLM rewrites my Japanese scene briefs into English prompts before generation, shouldn’t a Qwen LLM have a vocabulary affinity that Claude or Codex lack? That hypothesis came out of a chat, and since I already have a merged 3-character LoRA (trio_v2), I put it to a controlled test.

The contenders are Qwen3.7 Max, Claude, and Codex. Each got the same Japanese scene brief, wrote an English prompt for Anima, and the outputs were generated with identical settings and per-scene fixed seeds. Instead of judging by feel, I scored 60 images across 10 scenes × 3 LLMs × 2 brief styles.

Test environment

| Item | Value |
|---|---|
| Generation machine | M1 Max 64GB, local ComfyUI |
| Base model | Anima-Base v1.0 |
| Character LoRA | anima-trio-v2 ep143 (3 characters in one LoRA, weight 1.0) |
| Common generation settings | Turbo LoRA (8 steps) / sampler er_sde / scheduler simple / cfg 1.0 / 1024×1024 |
| Seeds | Fixed per scene (same seed across all LLMs and brief variants) |
| Conversion LLMs | Qwen3.7 Max (OpenAI-compatible API via ModelScope) / Claude (claude -p) / Codex (codex exec) |
| Scenes | 10 (2 solo, 4 two-character, 4 three-character) |
| Briefs | 2 variants (freeform / strict formatter, described below) |

The generation side reuses the proven settings from the trio LoRA article, unchanged. Only the conversion LLM and the brief vary.
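To make the grid concrete, here is a minimal Python sketch of how the 60 runs enumerate. It is not the author's script: the seed values are hypothetical (the article only says one seed is fixed per scene), the LLM names are plain labels, and the settings dict just restates the table using ComfyUI-style key names.

```python
from itertools import product

# Shared settings from the table. steps / cfg / sampler_name / scheduler mirror
# ComfyUI KSampler input names; width / height belong to the empty-latent node.
SETTINGS = {
    "steps": 8,                # Turbo LoRA, 8 steps
    "sampler_name": "er_sde",
    "scheduler": "simple",
    "cfg": 1.0,
    "width": 1024,
    "height": 1024,
}

SCENES = [f"s{i:02d}" for i in range(1, 11)]  # s01 .. s10
LLMS = ["qwen3.7-max", "claude", "codex"]     # labels only
BRIEFS = ["freeform", "strict"]

# Hypothetical seed values: the article only says there is one fixed seed per scene.
SCENE_SEEDS = {scene: 1000 + i for i, scene in enumerate(SCENES)}


def build_runs() -> list[dict]:
    """Enumerate every (scene, LLM, brief) combination with its scene's seed."""
    return [
        {
            "scene": scene,
            "llm": llm,
            "brief": brief,
            "seed": SCENE_SEEDS[scene],  # identical across LLMs and briefs
            **SETTINGS,
        }
        for scene, llm, brief in product(SCENES, LLMS, BRIEFS)
    ]


runs = build_runs()
print(len(runs))  # 60 = 10 scenes x 3 LLMs x 2 briefs
```

Fixing the seed per scene, rather than per run, is what lets any difference within a scene be attributed to the prompt alone.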

How the experiment is wired

The flow is simple. The same Japanese instruction plus the same brief goes to each of the three LLMs, which must return only {"positive": "...", "negative": "..."} as JSON. That goes to ComfyUI, and the resulting image is checked element by element against the scene instruction.

```mermaid
flowchart TD
    A[10 Japanese scene briefs] --> B[Embedded into a shared brief]
    B --> C1[Qwen3.7 Max<br/>OpenAI-compatible API]
    B --> C2[Claude<br/>claude -p]
    B --> C3[Codex<br/>codex exec]
    C1 --> D[JSON with<br/>positive / negative]
    C2 --> D
    C3 --> D
    D --> E[ComfyUI generation<br/>shared settings, per-scene seed]
    E --> F[Scored element by element<br/>against training-set references]
```
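One way to enforce that reply contract is a strict parser like the one below. It is a sketch under assumptions, not the harness used for this test: rejecting anything after the first JSON object is my own choice, motivated by the double-JSON reply Claude produced once, and ContractError is a hypothetical name.

```python
import json


class ContractError(ValueError):
    """Raised when an LLM reply does not match the JSON contract."""


def parse_reply(reply: str) -> dict:
    """Accept exactly one {"positive": str, "negative": str} object, nothing else."""
    text = reply.strip()
    try:
        # raw_decode parses the first JSON value and reports where it ended.
        obj, end = json.JSONDecoder().raw_decode(text)
    except json.JSONDecodeError as exc:
        raise ContractError(f"not JSON: {exc}") from exc
    # Anything after the first object (for example a second JSON object) is a failure.
    if text[end:].strip():
        raise ContractError("extra content after the first JSON object")
    if not isinstance(obj, dict) or set(obj) != {"positive", "negative"}:
        raise ContractError("expected exactly the keys 'positive' and 'negative'")
    if not all(isinstance(obj[key], str) for key in ("positive", "negative")):
        raise ContractError("'positive' and 'negative' must be strings")
    return obj
```

A failed parse can then be logged as a conversion failure instead of silently sending half a reply to ComfyUI.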

A few fairness details mattered.

  • Claude runs via claude -p in a neutral directory outside the project, so it cannot see my working notes or the trio LoRA know-how in CLAUDE.md. Otherwise Claude alone would be cheating.
  • Codex gets the prompt over stdin via codex exec
  • Each LLM also writes its own negative prompt, so negative design counts as part of the comparison
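The CLI side of those fairness rules can be sketched with Python's subprocess module. Assumptions: build_command and run_conversion are hypothetical helpers, an empty temporary directory stands in for the neutral directory, and codex exec is assumed to read its prompt from stdin when it gets no prompt argument, as described above. Version-specific flags for either CLI are left out.

```python
from __future__ import annotations

import subprocess
import tempfile


def build_command(llm: str, prompt: str) -> tuple[list[str], str | None]:
    """Return (argv, stdin_text) for one conversion call.

    Claude gets the prompt as the argument of `claude -p`; Codex gets it on
    stdin via `codex exec`, matching the setup described above.
    """
    if llm == "claude":
        return ["claude", "-p", prompt], None
    if llm == "codex":
        return ["codex", "exec"], prompt
    raise ValueError(f"unknown CLI-based LLM: {llm!r}")


def run_conversion(llm: str, prompt: str, timeout: float = 600.0) -> str:
    """Run one CLI call from a fresh, empty working directory and return stdout."""
    argv, stdin_text = build_command(llm, prompt)
    # Empty temp dir: no project-level CLAUDE.md or working notes are visible.
    with tempfile.TemporaryDirectory() as neutral_dir:
        result = subprocess.run(
            argv,
            input=stdin_text,
            # With no stdin payload, give the child an empty stdin instead of ours.
            stdin=subprocess.DEVNULL if stdin_text is None else None,
            capture_output=True,
            text=True,
            cwd=neutral_dir,
            timeout=timeout,
            check=True,
        )
    return result.stdout
```

Only build_command is pure, which keeps the argv construction testable without either CLI installed.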

The first brief (freeform) only teaches the model family and the canonical look of the three characters. Structure and phrasing are left to each LLM. The scenes were designed to cover the scoring axes I care about with multi-character LoRAs.

| ID | Instruction | Difficulty axis |
|---|---|---|
| s01 | Kana solo, classroom, chin on hand, gazing out the window, sailor uniform | Gaze |
| s02 | Kei left, Koharu right, holding hands, casual dresses | Left/right + contact |
| s03 | Three on a bench, only center Kana holds two soft-serve cones | Exclusive prop |
| s04 | Kana × Kei hug with height gap, Kana on tiptoes looking up | Contact + height |
| s05 | Koharu sweeping, Kana sneaking up from behind | Roles + front/behind |
| s06 | Summer festival in yukata, only Koharu holds a goldfish bag | Exclusive prop |
| s07 | Kei solo, rain, blue umbrella, looking back | Prop |
| s08 | Kana right, Koharu left, back to back with crossed arms | Left/right + contact |
| s09 | All three jumping in uniforms, all smiling | Same action ×3 |
| s10 | Kana and Koharu play a game, only Kei watches empty-handed from behind | Roles + exclusivity |
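The scene table maps onto a small data structure, which also makes the 2 / 4 / 4 split from the environment table checkable. A sketch only: the cast of s06 is not named in its instruction and is inferred here from the generated images, which show all three characters.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Scene:
    sid: str
    characters: frozenset[str]
    axis: str  # difficulty axis from the table


ALL3 = frozenset({"Kana", "Kei", "Koharu"})

SCENES = [
    Scene("s01", frozenset({"Kana"}), "Gaze"),
    Scene("s02", frozenset({"Kei", "Koharu"}), "Left/right + contact"),
    Scene("s03", ALL3, "Exclusive prop"),
    Scene("s04", frozenset({"Kana", "Kei"}), "Contact + height"),
    Scene("s05", frozenset({"Koharu", "Kana"}), "Roles + front/behind"),
    Scene("s06", ALL3, "Exclusive prop"),  # cast inferred from the images
    Scene("s07", frozenset({"Kei"}), "Prop"),
    Scene("s08", frozenset({"Kana", "Koharu"}), "Left/right + contact"),
    Scene("s09", ALL3, "Same action ×3"),
    Scene("s10", ALL3, "Roles + exclusivity"),
]

# Number of scenes per cast size: solo, two-character, three-character.
breakdown = Counter(len(scene.characters) for scene in SCENES)
print(sorted(breakdown.items()))  # [(1, 2), (2, 4), (3, 4)]
```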

Scoring compares each image against the real training-set references. A prompt that reads correctly but doesn’t reach the image counts as a loss.
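Written down as code, that rule is a per-image checklist. This is a hypothetical rubric rather than the author's scoring sheet: the article does not say how partial hits were tallied, so the sketch only encodes the stated principle that what counts is what is visible in the image, never what the prompt says.

```python
from dataclasses import dataclass, field


@dataclass
class Judgement:
    """One image, checked by eye: element of the scene -> visibly present?"""
    scene: str
    llm: str
    brief: str
    observed: dict[str, bool] = field(default_factory=dict)


def element_score(j: Judgement) -> float:
    """Fraction of the scene's elements that actually reached the image."""
    if not j.observed:
        return 0.0
    return sum(j.observed.values()) / len(j.observed)


def scene_passed(j: Judgement) -> bool:
    """A scene counts only if every element is visible; prompt wording is never scored."""
    return bool(j.observed) and all(j.observed.values())


# s01 by Qwen (freeform), as described in the results: chin rest and classroom
# present, but she faces the viewer instead of gazing out the window.
j = Judgement("s01", "qwen3.7-max", "freeform",
              {"chin on hand": True, "classroom": True, "gazing out the window": False})
print(round(element_score(j), 3), scene_passed(j))  # 0.667 False
```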

Easy scenes produce no separation

Solo scenes and simple left/right placement were cleared by all three systems. s07 (Kei solo, rain, blue umbrella, looking back) comes out nearly identical. Left to right: Qwen, Claude, Codex.

s07 by Qwen. Kei in the rain with a blue umbrella, looking back. Blonde, blue ribbon, uniform

s07 by Claude. Nearly the same composition

s07 by Codex. Umbrella, rain and the look-back all present

s08 (Kana right, Koharu left, crossed arms, annoyed, white background) also passed everywhere; only the literal back-to-back pose degrades into standing side by side for every system.

s08 by Qwen. Koharu left, Kana right, crossed arms, annoyed. Placement as instructed

s08 by Claude. Same placement and expressions

s08 by Codex. Same placement. All three weaken back-to-back into side-by-side

s02 passed too, the only differences being how each LLM fills unspecified space. Qwen invents a street background, Claude keeps it white, Codex colors the dresses.

s02 by Qwen. Kei left, Koharu right holding hands, with an uninstructed street background

s02 by Claude. Same layout on a plain background

s02 by Codex. Same layout, dresses in light blue and pink

The first real gap appeared in s01. Only Claude got "gazing out the window" onto the canvas, because it translated the situation into composition tags, adding profile and looking away. Qwen and Codex wrote looking out the window literally and got a girl facing the viewer. The gaze reached the image only once the situation had been translated down into composition tags that the 0.6B text encoder could act on.

s01 by Claude. Profile view, actually gazing out the window

s01 by Qwen. Chin rest and classroom correct, but she faces the viewer
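The translation step Claude performed can be illustrated with a tiny lookup. This is only an illustration: the LLM did it freely rather than from a table, the single entry is the s01 translation reported above, and COMPOSITION_TAGS and add_composition_tags are hypothetical names.

```python
# Situation phrase -> composition tags the text encoder can act on.
# The one entry is the translation Claude produced for s01; extend as needed.
COMPOSITION_TAGS = {
    "gazing out the window": ["profile", "looking away"],
}


def add_composition_tags(tags: list[str], situations: list[str]) -> list[str]:
    """Return a copy of tags with each situation's composition tags appended once."""
    out = list(tags)
    for situation in situations:
        for tag in COMPOSITION_TAGS.get(situation, []):
            if tag not in out:
                out.append(tag)
    return out


print(add_composition_tags(["kanachan", "classroom", "chin on hand"],
                           ["gazing out the window"]))
# ['kanachan', 'classroom', 'chin on hand', 'profile', 'looking away']
```

Unknown situations pass through unchanged, which is the literal behavior Qwen and Codex showed in s01.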

Hard scenes expose each LLM’s habits

Two-character contact with a height gap starts separating the field. In s04 (Kana × Kei hug), the Qwen version lost Kei entirely — both girls came out brown-haired, effectively two Kanas hugging.

s04 by Qwen. Both girls brown-haired and Kei gone; the tall one wears the side ponytail, so the height relation flipped too

s04 by Claude. Blonde Kei bends down to hug Kana, who looks up. As instructed

The prompts explain it. Qwen wrote a flat 2girls, kanachan, keichan, hugging, ... with dense pose clauses and no per-character appearance description at all. Claude wrote semicolon-separated per-character description blocks (kanachan is a petite girl with brown hair, side ponytail, ...; keichan is a tall girl with long blonde hair, ...), the exact structure the trio article settled on for 2–3 characters, reinvented by Claude without being told.

s04 by Codex. Kei survives, hug and height gap present, the upward gaze is weak
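The per-character block structure is easy to assemble programmatically. A minimal sketch: the function name is hypothetical, and the leading count tag and the scene tags (height difference, looking up) are assumptions for illustration, not quotes from either LLM's prompt.

```python
def build_multi_character_prompt(count_tag: str,
                                 characters: list[tuple[str, str]],
                                 scene_tags: list[str]) -> str:
    """Count tag, then one '<trigger> is <description>' block per character,
    then the shared scene tags, with blocks separated by '; '."""
    blocks = [f"{trigger} is {description}" for trigger, description in characters]
    return f"{count_tag}, " + "; ".join(blocks + [", ".join(scene_tags)])


print(build_multi_character_prompt(
    "2girls",
    [("kanachan", "a petite girl with brown hair, side ponytail"),
     ("keichan", "a tall girl with long blonde hair")],
    ["hugging", "height difference", "looking up"],
))
# 2girls, kanachan is a petite girl with brown hair, side ponytail; keichan is a tall girl with long blonde hair; hugging, height difference, looking up
```

Keeping each character's appearance inside its own block is what the flat Qwen prompt lacked; the shared action goes last so it is not read as one character's attribute.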

s05 (Koharu sweeping, Kana sneaking up from behind) split on role assignment. Only Claude kept the cleaning tool exclusive to Koharu — though what she holds is a sudsy deck brush rather than the instructed broom. Qwen had Koharu with a deck brush and Kana with a bamboo broom, and Codex swapped the roles so Kana holds the broom. Nobody got the sneak — all six variants across the experiment place them side by side.

s05 by Claude. Only Koharu holds a cleaning tool (a deck brush rather than a broom) while Kana raises both hands, but they stand side by side instead of sneaking

s05 by Qwen. Both girls hold cleaning tools, a deck brush and a bamboo broom; the roles dissolved

s05 by Codex. Kana holds the broom and Koharu raises a hand; roles swapped

With three characters, identity loss begins. In s09 (all three jumping, same action), the Qwen version was the only one with a blonde girl left — an imperfect Kei with shortened hair, a red-drifted ribbon and Kana’s ahoge on her head — while Claude’s and Codex’s versions duplicated black-haired Koharu instead. The three prompts are nearly identical trigger-only lines, so this is closer to a coin flip than a prompt-quality verdict.

s09 by Qwen. A blonde girl remains in the jump, but with shortened hair, a red ribbon and Kana's ahoge, so not a fully on-model Kei

s09 by Codex. Two short-black-haired girls flank Kana; Kei is gone

In s03 (only center Kana holds two cones), Claude’s version was the one that broke — Kana turned black-haired and Kei picked up a stray ahoge, while Qwen and Codex kept all three intact. The same LLM wins one scene and collapses in another under the same brief.

s03 by Claude, freeform. Center Kana has turned black-haired and Kei's color drifts orange with a migrated ahoge

s10 (Kana and Koharu playing, only Kei watching empty-handed from behind) is the hardest role-assignment scene, and in freeform only Codex held the structure — the assignment is right, though its Kei is cropped at the head, so even this one isn’t a clean image. Claude’s version duplicated Koharu into a fourth girl and handed three controllers out.

s10 by Claude, freeform. Koharu is duplicated into four girls total, three of them holding controllers

s10 by Codex, freeform. Kana and Koharu play, Kei watches empty-handed from behind. Roles perfect, though Kei's head is cropped

A strict formatter brief with count locks

Round two reruns all 10 scenes with a brief that demotes the LLM from author to formatter. It lists hard prohibitions and forces a character-count lock plus negative entries for absent characters.

```
You are not a prompt author but a validating formatter.
Creation, improvement, summarizing, paraphrasing and completion are forbidden.

(model info identical to the freeform brief)

Prohibitions. One violation makes the output a failure:
- Do not add characters not present in the scene instruction
- Do not change the specified number of people
- Do not alter or omit specified left/right or front/back placement
- Do not alter or omit specified clothing, props or background
- Do not summarize, shorten or omit elements written in the scene
- Do not add or complete elements to make it "more natural" or "better"
- Do not write triggers of absent characters into the positive

The positive must include:
- A count lock (e.g. exactly two girls, only kanachan and keichan, no other people)
- Every element written in the scene instruction

The negative must include:
- Bans on wrong counts and extra people
- Bans on text, speech bubbles, sound effects
- Trigger words of characters absent from the scene
```
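The two mandatory pieces of that brief, the count lock and the absent-character negatives, can be generated deterministically. A sketch under assumptions: koharuchan is a placeholder for Koharu's trigger, which the article does not give, and the negative tag wording is hypothetical.

```python
# Trigger words: kanachan and keichan appear in the article; "koharuchan" is a
# hypothetical placeholder, since Koharu's real trigger is not given here.
TRIGGERS = {"Kana": "kanachan", "Kei": "keichan", "Koharu": "koharuchan"}
NUMBER_WORDS = {1: "one", 2: "two", 3: "three"}

# Hypothetical tags for the brief's mandatory bans (counts, extras, text).
BASE_NEGATIVE = ["wrong number of people", "extra people",
                 "text", "speech bubble", "sound effects"]


def join_names(names: list[str]) -> str:
    """'a' / 'a and b' / 'a, b and c'."""
    if len(names) == 1:
        return names[0]
    return ", ".join(names[:-1]) + " and " + names[-1]


def count_lock(present: list[str]) -> str:
    """Count lock in the form of the brief's example."""
    n = len(present)
    noun = "girl" if n == 1 else "girls"
    triggers = join_names([TRIGGERS[name] for name in present])
    return f"exactly {NUMBER_WORDS[n]} {noun}, only {triggers}, no other people"


def strict_negative(present: list[str]) -> str:
    """Mandatory bans plus the trigger of every character absent from the scene."""
    absent = [trigger for name, trigger in TRIGGERS.items() if name not in present]
    return ", ".join(BASE_NEGATIVE + absent)


print(count_lock(["Kana", "Kei"]))
# exactly two girls, only kanachan and keichan, no other people
print(strict_negative(["Kana", "Kei"]))
# wrong number of people, extra people, text, speech bubble, sound effects, koharuchan
```

Generating these parts in code rather than asking the LLM for them would remove one source of variance, at the cost of no longer testing whether each LLM follows the brief.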

The strict brief clearly helped identity retention: Claude's collapsed s03 recovered on the same seed.

s03 by Claude, freeform (repeated). Kana black-haired

s03 by Claude, strict brief. Same seed, Kana back as a brown side-ponytail girl, all three intact, troubled face and both cones present

s10 improved across the board — every system now reproduces the structure of Kana and Koharu playing in front with Kei behind the sofa, and Claude’s four-girl output shrinks back to three.

s10 by Claude, strict brief. Kana and Koharu with controllers in front, Kei behind, all three on-model

s10 by Codex, strict brief. Same role structure, and Kei's head stays in frame this time

Two side effects showed up.

First, creativity disappears. The s01 window gaze that Claude had won in freeform was lost by all three systems under the strict brief — the prohibitions suppress clever composition-tag translation like profile, leaving a literal enumeration.

Second, literalism backfires. In s05 the Qwen and Codex versions associated "cleaning" with maid outfits and dressed the girls in them, although the scene specified no clothing at all.

s05 by Qwen, strict brief. Both girls now wear maid outfits with headdresses, still two brooms

And s09 stays unstable under the strict brief too. Claude's version brings back a blonde, blue-eyed, blue-ribboned girl, but she wears Kana's ahoge over shortened hair, closer to a blonde Kana than a full Kei. Qwen's version loses the blonde it had kept in freeform, and Codex stays broken. None of the six s09 variants produced a fully on-model Kei; whether three trigger-only characters stay distinct while performing the same action flips with tiny word-order changes.

s09 by Claude, strict brief. A blonde, blue-eyed girl returns, but with Kana's ahoge and shortened hair, so not a fully on-model Kei

s09 by Qwen, strict brief. The blonde kept in freeform now vanishes into a second Kana

Two failure modes survive any prompt

Across all 60 images, two kinds of breakage never got fixed by any LLM or either brief, so I attribute them to the Anima + trio LoRA side.

The first is exclusive possession. “Only one girl holds it” or “only one girl doesn’t” lost every single time — s03’s cones, s06’s goldfish bag and s10’s controllers all get distributed to everyone. In s06 the bag even migrates to the wrong girl.

s06 by Claude, strict brief. The goldfish bag meant only for Koharu is held by all three

s06 by Qwen, strict brief. The bag migrates to Kana, Kei holds a scoop, and Kei's eyes drift green

Even in the much-improved strict s10, the final step (Kei staying empty-handed) broke in all three systems, putting a controller in the spectator's hands. The freeform s06 runs fail the same way: Claude and Codex hand a bag to all three girls, and Qwen inverts the ownership. This matches the trio article's cleaning-scene note that per-character action assignment is still weak with three characters at once.

s06 by Qwen, freeform. Kei and Kana hold bags while Koharu holds a scoop; ownership inverted

s06 by Claude, freeform. All three hold bags

s06 by Codex, freeform. All three hold bags

The second is front/behind blocking. The s05 sneak-up-from-behind came out side-by-side in all six variants, and s08’s back-to-back weakens the same way. This lines up with the trio article’s boundary — stacked, separable contact works; spatial arrangements beyond it don’t. No amount of prompt rewording moved these two failure modes, which leaves ControlNet and inpainting as the remaining route.

No Qwen-to-Qwen affinity showed up

Across 20 scene conversions, not once did a Qwen3.7 Max prompt succeed where Claude’s and Codex’s failed in a way traceable to shared-family vocabulary. Every gap traced back to prompt structure — per-character description blocks, tag-level translation of composition, count locks — all of which any LLM can write.

That the receiving text encoder is a 0.6B Qwen3 model is, I think, the whole story. However Qwen-flavored the phrasing, a 0.6B encoder has no capacity to reward it; what reaches the image is only whether the prompt was short, concrete and structurally clean.
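Because every gap traced back to structure, the structure itself can be checked mechanically, whichever LLM wrote the prompt. A minimal sketch with a hypothetical structure_report helper; it checks per-character blocks and the count lock only, leaving tag-level composition translation to human judgment.

```python
import re


def structure_report(positive: str, triggers: list[str]) -> dict[str, bool]:
    """Report two structural features that the gaps in this test traced back to."""
    text = positive.lower()
    return {
        # Every trigger opens its own description block, e.g. "keichan is a tall girl ..."
        "per_character_blocks": all(
            re.search(rf"\b{re.escape(t.lower())} is\b", text) for t in triggers
        ),
        # The strict brief's count lock, e.g. "exactly two girls, ... no other people"
        "count_lock": "exactly" in text and "no other people" in text,
    }


flat = "2girls, kanachan, keichan, hugging"
blocks = ("kanachan is a petite girl with brown hair, side ponytail; "
          "keichan is a tall girl with long blonde hair; hugging")
print(structure_report(flat, ["kanachan", "keichan"]))
# {'per_character_blocks': False, 'count_lock': False}
print(structure_report(blocks, ["kanachan", "keichan"]))
# {'per_character_blocks': True, 'count_lock': False}
```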

| System | Habit |
|---|---|
| Qwen3.7 Max | Writes the leanest tag lines. Clean images, but drops character descriptions and fine points like gaze and roles |
| Claude | Best at rescuing hard instructions (gaze translation, role retention), but also produced the s03 identity collapse and, once, a double-JSON output. Benefits most from the strict brief |
| Codex | Most stable character identity, and the only system to clear s10 in freeform. Never spectacular, never disastrous |