PLaMo vs LLM-jp vs Sakana Fugu: scratch, fully open, or orchestration

PLaMo 3.0 Prime (PFN) and Sakana Fugu (Sakana AI) both shipped on June 22.
As headlines they line up as “two new Japanese AI releases.” Open them up, though, and they are building completely different things.
PFN trains the model weights from scratch. Sakana trains no frontier model at all; it built only a small conductor that calls external GPT and Claude models at runtime.

Add LLM-jp from NII (the National Institute of Informatics) and a third approach shows up.
It trains from scratch like PFN, but then publishes everything: the finished weights, the corpus, the training recipe.
Same label, “domestic LLM,” yet one sells a model, one gives it away, and one does not build a model at all. That is how far apart they sit.

The last time I sorted out Japan’s LLMs, the result was a catalog organized by training method.
This time it is not a list of models but a look at the three organizations’ strategies and the tech inside them.

Built from scratch and sold (PLaMo / PFN)

PLaMo 3.0 Prime is PFN’s own model, trained from scratch, sold through an API and on-premises.
The weights are not public. You do not download and run it; you contract with a domestic vendor and use it as a service.

The June 22 GA changed a lot from the March beta.

Item	What’s in the GA
Context length	65,536 → 262,144 tokens (256k). Extended with YaRN (scaling that extrapolates RoPE positional encoding) plus continued pretraining. Not just a wider window; it was retrained on long text
Two response modes	Reasoning (accuracy-first) and Non-reasoning (speed-first). Reasoning ran roughly 2x the RL steps of the beta; set `reasoning_effort` to `medium` to think before answering, or `none` to skip reasoning and reply at once
Safety	Trained on safety data provided by NICT. Claims HELM Safety scores at or above those of overseas models, while flagging categories where the non-reasoning mode still over-refuses or answers risky prompts
Delivery	OpenAI-compatible PLaMo API plus on-premises. `plamo-2.2-prime` and earlier end on 2026-09-30; `plamo-3.0-prime-beta` ends 2026-07-31
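
To make the YaRN row a bit more concrete, here is a minimal sketch of the frequency rescaling it refers to. PFN has not published its hyperparameters, so everything here other than the 65,536 → 262,144 lengths is an assumption: `base`, `alpha`, `beta`, and the head dimension are values commonly used with YaRN, and the function names are hypothetical. It only shows the idea (leave fast-rotating dimensions alone, interpolate slow ones), not PLaMo's actual implementation.

```python
import math


def yarn_inv_freq(dim, base=10000.0, orig_ctx=65536, new_ctx=262144,
                  alpha=1.0, beta=32.0):
    """Return YaRN-style rescaled RoPE frequencies for one attention head.

    base, alpha and beta are assumed defaults, not PFN's published values.
    """
    scale = new_ctx / orig_ctx  # 4.0 for 65,536 -> 262,144
    freqs = []
    for i in range(0, dim, 2):
        freq = base ** (-i / dim)
        # How many full rotations this dimension makes over the original window.
        rotations = orig_ctx * freq / (2 * math.pi)
        # ramp = 1: keep the frequency as-is (fast dimensions).
        # ramp = 0: divide it by `scale`, i.e. plain position interpolation.
        ramp = min(1.0, max(0.0, (rotations - alpha) / (beta - alpha)))
        freqs.append((1.0 - ramp) * freq / scale + ramp * freq)
    return freqs


def yarn_attention_factor(scale):
    """Factor applied to the rotary embeddings (it rescales attention logits), per the YaRN paper."""
    return 0.1 * math.log(scale) + 1.0


if __name__ == "__main__":
    freqs = yarn_inv_freq(128)
    print(len(freqs), freqs[0], yarn_attention_factor(4.0))
```

The rescaling alone does not make a model good at long inputs, which is why the continued pretraining on long text mentioned in the table matters.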

Pricing is usage-based: ¥60 input and ¥250 output per million tokens, about the same tier as GPT-4o mini.
A free plan is still in the works; for now you try it with the ¥1,000 of credit (~10M tokens, one month) you get on signup.
Because the API is OpenAI-compatible, existing code works once you swap the endpoint and the model ID.

from openai import OpenAI

client = OpenAI(
    api_key="<PLAMO_API_KEY>",
    base_url="https://api.platform.preferredai.jp/v1",
)

resp = client.chat.completions.create(
    model="plamo-3.0-prime",
    messages=[{"role": "user", "content": "Summarize this internal policy in 3 lines"}],
    reasoning_effort="none",  # use "medium" to reason
)
print(resp.choices[0].message.content)

The fit is narrow: shops that cannot send data outside and want to run long internal documents on-prem.
You choose it less on model accuracy than on where the data sits and what the contract says.

Everything above (scratch training, YaRN, RL) comes from PFN’s own announcements.
The weights and data are closed, so you cannot reproduce and check them from outside. With LLM-jp, covered next, the situation is exactly reversed.

Built from scratch and fully released (LLM-jp / NII)

LLM-jp is a project led by NII’s Research and Development Center for Large Language Models (LLMC). Like PFN, it trains from scratch. What differs is what happens to the weights afterward.
It releases the weights trained on ~12 trillion tokens under Apache 2.0, as-is. Because the training data mixes in no synthetic output from GPT or Claude, you can clearly trace both what it learned on and its license.

“Fully open” is not just the weights.
The corpus composition, the training recipe, and the evaluation results are all out. A third party can reproduce the same thing, or keep training from it.
LLM-jp-4 was released on April 3 in two sizes, 8B and 32B-A3B. The 32B-A3B is an MoE: of the full 32B, only 3.8B actually fire per token. You get 32B-class knowledge at 3B-class compute. That is why it runs on a single GPU at home.

To keep Japanese from getting diluted, training over-samples Japanese 4.5x, lifting it from 3.5% to 15.9% of the corpus.
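
A quick arithmetic check on those numbers, as a sketch. The source does not say whether "4.5x" means the ratio of corpus shares or a per-document repetition factor, and the two are not the same number, so both readings are computed here. The function names are hypothetical.

```python
def share_after_repeat(raw_share, repeat):
    """Share of the corpus a subset holds after repeating it `repeat` times."""
    boosted = raw_share * repeat
    return boosted / (boosted + (1.0 - raw_share))


def repeat_for_target(raw_share, target_share):
    """Repetition factor needed to lift a subset from raw_share to target_share."""
    return target_share * (1.0 - raw_share) / (raw_share * (1.0 - target_share))


if __name__ == "__main__":
    # Reading 1: 4.5x is the ratio of the two shares.
    print(round(15.9 / 3.5, 2))
    # Reading 2: 4.5x is how often each Japanese document is repeated.
    print(round(share_after_repeat(0.035, 4.5), 3))
    # Repetition factor that would land exactly on 15.9%.
    print(round(repeat_for_target(0.035, 0.159), 2))
```

The ratio of shares comes out at about 4.5, so that is the reading that matches the quoted figures. Under the repetition reading, 4.5x would give roughly 14%, and hitting 15.9% would take a factor a little above 5.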
On MT-Bench JA it scores 7.82, above GPT-4o’s 7.29 on some items.
Coming within FY2026: a 32B Dense model and LLM-jp-4 332B-A31B (332B parameters, 31B active), released in stages.

The real payoff of fully open is that the dependency disappears.
It runs locally or on your own server, and further training, or loosening the safety filter, is your call. The logs from running it on my own GPU are in a separate post (→ running LLM-jp-4 on ROCm).
If you would rather use an API, Sakura Internet’s “Sakura AI Engine” carries LLM-jp-3.1, with a free tier up to 3,000 requests a month.

Doesn’t build a model, conducts them (Sakana AI)

Sakana has long stayed out of the race to train a giant model from scratch.
What it keeps doing is combining or reworking existing models. With evolutionary model merging (EvoLLM-JP), it blends the weights of existing models via evolutionary search, with no gradient training, to make a new model. Transformer² reads the task at inference time and dynamically adjusts only the “skill” components it gets by decomposing the weights with SVD (singular value decomposition). March’s Namazu took open models like DeepSeek-V3.1, Llama 3.1 405B, and gpt-oss and post-trained them toward Japanese.

The methods differ each time, but they all touch the weights of existing models.
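
To show what "blending weights via evolutionary search, with no gradient training" means mechanically, here is a toy sketch. It is not Sakana's method: the real work searches over far richer merge configurations with a proper evolutionary algorithm, while this is a simple mutate-and-keep-the-best loop over per-parameter mixing ratios. The "models" are plain lists of floats and the fitness function is a stand-in for a benchmark score; all names are hypothetical.

```python
import random


def merge(weights_a, weights_b, alphas):
    """Interpolate two weight vectors, one mixing ratio per parameter."""
    return [al * a + (1.0 - al) * b
            for a, b, al in zip(weights_a, weights_b, alphas)]


def evolve_merge(weights_a, weights_b, fitness,
                 generations=200, population=16, sigma=0.1, seed=0):
    """Search mixing ratios by mutation and selection only (no gradients)."""
    rng = random.Random(seed)
    best = [0.5] * len(weights_a)  # start from a plain average
    best_score = fitness(merge(weights_a, weights_b, best))
    for _ in range(generations):
        for _ in range(population):
            # Mutate the current best and clamp each ratio to [0, 1].
            cand = [min(1.0, max(0.0, al + rng.gauss(0.0, sigma)))
                    for al in best]
            score = fitness(merge(weights_a, weights_b, cand))
            if score > best_score:  # keep only improvements
                best, best_score = cand, score
    return best, best_score


if __name__ == "__main__":
    model_a = [1.0, 0.0, 2.0]
    model_b = [0.0, 1.0, 0.0]
    target = [0.8, 0.3, 1.0]  # stand-in for "what scores well on the benchmark"

    def score(weights):
        return -sum((w - t) ** 2 for w, t in zip(weights, target))

    alphas, final = evolve_merge(model_a, model_b, score)
    print([round(a, 2) for a in alphas], round(final, 4))
```

The point to take away is that the only signal used is the score of each merged candidate, which is what makes the approach cheap next to training from scratch.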

Fugu doesn’t build a base model

Fugu, which went GA on June 22, sits on that same line and pushes one step further.
It does not build a frontier model here either. What it built is a small “conductor” that routes among external models at runtime, finished with reinforcement learning. That is a different order of effort from training a whole frontier model.

The naming is confusing: Fugu is the service, and also the name of the conductor model itself.
The official page says “Sakana Fugu is itself a language model,” and this Fugu LLM calls the other models in its pool.
What Sakana trained is not a model that generates answers. It is the thing that decides who does what.

It is not routing by fixed if/else; it learned how to orchestrate.
When a request comes in, it first decides whether to answer itself or assemble a team of experts. If it builds a team, it assigns three roles, Thinker (strategy), Worker (the actual work), and Verifier (checking), then pulls the outputs into one answer. It can also call itself recursively as a Worker.

flowchart TD
  Q[Request] --> F{Conductor Fugu}
  F -->|simple| D[Answer directly]
  F -->|complex| T[Thinker: strategy]
  T --> W[Worker: experts split work]
  W --> V[Verifier: check]
  V -->|insufficient| T
  V -->|done| S[Synthesize one answer]
  D --> A[Response]
  S --> A[Response]
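
The control flow in the diagram can be sketched in a few lines. This is a hand-written, fixed loop with stub callables, so it is exactly the part Fugu replaces with a trained conductor; it only shows the shape of the flow. Every name and signature here is hypothetical, and no real model API is called.

```python
from typing import Callable, List, Tuple

Model = Callable[[str], str]


def orchestrate(request: str,
                is_simple: Callable[[str], bool],
                direct: Model,
                thinker: Model,
                workers: List[Model],
                verifier: Callable[[str, List[str]], Tuple[bool, str]],
                synthesize: Callable[[str, List[str]], str],
                max_rounds: int = 3) -> str:
    """Thinker -> Workers -> Verifier loop, with a direct path for easy requests."""
    if is_simple(request):
        return direct(request)
    feedback = ""
    outputs: List[str] = []
    for _ in range(max_rounds):
        plan = thinker(request + feedback)             # strategy
        outputs = [worker(plan) for worker in workers]  # experts split the work
        done, note = verifier(request, outputs)         # check
        if done:
            break
        # Insufficient: feed the verifier's note back into the next plan.
        feedback = "\nVerifier note: " + note
    return synthesize(request, outputs)


if __name__ == "__main__":
    answer = orchestrate(
        "Refactor this module",
        is_simple=lambda r: len(r) < 10,
        direct=lambda r: "direct answer",
        thinker=lambda p: "plan for: " + p,
        workers=[lambda plan: "patch A", lambda plan: "patch B"],
        verifier=lambda req, outs: (True, ""),
        synthesize=lambda req, outs: " + ".join(outs),
    )
    print(answer)
```

Recursion, where the conductor calls itself as a Worker, would amount to passing a closure over `orchestrate` into `workers`.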

This design rests on two ICLR 2026 papers.
TRINITY (arXiv:2512.04695) is a lightweight coordinator with Thinker/Worker/Verifier roles, optimized by evolutionary search.
Conductor (arXiv:2512.04388) trains a 7B conductor with reinforcement learning so it picks up orchestration patterns, recursion included.
Production Fugu’s exact size is not published, but a 7B-class conductor is what sits underneath.

The models it conducts are public frontier models: GPT-5.5, Claude Opus 4.8 (max), Gemini 3.1 Pro (high).
Fable 5 and Mythos, by contrast, are “not publicly accessible,” so they are not in the pool.
There are two products: Fugu (latency/quality balance) and Fugu Ultra (quality-first, with a deeper expert pool). Sakana claims Fugu stands shoulder to shoulder with Fable 5 and Mythos Preview on coding benchmarks like SWE-Bench Pro and LiveCodeBench.

It is MoE raised to the model level

Lined up this way, it looks like MoE (Mixture of Experts).
In MoE, inside one model, a small router called the gate picks experts (FFN, the feed-forward layers) per token and sums their outputs. The LLM-jp-4 32B-A3B from earlier is exactly this MoE, firing 3.8B of its 32B.
Fugu takes that “gate picks experts” and moves it from inside the model to outside. The experts are not slices of FFN but whole models like GPT-5.5 and Claude Opus, and the gate is the trained Fugu conductor. In research terms it is close to Mixture-of-Agents (MoA).
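
For reference, the in-model version of "gate picks experts" looks roughly like this. It is a minimal top-k gating sketch in plain Python, not LLM-jp-4's actual layer: the gate matrix, the toy experts, and `top_k=2` are all assumptions made for illustration.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def moe_layer(token, gate_weights, experts, top_k=2):
    """Route one token vector to its top_k experts and sum their outputs.

    Returns the combined output and the indices of the experts that ran.
    """
    # The gate is a small linear layer: one logit per expert.
    logits = [sum(w * x for w, x in zip(row, token)) for row in gate_weights]
    chosen = sorted(range(len(logits)), key=lambda i: logits[i],
                    reverse=True)[:top_k]
    # Renormalize over the chosen experts only.
    probs = softmax([logits[i] for i in chosen])
    output = [0.0] * len(token)
    for prob, index in zip(probs, chosen):
        expert_out = experts[index](token)  # only chosen experts are computed
        output = [o + prob * v for o, v in zip(output, expert_out)]
    return output, chosen


if __name__ == "__main__":
    # Four toy experts that just scale the input by 1, 2, 3 and 4.
    experts = [lambda t, s=s: [s * v for v in t] for s in (1.0, 2.0, 3.0, 4.0)]
    gate = [[2.0, 0.0], [0.0, 0.0], [1.0, 0.0], [-1.0, 0.0]]
    out, used = moe_layer([1.0, 0.0], gate, experts)
    print(used, [round(v, 3) for v in out])
```

Swap "token vector" for "query", "expert FFN" for "whole external model", and "weighted sum" for "text synthesis", and you have the analogy drawn above.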

That said, it is not quite the same as MoE.

Aspect	MoE (inside a model)	Fugu (across models)
Granularity	per token, every layer	per query or subtask
Combination	weighted sum of activations	Verifier reassembles text outputs
Training	gate and experts trained together	only the conductor is trained; experts fixed

Train only the gate and fix the experts: you could call it MoE at the model level.
The conductor’s quality becomes the whole system’s quality, and you cannot touch the models it calls.

The shape is similar but the goal is the opposite.
MoE’s point is to get the same quality at lighter compute, by firing only the 3.8B it needs instead of every expert. When it hits, it is fast.
Fugu does the reverse: it runs several whole models and adds a verification step. Naturally that is slower and pricier. It is chasing not speed but an answer better than any single model alone.

Line it up against OpenRouter

Fugu often gets lumped in with model routing. Put it next to OpenRouter and you can see what is shared and what is not.
OpenRouter has two routers of different character.
The Auto Router picks the single best model per prompt (via NotDiamond) and hands the whole prompt to it. There is fallback, but that is a switch for availability (5xx, rate limits), not a way to combine models.
The other, the Fusion Router, has several models answer the same prompt in parallel, and a judge reads them all and writes the final answer. It does not merge the outputs token by token; the judge rewrites.

So the idea of “have several models work and pull it into one” already exists, in that Fusion Router.
Fugu’s difference is that it makes even the orchestration part a trained model.
Where Fusion’s “answer in parallel, judge combines” is a fixed procedure, Fugu has the conductor decide, per input, the whole flow of splitting the task, assigning roles, verifying, and recursing when needed.
Roughly: Auto Router just hands off to one model, Fusion has a judge combine parallel answers, and Fugu has a trained conductor build a team to solve it.
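
The first two patterns are simple enough to write down. The sketch below is not OpenRouter's API or implementation; it uses stub callables in place of models, and the function names are hypothetical. It also runs the drafts one after another, where a real fusion setup would run them in parallel.

```python
from typing import Callable, Dict

Model = Callable[[str], str]


def auto_route(prompt: str,
               models: Dict[str, Model],
               pick: Callable[[str], str]) -> str:
    """Auto-router pattern: choose one model and hand it the whole prompt."""
    return models[pick(prompt)](prompt)


def fuse(prompt: str,
         models: Dict[str, Model],
         judge: Callable[[str, Dict[str, str]], str]) -> str:
    """Fusion pattern: every model drafts an answer, a judge writes the final one."""
    drafts = {name: model(prompt) for name, model in models.items()}
    return judge(prompt, drafts)


if __name__ == "__main__":
    pool = {
        "fast": lambda p: "short answer",
        "deep": lambda p: "long, careful answer",
    }
    print(auto_route("2+2?", pool,
                     pick=lambda p: "fast" if len(p) < 20 else "deep"))
    print(fuse("Explain YaRN", pool,
               judge=lambda p, d: max(d.values(), key=len)))
```

Both procedures are fixed: the flow never changes with the input, only which model gets picked or what the judge writes. The third pattern is where the flow itself becomes a learned decision.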

So what’s the actual upside?

Honestly, reading this, I kept thinking “wouldn’t OpenRouter do?”
If you just want to use several AIs through one API, OpenRouter’s Auto Router or Fusion Router is enough, cheaper, and you can see inside it.
If Fugu beats that, there is one reason: a trained conductor that splits, verifies, and reassembles a task can produce a better answer than any single model in the pool. If that is actually true.
Sakana claims it matches Fable 5 and Mythos, models that are not even in its pool, on coding benchmarks. Reaching a class above the cards in your hand, through orchestration, is the pitch.
Flip it around: if you do not need that lift, a single top model or OpenRouter is enough. It only pays off on hard, multi-step tasks where you can also swallow the high price and the low reproducibility.

Can you build it yourself, and the cost

The framework alone you can build with open models.
Set up LLM-jp as a domestic-only Worker, line up Gemma and Qwen next to it, write the routing and the judge, and you can reach roughly Fusion Router level by hand.
Where it gets hard is past that. Fugu’s value is less the wiring than the part where the conductor is trained (TRINITY/Conductor). The gap between a hand-written, rule-based orchestration and a learned one shows up from there on.

Watch the order of magnitude on price, too.
Because the insides are overseas frontier models, it is usage-based at $5 input and $30 output per million tokens, an order of magnitude above PLaMo’s ¥60/¥250. Subscriptions are $20, $100, and $200 a month.
On top of that, which model handled which part and how it got blended changes per input and is not disclosed, so you cannot count on the same answer the way you can with a fixed-model API. Data handling, too, ultimately follows whatever overseas model gets called.
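
To put the price gap in numbers, here is a small calculator built from the per-million-token prices quoted in this article. The exchange rate is not from any source: `jpy_per_usd` is a required argument, and the 150 in the usage example is a made-up placeholder you should replace. The `"fugu"` key is just a label for this table, not an API model ID.

```python
PRICES = {
    # Per 1M tokens, as quoted in this article.
    "plamo-3.0-prime": {"currency": "JPY", "in": 60.0, "out": 250.0},
    "fugu": {"currency": "USD", "in": 5.0, "out": 30.0},
}


def cost(model, input_tokens, output_tokens):
    """Usage cost in the model's own billing currency."""
    price = PRICES[model]
    return (input_tokens / 1_000_000 * price["in"]
            + output_tokens / 1_000_000 * price["out"])


def cost_in_jpy(model, input_tokens, output_tokens, jpy_per_usd):
    """Same cost converted to yen; jpy_per_usd is supplied by the caller."""
    amount = cost(model, input_tokens, output_tokens)
    if PRICES[model]["currency"] == "USD":
        amount *= jpy_per_usd
    return amount


if __name__ == "__main__":
    rate = 150.0  # placeholder exchange rate, not a quoted figure
    for name in PRICES:
        print(name, cost_in_jpy(name, 1_000_000, 1_000_000, rate))
```

This counts only the metered tokens you send and receive. It says nothing about the subscription plans, or about how many internal model calls a single Fugu request triggers.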

The real risk is when a dependency closes off

This is the biggest risk, I think. Almost all of Fugu’s performance is borrowed from other companies’ closed models.
The Fugu conductor itself is a small model devoted to orchestration. It can answer easy questions on its own, but it does not have the power to solve frontier-class hard problems alone. Most of the quality comes from the GPT-5.5 / Claude Opus 4.8 side.
So the moment a vendor restricts pool access, fallback still works but quality just drops. The conductor only reroutes to whoever is left; it cannot make up for what is gone.

This is not hypothetical.
Fable 5 and Mythos are already out of the pool for the stated reason that they are “not publicly accessible.” The decision not to lend a top model to a competitor’s orchestration is already happening.
A single access restriction or terms change is enough for Fugu’s strength to shift with another company’s circumstances. And the entry price is an order of magnitude higher than PLaMo’s.

The three side by side

	PLaMo 3.0 Prime	LLM-jp-4	Sakana Fugu
Maker	PFN	NII / LLM-jp	Sakana AI
What they built	weights from scratch	weights from scratch	a conductor over external models
Weights public	no	fully, Apache 2.0	conductor in-house; insides are overseas models
Access	API, on-prem	weight download + API (Sakura)	OpenAI-compatible API, subscription
Stays in Japan	yes, via API/on-prem	fully, locally	no (calls overseas models)
Rough price	¥60 in / ¥250 out (per 1M tok)	weights free; API usage-based	$5 in / $30 out (per 1M tok)
Who it fits	long-document work on-prem	modify, research, stay domestic	top performance with no model of your own

You do not choose among them on scores alone.
Cannot send data outside? PLaMo on-prem or LLM-jp locally. Want the weights in hand to fix yourself? LLM-jp. Want only top-tier performance without owning a base model? Fugu. It mostly comes down to those three.

Still, my pick is LLM-jp. It does not build on borrowed models; it stacks everything itself, from the corpus up. That is where I feel the domestic hope is.
Lined up by weight of dependency, the shakiest is Fugu. You pay a lot up front, but the performance comes from other companies’ models, and if those close off you cannot tell what is left. PLaMo is still a single in-house model, so the risk is the simple one: shut down or price hike.
LLM-jp already has its weights in hand, under Apache 2.0. Even if a future version leans closed, what is already out will not disappear. You can fix it yourself, and in that self-built orchestration it can sit right in the middle as a domestic-only Worker.
So what I am watching most this year is how far FY2026’s 332B-A31B can go.