
Luma AI's Uni-1 unifies understanding and generation in a single Transformer

Ikesan

Image-generation AI has long been dominated by diffusion models such as Stable Diffusion, DALL-E, and Imagen. Understanding is usually handled by an LLM and generation by a diffusion model. OpenAI’s DALL-E 3 rewrites prompts with GPT-4 before passing them to a diffusion model, and Google’s Imagen 3 uses Gemini for reasoning before it enters the generation pipeline.

On March 5, 2026, Luma AI announced Uni-1, and that split disappears. A single decoder-only autoregressive transformer handles both image understanding and image generation. There is no diffusion model.

Architecture: text and images in the same sequence

Uni-1’s idea is straightforward: map text tokens and image patches into a shared vocabulary and process them as one continuous interleaved sequence.

graph LR
    A[Text<br/>tokens] --> C[Tokenization in a<br/>shared vocabulary]
    B[Image patches] --> C
    C --> D[Decoder-only<br/>Transformer]
    D --> E[Text output]
    D --> F[Image output]

Unlike a typical image generator that “pulls an image out of noise,” Uni-1 generates tokens one by one, just as an LLM generates text. Both text and images can be used for input and output.
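To make the shared-vocabulary idea concrete, here is a minimal sketch of how text ids and image-patch ids might coexist in one sequence. The vocabulary sizes, the offset scheme, and the helper names are all hypothetical illustrations, not Luma's actual implementation:

```python
# Hypothetical shared vocabulary: text tokens occupy ids [0, 50_000),
# image-patch codebook tokens are offset to sit above them.
TEXT_VOCAB = 50_000
IMAGE_VOCAB = 8_192
IMAGE_OFFSET = TEXT_VOCAB

def image_token(patch_id: int) -> int:
    """Map an image-patch codebook id into the shared vocabulary."""
    assert 0 <= patch_id < IMAGE_VOCAB
    return IMAGE_OFFSET + patch_id

def is_image_token(token_id: int) -> bool:
    """True if a shared-vocabulary id denotes an image patch."""
    return token_id >= IMAGE_OFFSET

# One interleaved sequence: a text prompt followed by generated image patches.
sequence = [101, 2054, 2003] + [image_token(p) for p in (17, 942, 3001)]
print([is_image_token(t) for t in sequence])
# → [False, False, False, True, True, True]
```

Because every position in the sequence lives in the same vocabulary, a single decoder-only transformer can attend across text and image tokens with no modality boundary.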

That means structured reasoning can happen not just before generation, but during generation itself. DALL-E 3 does its reasoning in the prompt-rewrite step, while Uni-1 bakes reasoning into the generation process.

Training data

Uni-1 was trained on audio, video, image, language, and spatial-reasoning data at the same time. The model reportedly benefits in both directions: image generation improves visual understanding, and visual understanding improves image generation. Fine-grained understanding of regions, objects, spatial relations, and layout gets better.

Four reasoning capabilities

What separates Uni-1 from older image models is its reasoning ability. The model is described in four categories:

| Reasoning type | Meaning | Example |
| --- | --- | --- |
| Temporal | Keep consistency over time | Generate an aging sequence from one portrait |
| Causal | Understand cause and effect | Car crash -> explosion dynamics |
| Spatial | Fill in a scene in a commonsense way | Complete the rest of a room from a partial view |
| Logical | Break down complex multi-part instructions | "A person in a blue outfit and red hat standing in front of a green car" |

Benchmark: tops RISEBench

RISEBench (Reasoning-Informed Visual Editing) is a benchmark for reasoning-aware image editing and generation. It evaluates the four categories above.

| Model | Overall score |
| --- | --- |
| Uni-1 | 0.51 |
| Nano Banana 2 (Google) | Close second |
| GPT Image 1.5 (OpenAI) | Close third |

It also shows nearly Gemini 3 Pro-level object-detection performance on ODinW-13.

Comparison with other models

Uni-1 is strong at following complex instructions. When a prompt includes multiple parts, such as “a person in a blue outfit and red hat standing in front of a green car,” Midjourney and DALL-E 3 tend to drop a detail or two. Uni-1 can decompose the instruction first and then generate the image. On the other hand, Midjourney still wins on pure aesthetic quality.

Imagen 3 depends on external reasoning from Gemini, and DALL-E 3 requires a separate GPT-4 prompt-rewriting step. Uni-1 uses the same weights for both understanding and generation, so there is no information loss between pipeline stages.

Weakness

Autoregressive generation can be slower than diffusion when the output resolution gets very high. Because tokens are generated one at a time, the model gets less efficient as the number of pixels rises. The architecture is improving, though, and the gap is shrinking.
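Rough arithmetic shows why token-by-token generation scales poorly with resolution. Assuming a hypothetical 16×16 patch size with one token per patch (Luma has not published Uni-1's actual tokenizer parameters), the token count grows with the square of the side length:

```python
def patch_tokens(width: int, height: int, patch: int = 16) -> int:
    """Image tokens per frame, assuming one token per patch (hypothetical)."""
    return (width // patch) * (height // patch)

# Doubling each side quadruples the tokens the model must emit sequentially.
print(patch_tokens(512, 512))    # → 1024
print(patch_tokens(1024, 1024))  # → 4096
print(patch_tokens(2048, 2048))  # → 16384
```

A diffusion model denoises all pixels in parallel per step, so its cost grows more gently with resolution than a strictly sequential decoder's.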

How to use it

There are two ways to use Uni-1: the Luma Agents web UI and the developer-facing REST API.

Luma Agents (web UI)

Luma Agents is the creative AI platform built on Uni-1. You can generate images from a browser prompt, and the system is designed for iterative, multi-turn creative work, not just one-shot image generation.

The flow is simple:

  1. Sign in at app.lumalabs.ai
  2. Enter a text prompt to generate an image
  3. Refine the result with follow-up instructions
  4. Optionally upload reference images or style images

Luma Agents is not just an image UI. Behind the scenes, it can route requests to more than eight external models, including Ray3.14, Google Veo 3, OpenAI Sora 2, and ElevenLabs. When video or audio is needed, it picks a suitable model automatically. You can also adjust reasoning_effort to trade off speed and quality.

Dream Machine API (REST API)

Developers can use the Dream Machine API to access Uni-1 programmatically. Authentication uses Bearer tokens.

# Get an API key from the Luma dashboard
curl --request POST \
  --url https://api.lumalabs.ai/dream-machine/v1/generations/image \
  --header 'authorization: Bearer luma-xxxx' \
  --header 'content-type: application/json' \
  --data '{
    "prompt": "A cat wearing a spacesuit on Mars",
    "aspect_ratio": "16:9"
  }'

The request returns an ID, which you then poll until generation completes. Python and JavaScript SDKs are also available.
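The poll loop can be sketched in plain Python with only the standard library. The URL layout follows the curl example above, but the response field names (`state`, and the terminal values `completed`/`failed`) are assumptions; check the Dream Machine API docs before relying on them:

```python
import json
import time
import urllib.request

API_BASE = "https://api.lumalabs.ai/dream-machine/v1"

def generation_url(generation_id: str) -> str:
    """Build the polling URL for a generation id (path layout assumed)."""
    return f"{API_BASE}/generations/{generation_id}"

def poll_generation(generation_id: str, api_key: str, interval: float = 3.0) -> dict:
    """Poll the generation until it reaches an assumed terminal state."""
    while True:
        req = urllib.request.Request(
            generation_url(generation_id),
            headers={"authorization": f"Bearer {api_key}"},
        )
        with urllib.request.urlopen(req) as resp:
            payload = json.load(resp)
        if payload.get("state") in ("completed", "failed"):
            return payload
        time.sleep(interval)
```

In production you would add a timeout and backoff; the official Python and JavaScript SDKs wrap this loop for you.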

Supported features include text-to-image, image editing, style references, and character references. Aspect ratios include 1:1, 3:4, 4:3, 9:16, 16:9, 9:21, and 21:9.

The API requires a Plus plan or higher.

Pricing

Luma AI’s subscription plans are split into four tiers.

| Plan | Monthly | Yearly (per month) | Agents allowance |
| --- | --- | --- | --- |
| Plus | $30 | $25 ($300/year) | Base quota |
| Pro | $90 | $75 ($900/year) | 4x Plus |
| Ultra | $300 | $250 ($3,000/year) | 15x Plus |
| Enterprise | Contact sales | Contact sales | Custom |

There is a free trial credit, so you can test it before committing.

On the pricing page, Luma AI also publishes per-image credit costs for other models such as Seedream, Nano Banana, and GPT Image 1.5. Those range from 1 to 60 credits per image depending on resolution, but the credit cost for Uni-1 itself is still not public as of March 2026. Monthly-plan credits do not roll over, and extra credits can be bought for $4 per 1,200 credits and remain valid for 12 months.
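The top-up pricing above reduces to simple pack arithmetic. This small helper assumes credits are only sold in whole $4 packs of 1,200, which matches the published terms but is not an official calculator:

```python
def extra_credit_cost(credits_needed: int,
                      pack_size: int = 1_200,
                      pack_price: float = 4.0) -> float:
    """Dollar cost of topping up, at $4 per 1,200-credit pack (whole packs only)."""
    packs = -(-credits_needed // pack_size)  # ceiling division
    return packs * pack_price

print(extra_credit_cost(1_000))  # → 4.0  (1 pack)
print(extra_credit_cost(3_000))  # → 12.0 (3 packs; 2.5 rounds up)
```

Since Uni-1's own per-image credit cost is not yet public, per-image dollar figures cannot be computed from this; only the pack price is known.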

Enterprise customers include Publicis Groupe, Serviceplan, Adidas, Mazda, and Humain. Luma highlights a case where a large brand recreated a $15 million annual campaign package in 40 hours for under $20,000.

From diffusion models to Transformers

Uni-1 is not the only one moving this way. Google (Nano Banana Pro) and OpenAI (GPT Image 1.5) are also shifting away from diffusion models and toward Transformer-based designs.

graph TD
    A[Traditional pipeline] --> B[Reason with an LLM]
    B --> C[Generate with a diffusion model]
    D[Uni-1 approach] --> E[One Transformer integrates<br/>reasoning and generation]
    style A fill:#f9f,stroke:#333
    style D fill:#9ff,stroke:#333

Luma AI’s roadmap says Uni-1 will be extended to video generation, voice agents, and interactive world simulators.