Luma AI's Uni-1 unifies understanding and generation in a single Transformer
Image-generation AI has long been dominated by diffusion models such as Stable Diffusion, DALL-E, and Imagen. Understanding is usually handled by an LLM and generation by a diffusion model. OpenAI’s DALL-E 3 rewrites prompts with GPT-4 before passing them to a diffusion model, and Google’s Imagen 3 uses Gemini for reasoning before it enters the generation pipeline.
On March 5, 2026, Luma AI announced Uni-1, and that split disappears. A single decoder-only autoregressive transformer handles both image understanding and image generation. There is no diffusion model.
Architecture: text and images in the same sequence
Uni-1’s idea is straightforward: tokenize both text and image patches into a shared vocabulary and process them as one continuous interleaved sequence.
```mermaid
graph LR
A[Text<br/>tokens] --> C[Tokenization in a<br/>shared vocabulary]
B[Image patches] --> C
C --> D[Decoder-only<br/>Transformer]
D --> E[Text output]
D --> F[Image output]
```
Unlike a typical image generator that “pulls an image out of noise,” Uni-1 generates tokens one by one, just as an LLM generates text. Both text and images can be used for input and output.
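The shared-vocabulary, token-by-token idea can be sketched in a few lines. Everything below is an illustrative assumption — the vocabulary sizes, the id-offset scheme, and the stub "model" are not Uni-1's actual implementation, just a minimal picture of decoding text and image tokens through one loop:

```python
# Toy sketch of a shared text/image vocabulary and autoregressive
# decoding. Sizes and the offset scheme are illustrative assumptions.

TEXT_VOCAB = 50_000          # hypothetical number of text token ids
IMAGE_VOCAB = 8_192          # hypothetical number of image-patch codes

def image_token(patch_code: int) -> int:
    """Map an image-patch code into the shared id space, after text ids."""
    return TEXT_VOCAB + patch_code

def is_image_token(token_id: int) -> bool:
    return token_id >= TEXT_VOCAB

def generate(prompt_ids, next_token, max_new=4):
    """Autoregressive loop: extend the sequence one token at a time,
    exactly like LLM text decoding, regardless of modality."""
    seq = list(prompt_ids)
    for _ in range(max_new):
        seq.append(next_token(seq))
    return seq

# Stub "model": emits image-patch tokens counting up from code 0.
stub = lambda seq: image_token(sum(is_image_token(t) for t in seq))

out = generate([101, 7, 42], stub)
print(out)  # [101, 7, 42, 50000, 50001, 50002, 50003]
```

The point of the offset scheme is that one softmax over one vocabulary covers both modalities, so text and image tokens can be freely interleaved in a single sequence.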
That means structured reasoning can happen not just before generation, but during generation itself. DALL-E 3 does its reasoning in the prompt-rewrite step, while Uni-1 bakes reasoning into the generation process.
Training data
Uni-1 was trained on audio, video, image, language, and spatial-reasoning data at the same time. The model reportedly benefits in both directions: image generation improves visual understanding, and visual understanding improves image generation. Fine-grained understanding of regions, objects, spatial relations, and layout gets better.
Four reasoning capabilities
What separates Uni-1 from older image models is its reasoning ability. The model is described in four categories:
| Reasoning type | Meaning | Example |
|---|---|---|
| Temporal | Keep consistency over time | Generate an aging sequence from one portrait |
| Causal | Understand cause and effect | Car crash -> explosion dynamics |
| Spatial | Fill in a scene in a commonsense way | Complete the rest of a room from a partial view |
| Logical | Break down complex multi-part instructions | "A person in a blue outfit and red hat standing in front of a green car" |
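The "logical" row is the easiest to picture in code. As a hedged illustration — this is a hand-rolled regex, not anything Uni-1 actually runs — a compound prompt can be decomposed into attribute-object constraints before generation, which is the kind of step the model is said to perform internally:

```python
# Minimal sketch of decomposing a compound prompt into
# (attribute, object) constraints. The parsing rule is illustrative only.
import re

def decompose(prompt: str) -> list[tuple[str, str]]:
    """Extract simple 'color noun' pairs as generation constraints."""
    colors = r"(blue|red|green|yellow|black|white)"
    return re.findall(rf"{colors}\s+(\w+)", prompt.lower())

constraints = decompose(
    "A person in a blue outfit and red hat standing in front of a green car"
)
print(constraints)  # [('blue', 'outfit'), ('red', 'hat'), ('green', 'car')]
```

A model that tracks each constraint separately is less likely to drop "red hat" when it gets to rendering the car — which is exactly the failure mode the comparison section below attributes to older generators.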
Benchmark: tops RISEBench
RISEBench (Reasoning-Informed Visual Editing) is a benchmark for reasoning-aware image editing and generation. It evaluates the four categories above.
| Model | RISEBench result |
|---|---|
| Uni-1 | 0.51 (top score) |
| Nano Banana 2 (Google) | Close second |
| GPT Image 1.5 (OpenAI) | Close third |
It also shows nearly Gemini 3 Pro-level object-detection performance on ODinW-13.
Comparison with other models
Uni-1 is strong at following complex instructions. When a prompt includes multiple parts, such as “a person in a blue outfit and red hat standing in front of a green car,” Midjourney and DALL-E 3 tend to drop a detail or two. Uni-1 can decompose the instruction first and then generate the image. On the other hand, Midjourney still wins on pure aesthetic quality.
Imagen 3 depends on external reasoning from Gemini, and DALL-E 3 requires a separate GPT-4 prompt-rewriting step. Uni-1 uses the same weights for both understanding and generation, so there is no information loss between pipeline stages.
Weakness
Autoregressive generation can be slower than diffusion when the output resolution gets very high. Because tokens are generated one at a time, the model gets less efficient as the number of pixels rises. The architecture is improving, though, and the gap is shrinking.
How to use it
There are two ways to use Uni-1: the Luma Agents web UI and the developer-facing REST API.
Luma Agents (web UI)
Luma Agents is the creative AI platform built on Uni-1. You can generate images from a browser prompt, and the system is designed for iterative, multi-turn creative work, not just one-shot image generation.
The flow is simple:
- Sign in at app.lumalabs.ai
- Enter a text prompt to generate an image
- Refine the result with follow-up instructions
- Optionally upload reference images or style images
Luma Agents is not just an image UI. Behind the scenes, it can route requests to more than eight external models, including Ray3.14, Google Veo 3, OpenAI Sora 2, and ElevenLabs. When video or audio is needed, it picks a suitable model automatically. You can also adjust `reasoning_effort` to trade off speed and quality.
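The routing behavior can be pictured as a simple dispatch table. To be clear about what is assumed: the model names come from the article, but the routing table, the `route` function, and how `reasoning_effort` is attached are purely hypothetical — Luma has not published how Agents routes internally:

```python
# Hypothetical sketch of modality-based routing in a Luma Agents-style
# system. The routing table and parameter handling are assumptions.

ROUTES = {
    "image": "uni-1",
    "video": "ray3.14",      # could equally be veo-3 or sora-2
    "audio": "elevenlabs",
}

def route(modality: str, reasoning_effort: str = "medium") -> dict:
    """Pick a backend model and attach the speed/quality knob."""
    if modality not in ROUTES:
        raise ValueError(f"unsupported modality: {modality}")
    return {"model": ROUTES[modality], "reasoning_effort": reasoning_effort}

print(route("image", reasoning_effort="high"))
# {'model': 'uni-1', 'reasoning_effort': 'high'}
```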
Dream Machine API (REST API)
Developers can use the Dream Machine API to access Uni-1 programmatically. Authentication uses Bearer tokens.
```bash
# Get an API key from the Luma dashboard
curl --request POST \
  --url https://api.lumalabs.ai/dream-machine/v1/generations/image \
  --header 'authorization: Bearer luma-xxxx' \
  --header 'content-type: application/json' \
  --data '{
    "prompt": "A cat wearing a spacesuit on Mars",
    "aspect_ratio": "16:9"
  }'
```
The request returns an ID, which you then poll until generation completes. Python and JavaScript SDKs are also available.
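The create-then-poll flow looks roughly like the sketch below. The endpoint path follows the curl example above, but the response fields (`id`, `state`) and the status route are assumptions — check the Dream Machine API docs for the real schema. HTTP is injected as plain functions so the loop itself runs without a network:

```python
# Sketch of the create-then-poll flow. Response fields are assumed.
import time

def generate_image(post, get, prompt, poll_interval=0.0, max_polls=10):
    """POST a generation request, then poll its id until it completes."""
    job = post("/dream-machine/v1/generations/image",
               {"prompt": prompt, "aspect_ratio": "16:9"})
    for _ in range(max_polls):
        status = get(f"/dream-machine/v1/generations/{job['id']}")
        if status["state"] == "completed":
            return status
        time.sleep(poll_interval)
    raise TimeoutError("generation did not finish in time")

# Stub transport: completes on the second poll.
calls = {"n": 0}
post = lambda path, body: {"id": "gen-123"}
def get(path):
    calls["n"] += 1
    return {"id": "gen-123",
            "state": "completed" if calls["n"] >= 2 else "queued"}

result = generate_image(post, get, "A cat wearing a spacesuit on Mars")
print(result["state"])  # completed
```

In production you would swap the stubs for real HTTP calls with the Bearer header, and use a non-zero `poll_interval` to avoid hammering the API.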
Supported features include text-to-image, image editing, style references, and character references. Aspect ratios include 1:1, 3:4, 4:3, 9:16, 16:9, 9:21, and 21:9.
The API requires a Plus plan or higher.
Pricing
Luma AI’s subscription plans are split into four tiers.
| Plan | Monthly | Yearly | Agents allowance |
|---|---|---|---|
| Plus | $30 | $300/year | Base quota |
| Pro | $90 | $900/year | 4x Plus |
| Ultra | $300 | $3,000/year | 15x Plus |
| Enterprise | Contact sales | Contact sales | Custom |
There is a free trial credit, so you can test it before committing.
On the pricing page, Luma AI also publishes per-image credit costs for other models such as Seedream, Nano Banana, and GPT Image 1.5. Those range from 1 to 60 credits per image depending on resolution, but the credit cost for Uni-1 itself is still not public as of March 2026. Monthly-plan credits do not roll over, and extra credits can be bought for $4 per 1,200 credits and remain valid for 12 months.
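The top-up price implies a simple per-credit rate. Note the 1-60 credit range applies to the other models on the pricing page, not to Uni-1, whose credit cost was unpublished at the time of writing:

```python
# Arithmetic on the published top-up price: $4 per 1,200 credits.
PACK_PRICE_USD = 4.0
PACK_CREDITS = 1_200

per_credit = PACK_PRICE_USD / PACK_CREDITS
print(f"${per_credit:.4f} per credit")                      # $0.0033 per credit
print(f"${1 * per_credit:.4f}-${60 * per_credit:.2f} per image")
```

So a 60-credit image (the top of the published range) works out to about $0.20 in top-up credits.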
Enterprise customers include Publicis Groupe, Serviceplan, Adidas, Mazda, and Humain. Luma highlights a case where a large brand recreated a 20,000.
From diffusion models to Transformers
Uni-1 is not the only one moving this way. Google (Nano Banana Pro) and OpenAI (GPT Image 1.5) are also shifting away from diffusion models and toward Transformer-based designs.
```mermaid
graph TD
A[Traditional pipeline] --> B[Reason with an LLM]
B --> C[Generate with a diffusion model]
D[Uni-1 approach] --> E[One Transformer integrates<br/>reasoning and generation]
style A fill:#f9f,stroke:#333
style D fill:#9ff,stroke:#333
```
Luma AI’s roadmap says Uni-1 will be extended to video generation, voice agents, and interactive world simulators.