Un-0 generates images with Kuramoto oscillators, not weighted sums (FID 6.74)

A conventional image-generation model computes by passing inputs through many layers of weight matrices (or, for diffusion models, by stripping noise away over many steps). Un-0 replaces that computation: it simulates the physics of coupled oscillators falling into synchrony and makes images from the result. Released by Unconventional AI on June 25, 2026, it is, to the company’s knowledge, the first large-scale image generator built with coupled oscillators (the Kuramoto model) at the core of its computation.

It scores FID 6.74 on ImageNet 64×64. That’s roughly where today’s leading methods started out — short of the current SOTA, but, the authors say, the most capable generative model based on a physical-system simulation so far. The catch: right now it runs as a simulation on B200 GPUs, and the physical chip that would prove its headline “1000x less energy than GPUs” does not exist yet.

The code is released under MIT (GitHub unconv-ai/Un-0); the weights are on Hugging Face under CC-BY-NC-4.0 (non-commercial). CIFAR-10 and ImageNet-64 are trained as two separate pipelines.

First, get a feel for the Kuramoto model

The Kuramoto model describes how a large group of “oscillators” fall into step with each other (synchronize). Here, think of an oscillator as a point going around a circle, each turning at its own pace (its natural frequency). Couple these points loosely and they keep spinning independently; strengthen the coupling and they pull on one another until their speeds match and the group’s phases bunch together. That is synchronization. They don’t all have to land on the exact same angle — they can lock while keeping fixed phase offsets.

[Interactive figure: Kuramoto oscillators on a circle. Each dot is an oscillator; the red arrow shows how aligned the whole group is (the order parameter r). A slider sets the coupling strength K.]

As you raise the coupling strength K, the scattered oscillators start to synchronize all at once past a certain point. The longer the red arrow (order parameter r), the more aligned the group. r near 0 means scattered; r near 1 means full synchronization.

What you’ll notice is that raising K does almost nothing at first, and only past a threshold (the critical coupling) does synchronization start to stand out. This can be described as a kind of phase transition — but not the abrupt change of water freezing at 0°C; rather the type where synchronization grows continuously once you cross the critical point (for all-to-all coupling with a continuous frequency distribution). Fireflies flashing in unison, metronomes on a shared table drifting into sync, the pacemaker cells of the heart — all are explained by the same framework.

For the curious, here’s the equation, but feel free to skip it. The phase $\theta_i$ of oscillator $i$ evolves as

$$\frac{d\theta_i}{dt} = \omega_i + \frac{K}{N}\sum_{j=1}^{N} \sin(\theta_j - \theta_i)$$

where $\omega_i$ is the natural frequency, $K$ the coupling strength, and $N$ the number of oscillators. The second term on the right is “the force trying to align each phase with its neighbors,” and the larger $K$ is, the more strongly it drives the group toward synchronization. Note that this is the standard all-to-all, uniform-coupling form. Un-0 replaces the uniform $K/N$ with a learned coupling $K_{ij}$ for each oscillator pair (the “coupling matrix K” below), which is a more general form. If the math notation worries you, there’s also an intro-to-math-for-reading-AI series.
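To make the threshold behavior concrete, here is a minimal NumPy sketch of the all-to-all equation above. It is not Un-0 code. The Gaussian natural frequencies, the population size, the step size, and the function names are all assumptions chosen for illustration. The order parameter is computed the standard way, as the length of the average of the unit vectors $e^{i\theta_j}$.

```python
import numpy as np


def order_parameter(theta):
    """Return r = |mean(exp(i*theta))|: 0 means scattered, 1 means fully aligned."""
    return float(np.abs(np.mean(np.exp(1j * theta))))


def simulate_kuramoto(K, n=500, dt=0.05, steps=4000, seed=0):
    """Euler-integrate the all-to-all model; return r averaged over the second half."""
    rng = np.random.default_rng(seed)
    omega = rng.normal(0.0, 1.0, size=n)           # natural frequencies (assumed Gaussian)
    theta = rng.uniform(0.0, 2.0 * np.pi, size=n)  # random initial phases
    r_late = []
    for step in range(steps):
        z = np.mean(np.exp(1j * theta))            # complex order parameter r * exp(i*psi)
        r, psi = np.abs(z), np.angle(z)
        # Identity: (K/N) * sum_j sin(theta_j - theta_i) == K * r * sin(psi - theta_i),
        # so each step costs O(N) instead of O(N^2).
        theta = theta + dt * (omega + K * r * np.sin(psi - theta))
        if step >= steps // 2:
            r_late.append(r)
    return float(np.mean(r_late))


if __name__ == "__main__":
    # Sweep the coupling strength and watch r rise once K passes the threshold.
    for K in (0.5, 1.0, 1.5, 2.0, 3.0, 4.0):
        print(f"K = {K:.1f}  ->  r = {simulate_kuramoto(K):.2f}")
```

For unit-variance Gaussian frequencies, the textbook mean-field result puts the critical coupling at $K_c = 2/(\pi g(0)) \approx 1.6$ in the limit of infinitely many oscillators, where $g$ is the frequency density. A finite population like this one blurs the threshold somewhat, so r stays small but nonzero below it.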

How Un-0 turns this synchrony into images

Un-0 uses this population of oscillators as a computing device for image generation. The difference from a conventional neural net is in the computation itself.

Approach	What it computes	How it proceeds
Conventional neural net	apply a nonlinearity to `Σ(input × weight)`, stack layers	pass through layers (feedforward)
Un-0	keep updating `phase += Σ(coupling K × sin(phase diff))`	evolve over time (a dynamical system)

The weights (the coupling K) don’t disappear. What changes is how the computation proceeds: instead of passing activations through layers, the system uses the phase differences between oscillators as its signal and lets the phases align over time.
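The contrast in the table can be sketched side by side. The snippet below is illustrative only: `dense_layer` and `oscillator_step` are hypothetical names, the tanh nonlinearity and explicit-Euler update are assumptions, and `K[i, j]` is taken here to mean the influence of oscillator j on oscillator i. The indexing convention and integrator Un-0 actually uses may differ.

```python
import numpy as np


def dense_layer(x, W, b):
    """Conventional unit: a nonlinearity applied to a weighted sum of the inputs."""
    return np.tanh(W @ x + b)


def oscillator_step(theta, omega, K, dt):
    """One Euler step of d(theta_i)/dt = omega_i + sum_j K[i, j] * sin(theta_j - theta_i)."""
    diff = theta[None, :] - theta[:, None]  # diff[i, j] = theta_j - theta_i
    return theta + dt * (omega + np.sum(K * np.sin(diff), axis=1))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 4

    # Feedforward: one pass, input in, output out.
    x = rng.normal(size=n)
    W, b = rng.normal(size=(n, n)), np.zeros(n)
    print("feedforward output:", dense_layer(x, W, b))

    # Dynamical system: the same state is updated again and again over time.
    theta = rng.uniform(0.0, 2.0 * np.pi, size=n)
    omega = np.zeros(n)        # identical frequencies, to isolate the coupling term
    K = np.ones((n, n)) / n    # uniform coupling, the K/N special case
    for _ in range(2000):
        theta = oscillator_step(theta, omega, K, dt=0.01)
    # Phase offsets relative to oscillator 0, wrapped into (-pi, pi].
    print("phase offsets:", np.angle(np.exp(1j * (theta - theta[0]))))
```

In both cases a matrix of learned numbers sits in the middle. The difference is that the layer is applied once, while the oscillator update is iterated and the coupling only ever sees phase differences.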

Roughly: take thousands of oscillators that start spinning at random, evolve them over time under the learned coupling, snapshot the state at a given moment, and turn that into an image.

flowchart TD
  A[Random initial phases] --> B[Class-conditioning oscillators<br/>pull toward the target class]
  B --> C[Evolve under learned dynamics<br/>integrate the ODE to time T]
  C --> D[Phase snapshot at time T<br/>= latent representation]
  D --> E[Decode to pixels with a conv net]
  E --> F[Generated image]

There is no step-by-step denoising schedule in this flow. From random initial phases, the system simply evolves under its dynamics, and its state is read out once, at time T.

There are basically only three things to learn.

Learned component	Role
Coupling matrix K	how strongly oscillator i pulls on oscillator j
Natural frequencies ω	the speed each oscillator wants to turn at
Decoder	a conventional conv net that turns phases into pixels

At generation time, all oscillators start from random phases, and a separate set of “conditioning oscillators” pulls the system toward the requested class (dog, car, etc.) through the learned coupling. From there it integrates the learned dynamics to time T, takes the phases at that point as the latent, and the decoder renders pixels. The phase $\theta$ is converted into a 2D vector $(\cos(\theta - \theta_{\mathrm{ref}}), \sin(\theta - \theta_{\mathrm{ref}}))$, computed relative to a reference phase $\theta_{\mathrm{ref}}$, before being handed to the decoder (how $\theta_{\mathrm{ref}}$ is chosen differs between CIFAR and ImageNet).
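The generation steps just described can be sketched end to end, up to the point where the decoder would take over. This is a toy under loudly stated assumptions, not the released implementation. The parameters are random stand-ins rather than trained values, the conditioning oscillators are modeled as phases held fixed at class-specific values, the reference phase is simply oscillator 0, and the integrator is explicit Euler. Every function name here is hypothetical, and the conv decoder is omitted.

```python
import numpy as np

TWO_PI = 2.0 * np.pi


def phase_features(theta, theta_ref):
    """Map each phase to (cos, sin) of its offset from a reference phase."""
    rel = theta - theta_ref
    return np.stack([np.cos(rel), np.sin(rel)], axis=-1)


def make_stand_in_params(n_main, n_cond, n_classes, seed=1234):
    """Random stand-ins for what training would learn (hypothetical shapes and scales)."""
    rng = np.random.default_rng(seed)
    n = n_main + n_cond
    K = rng.normal(0.0, 1.0 / np.sqrt(n), size=(n, n))  # coupling matrix
    omega = rng.normal(0.0, 0.5, size=n)                # natural frequencies
    class_phases = rng.uniform(0.0, TWO_PI, size=(n_classes, n_cond))
    return K, omega, class_phases


def generate_latent(class_id, params, n_main, T=5.0, n_steps=200, seed=0):
    """Random phases -> evolve under the coupling to time T -> (cos, sin) latent."""
    K, omega, class_phases = params
    rng = np.random.default_rng(seed)
    cond = class_phases[class_id]
    theta = np.concatenate([rng.uniform(0.0, TWO_PI, size=n_main), cond])
    dt = T / n_steps
    for _ in range(n_steps):
        diff = theta[None, :] - theta[:, None]  # diff[i, j] = theta_j - theta_i
        theta = theta + dt * (omega + np.sum(K * np.sin(diff), axis=1))
        theta[n_main:] = cond  # assumption: conditioning phases are held fixed
    theta_ref = theta[0]       # hypothetical choice of reference phase
    return phase_features(theta[:n_main], theta_ref)


if __name__ == "__main__":
    params = make_stand_in_params(n_main=32, n_cond=4, n_classes=10)
    latent = generate_latent(class_id=3, params=params, n_main=32)
    print(latent.shape)  # one (cos, sin) pair per main oscillator
```

One property worth noticing in `phase_features`: shifting every phase and the reference by the same constant leaves the features unchanged, so the decoder sees only relative phases and each feature is a unit vector.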

The division of labor is interesting. Through ablations (swapping parts in and out to measure their effect), the authors suggest a split where the oscillators handle generative diversity (coverage) and the decoder handles image quality (fidelity). The decoder alone (no dynamics) doesn’t produce decent images, and learned coupling is clearly stronger than a reservoir with randomly fixed coupling. The fact that performance improves as you add integration steps also supports the claim that the nonlinear dynamics carry out part of the computation.

How it differs from diffusion models

Diffusion models repeatedly remove a little noise from a noisy image over many steps, with a network guiding “which way to move next” at each step. Un-0 has neither this step-by-step denoising schedule nor that guidance. From random initial phases, it simply evolves the oscillator system for a fixed time T and takes a single snapshot of the state at that time. The coupled-oscillator system can be read as a structure heading toward per-class attractors (the points it settles toward).

Training uses a “drifting loss” (Deng et al., 2026), with a pretrained DINOv2 as the feature extractor to compare the generated and real distributions. The authors note that the biggest training bottleneck is computing this drifting loss (which needs a conventional image feature extractor).

Performance and limits

Here are the ImageNet 64×64 results.

Model	Oscillators	Params	FID@50k
Un-0.n6656	6,656	57.17M	8.41
Un-0.n10240	10,240	129.80M	8.01
Un-0.n16384	16,384	322.44M	6.74

The largest model has 16,384 oscillators and 322M parameters. The decoder is consistently about 11% of all parameters, with the rest on the oscillator side. On CIFAR-10, a model with 4,096 oscillators and 19.4M parameters reaches an FID of about 8.8. Training runs on B200 GPUs; the largest ImageNet-64 model took 640 B200-hours.
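As a rough reading of the table, the short script below splits each model’s parameter count using the “about 11%” decoder share, so the results are approximate. It also prints the number of oscillator pairs, N², purely for scale. The article does not say whether the coupling matrix is dense, so that column is a hypothetical comparison and not a statement about the architecture.

```python
# (oscillators, total parameters) taken from the ImageNet 64x64 table above.
MODELS = {
    "Un-0.n6656": (6_656, 57.17e6),
    "Un-0.n10240": (10_240, 129.80e6),
    "Un-0.n16384": (16_384, 322.44e6),
}
DECODER_SHARE = 0.11  # "about 11%" per the text, so every result is approximate


def split_params(total, decoder_share=DECODER_SHARE):
    """Return (decoder parameters, oscillator-side parameters)."""
    decoder = total * decoder_share
    return decoder, total - decoder


for name, (n_osc, total) in MODELS.items():
    decoder, oscillator_side = split_params(total)
    pairs = n_osc * n_osc  # size of a hypothetical dense N x N coupling matrix
    print(
        f"{name}: decoder ~{decoder / 1e6:.1f}M, "
        f"oscillator side ~{oscillator_side / 1e6:.1f}M, "
        f"N^2 = {pairs / 1e6:.1f}M"
    )
```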

FID 6.74 is “roughly where today’s leading generative methods started,” short of SOTA like EDM or GDD. You’ll also see it called “Stable Diffusion-class,” but that framing is press-driven, and an FID on ImageNet 64×64 isn’t directly comparable to Stable Diffusion’s practical image quality. Read the number soberly.

What’s worth noting is that the “physical computation” banner still leans on conventional neural nets.

The loss computation in training needs a pretrained DINOv2
Rendering to pixels needs a conventional conv decoder
And above all, the whole thing is a simulation on B200 GPUs; the physical chip hasn’t been built

The “1000x less energy than GPUs” is a theoretical estimate: implement the coupled oscillators in analog CMOS (ring oscillators) and let physics do the computing, and the overhead of digital multiply-accumulate goes away. Even The Next Web’s article notes that the gap between the software simulation and a working chip is vast, and that whether the 1000x can be delivered is a question only hardware can answer. The company says it will release chip schematics soon, but gives no timeline for commercial hardware.

Who’s building it

Unconventional AI’s CEO is Naveen Rao. A serial founder and former head of AI at Databricks, he previously built Nervana Systems (acquired by Intel for ~$400M in 2016) and MosaicML (acquired by Databricks for ~$1.3B in 2023). Unconventional AI raised a $475M seed in December 2025 at a $4.5B valuation, led by Lightspeed and a16z, with Sequoia, Lux, and DCVC, plus Jeff Bezos. It has fewer than 50 employees.

For a company whose mission is to build a computer that computes with the laws of physics, Un-0 is positioned as its first demo.