Luma AI's Uni-1 integrates image understanding and generation within a single decoder-only autoregressive model. Rather than using diffusion, it tokenizes text and image patches into a shared vocabulary and generates both modalities sequentially as one token stream.
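To make the shared-vocabulary idea concrete, here is a minimal sketch (not Uni-1's actual code): text ids and image-patch ids live in one vocabulary, a single decoder-only Transformer predicts the next token regardless of modality, and decoding simply restricts the softmax to image ids inside an image span. The vocabulary sizes, the `BOI` marker token, and all module names are assumptions for illustration.

```python
import torch
import torch.nn as nn

V_TEXT, V_IMAGE = 1000, 4096            # assumed vocabulary sizes
BOI = V_TEXT                            # assumed "begin of image" token id
VOCAB = V_TEXT + V_IMAGE                # one shared vocabulary
D, N_LAYERS, N_HEADS, MAX_LEN = 256, 4, 4, 512

class SharedVocabAR(nn.Module):
    def __init__(self):
        super().__init__()
        self.tok = nn.Embedding(VOCAB, D)    # text and image ids share one table
        self.pos = nn.Embedding(MAX_LEN, D)
        layer = nn.TransformerEncoderLayer(D, N_HEADS, 4 * D, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, N_LAYERS)
        self.head = nn.Linear(D, VOCAB)      # one next-token softmax over both modalities

    def forward(self, ids):
        T = ids.size(1)
        x = self.tok(ids) + self.pos(torch.arange(T, device=ids.device))
        mask = nn.Transformer.generate_square_subsequent_mask(T).to(ids.device)
        return self.head(self.blocks(x, mask=mask))  # causal next-token logits

@torch.no_grad()
def generate(model, prompt_ids, n_image_tokens):
    """Greedy decoding: after emitting BOI, restrict sampling to image ids."""
    ids = prompt_ids.clone()
    in_image = False
    for _ in range(n_image_tokens + 1):
        logits = model(ids)[:, -1]
        if in_image:                          # mask out text ids inside an image span
            logits[:, :V_TEXT] = float("-inf")
        next_id = logits.argmax(-1, keepdim=True)
        in_image = in_image or (next_id.item() == BOI)
        ids = torch.cat([ids, next_id], dim=1)
    return ids

model = SharedVocabAR()
out = generate(model, torch.tensor([[1, 2, 3]]), n_image_tokens=8)
print(out.shape)  # generated image-patch ids would go to a VQ decoder for pixels
```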
AttnRes replaces the Transformer's fixed residual combination with softmax attention along the depth direction. A demonstration on Kimi Linear 48B improved GPQA-Diamond by +7.5 pt and HumanEval by +3.1 pt, while keeping training overhead below 4% and inference overhead below 2%.
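The sketch below is a generic reconstruction of that idea, not AttnRes's actual implementation: instead of the fixed residual sum x + f(x), each token computes softmax weights over the outputs of all earlier layers and mixes them. All names, shapes, and the single-head depth attention are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DepthAttnResidual(nn.Module):
    """Mixes the outputs of all earlier layers with per-token softmax weights,
    replacing the fixed x + f(x) residual combination."""
    def __init__(self, d_model):
        super().__init__()
        self.q = nn.Linear(d_model, d_model)
        self.k = nn.Linear(d_model, d_model)
        self.scale = d_model ** -0.5

    def forward(self, new_out, history):
        # history: list of tensors [B, T, D] (embeddings + earlier layer outputs)
        h = torch.stack(history + [new_out], dim=2)      # [B, T, L+1, D]
        q = self.q(new_out).unsqueeze(2)                 # [B, T, 1, D]
        k = self.k(h)                                    # [B, T, L+1, D]
        w = F.softmax((q * k).sum(-1) * self.scale, -1)  # [B, T, L+1] depth weights
        return (w.unsqueeze(-1) * h).sum(2)              # attention-weighted residual

class Block(nn.Module):
    def __init__(self, d_model, n_heads):
        super().__init__()
        self.ln = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.mix = DepthAttnResidual(d_model)

    def forward(self, x, history):
        h = self.ln(x)
        y = self.attn(h, h, h)[0]        # sublayer output
        return self.mix(y, history)      # depth attention instead of x + y

# Tiny forward pass: each block attends over the whole depth history.
x = torch.randn(2, 8, 64)
blocks = nn.ModuleList(Block(64, 4) for _ in range(3))
history = [x]
for blk in blocks:
    x = blk(x, history)
    history.append(x)
print(x.shape)  # torch.Size([2, 8, 64])
```

The extra cost is one small linear projection and a softmax over L+1 depth slots per layer, which is consistent with the low overhead figures reported above.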
A paper explains that two seemingly mysterious Transformer behaviors, attention concentrating heavily on specific tokens and unusually large activations in specific dimensions, are in fact manifestations of a single underlying mechanism.
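An illustrative toy (not the paper's analysis) shows how the two behaviors connect: a key vector carrying one unusually large activation dominates the softmax scores, so every query dumps most of its attention mass on that token, which then behaves as a sink. The magnitudes and dimensions chosen here are arbitrary assumptions.

```python
import torch

torch.manual_seed(0)
d = 64
keys = torch.randn(8, d)      # 8 ordinary tokens
keys[0, 3] = 100.0            # one massive activation in dimension 3 of token 0
queries = torch.randn(4, d)
queries[:, 3] = 1.0           # assume queries have a positive component there

scores = queries @ keys.T / d ** 0.5
attn = torch.softmax(scores, dim=-1)
print(attn[:, 0])             # token 0 absorbs nearly all attention mass
```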