x-algorithm May 2026: Phoenix pipeline runnable locally with 3GB artifacts
xAI’s x-algorithm, the public repository for the recommendation system that drives X’s For You feed, was updated on May 15, 2026.
It is only the second commit since the initial release on January 20, and the README itself labels it “May 15th, 2026”.
Separate from Grok as a chat model or an image-generation API, X’s feed uses Grok-family Transformers as its recommendation models.
This blog has previously covered differences in LLM safety filters, Grok included, and Grok Imagine’s API pricing; this time the subject is the code that decides what users see.
The January release was closer to a structural overview
The initial x-algorithm release was enough to understand the overall For You feed architecture.
The composition was visible: in-network Thunder, out-of-network Phoenix Retrieval, Home Mixer, and the Candidate Pipeline.
However, there was too much missing to actually run anything locally.
Phoenix was described as a two-stage retrieval-and-ranking model, but without model artifacts there was no way to feed in an input and get candidates out end-to-end.
Looking at the README diff, this update goes beyond added explanations: grox/, home-mixer/ads/, numerous hydrators and sources, phoenix/run_pipeline.py, and Git LFS-tracked Phoenix artifacts were all added.
```mermaid
graph TD
A[For You request] --> B[Home Mixer]
B --> C[Query hydrators]
C --> D[Thunder<br/>Posts from followed accounts]
C --> E[Phoenix Retrieval<br/>Global candidate search]
D --> F[Candidate hydrators]
E --> F
F --> G[Filters]
G --> H[Phoenix ranker]
H --> I[Weighted scorer]
I --> J[Ads blender]
J --> K[Visibility filters]
K --> L[Feed response]
```
Most of this diagram was conceptually readable from the January version.
What changed in May is that Phoenix’s minimal execution path and the surrounding Home Mixer processing were fleshed out substantially as actual code.
Phoenix runs retrieval through ranking on a sports corpus
phoenix/README.md describes the nature of the public Phoenix fairly clearly.
It’s not the production model itself but a mini version trained on the same real-time engagement data.
Production uses a larger model with continuous training; the public version is treated as a frozen checkpoint from a point in time.
The artifacts include sports_corpus.npz.
This is a demo corpus of approximately 537,000 post IDs collected over 6 hours, filtered to the Sports topic.
example_sequence.json contains example user behavior history for NFL, NBA, and NHL posts.
The execution path works as follows.
User history is encoded, top candidates are retrieved via dot product against precomputed candidate representations, and a ranking model outputs probabilities for favorite, reply, repost, dwell, video view, and other actions.
By default, retrieval pulls the top 200 and the display output is top 30.
```bash
cd phoenix
unzip artifacts/oss-phoenix-artifacts.zip -d artifacts/
uv run run_pipeline.py --artifacts_dir artifacts/oss-phoenix-artifacts
```
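As a rough illustration of that two-stage flow, here is a minimal NumPy sketch: random vectors stand in for the real artifacts, and `rank` is a stand-in for the actual ranking model. All names, dimensions, and the 3-action head are invented for this example, not taken from the repository.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for the real artifacts: precomputed candidate
# embeddings and an already-encoded user-history vector.
EMBED_DIM = 64
NUM_CANDIDATES = 1000
candidate_embeddings = rng.standard_normal((NUM_CANDIDATES, EMBED_DIM))
user_vector = rng.standard_normal(EMBED_DIM)

# Stage 1: retrieval. Dot product against every precomputed candidate
# representation, keep the top 200 (the pipeline's default).
scores = candidate_embeddings @ user_vector
top200 = np.argsort(-scores)[:200]

# Stage 2: ranking. A stand-in ranker that outputs per-action
# probabilities (favorite, reply, repost, ...) per candidate.
def rank(candidate_ids: np.ndarray) -> np.ndarray:
    logits = rng.standard_normal((len(candidate_ids), 3))  # 3 toy actions
    return 1.0 / (1.0 + np.exp(-logits))                   # sigmoid per action

probs = rank(top200)

# Collapse the action probabilities into one score and keep the
# top 30 for display, matching the demo's default output size.
final_score = probs.mean(axis=1)
top30 = top200[np.argsort(-final_score)[:30]]
```

The point of the two stages is that the cheap dot product prunes the corpus before the expensive ranker ever runs; only 200 of the candidates reach stage 2.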
What became runnable is not “full X feed reproduction” but “a scaled-down demo of Phoenix retrieval and ranking.”
The public artifacts are listed as approximately 3GB in the README, mostly embedding tables.
The user, post, and author vocabularies hold 1 million entries each, and hash embedding maps the raw ID space into those tables.
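Hash embedding here means the hashing trick: IDs are hashed into a fixed number of buckets that index an embedding table, so no explicit ID vocabulary is stored. A toy sketch under that assumption follows; the bucket count is shrunk from the repo's 1M for the demo, and the function names and dimension are invented.

```python
import hashlib

import numpy as np

# Demo sizes: the public artifacts use 1M-entry vocabularies for
# users, posts, and authors; a small table keeps this sketch cheap.
NUM_BUCKETS = 4096
EMBED_DIM = 32

rng = np.random.default_rng(0)
embedding_table = rng.standard_normal((NUM_BUCKETS, EMBED_DIM))

def hash_bucket(raw_id: str, num_buckets: int = NUM_BUCKETS) -> int:
    """Deterministically map an arbitrary ID into a fixed bucket space."""
    digest = hashlib.md5(raw_id.encode()).digest()
    return int.from_bytes(digest[:8], "little") % num_buckets

def embed(raw_id: str) -> np.ndarray:
    """Look up an embedding without storing any ID-to-index vocabulary."""
    return embedding_table[hash_bucket(raw_id)]

# Any ID, including one never seen in training, lands in some bucket;
# distinct IDs can collide, a tradeoff hash embeddings accept.
vec = embed("post:1893742209")
```

The design choice this buys is a bounded memory footprint over an unbounded, constantly-growing ID space, at the cost of occasional collisions between unrelated IDs.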
Ranking doesn’t let candidates see each other
The notable design in Phoenix’s ranking is candidate isolation.
Candidate posts can attend to user information and history, but cannot attend to other candidate posts.
The design ensures that candidate A’s score doesn’t change based on whether candidate B happens to be in the same batch.
This is a fairly practical issue in recommendation systems.
If you let candidates see each other, the ranking model can use the full batch context.
The tradeoff is that the same post gets different scores depending on which candidates co-occur, making caching and debugging harder.
The public README explains this as “score for a post doesn’t depend on which other posts are in the batch.”
In X’s feed, candidate generation, filtering, ad blending, and visibility filters are layered in multiple stages.
Keeping ranking scores tied to individual candidates makes them easier to handle when the candidate set changes in later stages.
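Candidate isolation of this kind is typically implemented as an attention mask. The sketch below builds such a mask for a sequence laid out as [history tokens | candidate tokens]; it illustrates the idea and is not the repository's actual masking code.

```python
import numpy as np

def isolation_mask(num_history: int, num_candidates: int) -> np.ndarray:
    """Boolean mask: True where query position i may attend to key j.

    Layout: [history tokens | candidate tokens]. History attends to
    history; each candidate attends to history and to itself, but
    never to another candidate.
    """
    n = num_history + num_candidates
    mask = np.zeros((n, n), dtype=bool)
    mask[:num_history, :num_history] = True   # history <-> history
    mask[num_history:, :num_history] = True   # candidates -> history
    idx = np.arange(num_history, n)
    mask[idx, idx] = True                     # each candidate -> itself
    return mask

m = isolation_mask(num_history=3, num_candidates=2)
# m[3, 4] is False: the first candidate cannot attend to the second.
```

Because each candidate's visible context (history plus itself) is identical no matter which other candidates share the batch, its score is batch-independent, which is exactly the property the README describes.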
Grox handles understanding tasks before and after the feed
The newly added grox/ is not the Phoenix ranking model itself.
It contains tasks for spam detection, post category classification, PTOS policy enforcement, reply ranking, and multimodal post embedding.
ASR processing, a Kafka loader, a Strato loader, and a task dispatcher are also included; this appears to be the code that understands posts and produces annotations and embeddings.
If you look at the For You feed as “just the ranking model,” this part is easy to miss.
Actual candidates come with text, images, video, replies, quotes, language, brand safety, and policy classification attached.
Before Phoenix outputs probabilities, there’s substantial processing to determine in what state candidates are passed to ranking.
The May version also adds hydrators on the Home Mixer side.
Engagement counts, language code, media detection, quote-post expansion, mutual-follow score, and a brand safety signal were added.
Even though the README states that hand-crafted features were removed, the code that collects post and user context still exists.
Ad blending is now in the public code
The addition of home-mixer/ads/ is another notable change in this release.
It brings partition_organic_blender, which mixes organic and ad candidates; safe_gap_blender, which enforces spacing between ads; ad-injection log side effects; and a brand safety hydrator.
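safe_gap_blender's actual logic isn't quoted here, but the general shape of a gap-enforcing ad blender can be sketched as follows; the function name, parameter name, and rule are invented for illustration.

```python
from typing import List

def blend_with_gap(organic: List[str], ads: List[str],
                   min_gap: int = 4) -> List[str]:
    """Merge ads into an organic feed, requiring at least `min_gap`
    organic posts before each ad (a guessed rule, not the repo's)."""
    feed: List[str] = []
    ad_iter = iter(ads)
    next_ad = next(ad_iter, None)
    since_last_ad = 0
    for post in organic:
        feed.append(post)
        since_last_ad += 1
        if next_ad is not None and since_last_ad >= min_gap:
            feed.append(next_ad)         # slot an ad in once the gap is met
            next_ad = next(ad_iter, None)
            since_last_ad = 0            # restart the organic-post count
    return feed

feed = blend_with_gap([f"post{i}" for i in range(10)], ["ad1", "ad2"])
```

Running this with ten organic posts and two ads yields ads at positions 4 and 9, with four organic posts between them, which is the spacing invariant a blender of this kind exists to guarantee.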
This is a part that creator-oriented “algorithm hacking” articles tend to gloss over.
Even after the recommendation model outputs engagement probabilities for posts, the final feed includes ads, visibility filters, deduplication, read history, muted keywords, blocked users, and served history.
Reading just the model’s preferences doesn’t match the actual order that appears on screen.
In the public code, post-selection visibility filters come last.
Judgments for deleted, spam, violence, gore, and similar categories are applied after ranking too.
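A post-selection filter stage of this kind can be sketched as a chain of predicates applied after ranking; the field names and rules below are invented stand-ins for the categories the public code checks.

```python
from typing import Callable, Dict, List

Post = Dict[str, object]
Filter = Callable[[Post], bool]  # True means the post is kept

# Hypothetical post-selection filters mirroring the categories named
# in the public code (deleted, spam, violence, gore); the field names
# here are invented for the example.
visibility_filters: List[Filter] = [
    lambda p: not p.get("deleted", False),
    lambda p: not p.get("spam", False),
    lambda p: p.get("label") not in {"violence", "gore"},
]

def apply_visibility(posts: List[Post]) -> List[Post]:
    """Drop ranked candidates that fail any post-selection filter."""
    return [p for p in posts if all(f(p) for f in visibility_filters)]

ranked = [
    {"id": 1},
    {"id": 2, "spam": True},
    {"id": 3, "label": "gore"},
]
kept = apply_visibility(ranked)  # only post 1 survives
```

The key point is ordering: these predicates run after the ranker has already scored the batch, so a high engagement probability never rescues a post that fails visibility.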
The pipeline is not simple enough to explain with just “make posts that Grok would like.”
What’s published and what’s still missing
Running the Phoenix scaled-down pipeline locally is the major difference from January, but the public version alone can’t reproduce the production For You feed.
Production Phoenix is larger than the public mini version and continuously trained.
The public corpus covers 6 hours of Sports topic only, not X’s entire post collection.
Weights, experiment configs, feature flags, live policy classification, and ad delivery constraints also remain separate from what production runs.
This public release makes a small cross-section of Phoenix and a fairly specific cross-section around Home Mixer accessible to inspection.