
Elon Musk dropped a bombshell: X's recommendation algorithm has been officially open-sourced, garnering 1,600 stars on GitHub within just six hours. Musk claims that no competitor has ever done this.

CSDN · 2026-01-20 18:53
Musk's "first step" to open source the X platform

"We will open-source the latest content recommendation algorithm of the X platform within 7 days."

A few days ago, this statement from Elon Musk, who helms the X platform, stunned many people. After all, it means the outside world will, for the first time, be able to systematically understand how X decides which organic content and which advertising content to recommend to users.

Once the words were out, many people checked X every day, waiting to see whether Musk would break his promise.

Excitingly, the official Engineering account of the X platform's engineering team gave the answer today with a major announcement: the new X recommendation algorithm is officially open-sourced - and it uses the same Transformer architecture as xAI's Grok model.

Meanwhile, the corresponding GitHub repository was made public at https://github.com/xai-org/x-algorithm; it received 1.6k stars within six hours of going live.

This is not just a symbolic gesture of publishing some code. For the question the outside world has debated for years - "How does X's recommendation system really work?" - there is finally source code to read directly.

From "Making a Statement" to "Delivering Results": Why Does Musk Insist on Open-Sourcing the Algorithm?

To talk about this open-source move, we first need to understand Musk's "obsession."

In the field of social platforms, recommendation algorithms have always been each company's core secret: every post and every advertisement a user sees is computed by the algorithm from dimensions such as user behavior, content tags, and business requirements. Previously, whether at Facebook, Instagram, or other social platforms, the algorithms were firmly locked in a black box; the outside world could only guess at the logic through reverse engineering, and platforms rarely disclosed details voluntarily.

But Musk is different. Both before and after taking over the X platform, he complained more than once about opaque algorithms, and he is determined to build an open "town square."

Setting this open-source goal is less a whim than a crucial step in his transformation of the X platform. On one hand, open-sourcing lets developers and users worldwide audit the algorithm's logic, easing doubts about algorithmic bias and traffic manipulation while also meeting regulatory requirements. On the other hand, it leverages the community to improve the algorithm - after all, the collective wisdom of the world's programmers is far more efficient than an internal team's closed-door development - and in doing so strengthens the moat of the X ecosystem.

Of course, this is just the first step. Musk also said previously that "the code will be updated every four weeks in the future, along with developer notes indicating the changes in the algorithm and logic."

This model of continuous open-sourcing plus transparent updates is all but unprecedented among social platforms. So what exactly was open-sourced this time? Let's take a look.

Unboxing the GitHub Repository: What Does X's Recommendation Algorithm Look Like?

Opening the repository at https://github.com/xai-org/x-algorithm, the first thing we see is that what the X platform has open-sourced is the core recommendation system behind the "For You" feed.

According to the X engineering team, the content of the "For You" feed mainly comes from two sources:

One is in-network content (the Thunder module): posts published by accounts the user follows;

The other is out-of-network content (the Phoenix retrieval module): posts mined from the global content pool that the user may be interested in even though they don't follow the authors. This is also the core source of the system's "unfamiliar content" recommendations.

After the two streams are merged, candidates are analyzed by the Phoenix model (based on the Grok Transformer; the Transformer implementation is ported from xAI's open-sourced Grok-1 and adapted for the recommendation system's specific use). Using the user's interaction history - likes, replies, reposts - the model predicts the probability of each type of interaction with every post, and the final content score is a weighted combination of these probabilities.
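As a concrete illustration of that last step, here is a minimal sketch of combining per-interaction probabilities into one score. The interaction names and weights below are invented for illustration; the repository defines its own.

```python
# Illustrative weights for each predicted interaction type.
# These values are made up; they are NOT X's actual weights.
ENGAGEMENT_WEIGHTS = {
    "like": 1.0,
    "reply": 2.0,
    "repost": 1.5,
    "click": 0.25,
}

def combine_scores(predicted_probs: dict[str, float]) -> float:
    """Weighted sum of the model's predicted interaction probabilities
    for a single post: the post's final content score."""
    return sum(
        ENGAGEMENT_WEIGHTS[action] * p
        for action, p in predicted_probs.items()
        if action in ENGAGEMENT_WEIGHTS
    )

# Example: one post's predicted probabilities -> one scalar score.
score = combine_scores({"like": 0.2, "reply": 0.05, "repost": 0.1, "click": 0.4})
```

A weighted combination like this lets the platform tune which behaviors (e.g. replies over passive clicks) the feed should optimize for, without retraining the model.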

The X engineering team also revealed that the system has removed all hand-engineered features and most heuristic rules; the core computation is handled entirely by this Grok Transformer model. Its core logic is to determine relevance between content and users by analyzing the user's interaction history rather than relying on manually defined relevance features, a design that greatly reduces the complexity of the data-processing pipeline and the serving infrastructure.

The system architecture is as follows: [system architecture diagram]

Mainly Using Rust, Supplemented by Python: Unveiling the "For You" Recommendation System

In terms of technology stack, the repository uses two main languages, Rust and Python, and the project is released under the Apache License 2.0.

The code files in this repository are divided by functional modules, and the core modules have clear divisions of labor:

phoenix/: contains core code such as the Grok model adaptation, the recommendation model (recsys_model.py), and the retrieval model (recsys_retrieval_model.py), along with scripts for running and testing the models;

home-mixer/: written in Rust, this is the recommendation system's orchestration layer, containing core logic such as candidate hydration, query hydration, scorers, and filters;

thunder/: written in Rust, responsible for retrieving and deserializing in-network content (posts from followed accounts) and for Kafka message processing;

candidate-pipeline/: the candidate-content pipeline framework, a key link between content sources and downstream processing.

The recommendation system's execution flow is straightforward: starting from a user's feed request, the algorithm completes content selection and delivery in seven core stages, each focused on precisely matching the user's interests while avoiding duplicate, low-quality, or disliked content.

Step 1: Retrieve Core User Data

The algorithm starts with user data hydration: the system first fetches the user's recent interaction records - likes, replies, reposts, clicks - along with basic metadata such as the follow list and account preference settings. This information is the core basis for subsequent recommendations, effectively building the skeleton of the user profile.
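The user context assembled in this step can be pictured as a simple record. The field names below are illustrative assumptions for this article, not the repository's actual types:

```python
from dataclasses import dataclass, field

@dataclass
class UserContext:
    """Hypothetical user profile gathered in Step 1: recent engagements
    plus basic metadata such as follows, mutes, and blocks."""
    user_id: int
    # (action, post_id) pairs, e.g. ("like", 123)
    recent_engagements: list[tuple[str, int]] = field(default_factory=list)
    following: set[int] = field(default_factory=set)
    muted_keywords: set[str] = field(default_factory=set)
    blocked_authors: set[int] = field(default_factory=set)

ctx = UserContext(user_id=1, following={42, 99})
```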

Step 2: Capture Two Types of Candidate Content

Based on the user data, the algorithm will retrieve candidate content from the two channels mentioned above:

One type is in-network content, handled by the Thunder module: recent posts from accounts the user follows, which form the core of what users usually see.

The other type is out-of-network content, mined by the Phoenix retrieval module from the global content pool using machine learning: posts from accounts the user doesn't follow but may be interested in. This is the core source of the system's "unfamiliar content" recommendations.

Step 3: Complete the Full Information of the Content

To make subsequent scoring more accurate, the algorithm hydrates every candidate with complete information - for example, the post's text and images/videos, the author's username and verification status, the duration of video posts, and any subscription permissions attached to the content - ensuring each candidate carries a full set of feature dimensions.

Step 4: Filter Out Invalid Content Before Scoring

Before the core scoring stage, the algorithm pre-filters and directly removes content that fails the requirements, including duplicate posts, expired content, the user's own posts, and content from blocked or muted accounts or containing keywords the user has muted.

In addition, content the user has already viewed, content recently pushed to them, and paid content they lack permission to access is also filtered out at this step, so that invalid candidates don't consume scoring resources downstream.
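A minimal sketch of these pre-scoring filters might look as follows. The Candidate fields and the particular checks are illustrative assumptions, not the repository's actual types:

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    post_id: int
    author_id: int
    text: str
    age_hours: float

def prefilter(candidates, user_id, blocked_authors, muted_keywords,
              seen_ids, max_age_hours=48.0):
    """Remove candidates the scorers should never see (Step 4)."""
    kept, emitted = [], set()
    for c in candidates:
        if c.post_id in emitted or c.post_id in seen_ids:     # duplicate / already viewed
            continue
        if c.author_id == user_id:                            # the user's own posts
            continue
        if c.author_id in blocked_authors:                    # blocked or muted accounts
            continue
        if any(k in c.text.lower() for k in muted_keywords):  # muted keywords
            continue
        if c.age_hours > max_age_hours:                       # expired content
            continue
        emitted.add(c.post_id)
        kept.append(c)
    return kept
```

Running cheap set-membership and string checks before any model inference is what keeps invalid candidates from wasting scoring compute.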

Step 5: Score and Sort from Multiple Dimensions

The filtered content then enters the core scoring stage. The system calls four scorers in sequence to calculate a "fit score":

First, the Phoenix scorer obtains machine-learning predictions from the Grok-based Transformer model;

Then the weighted scorer combines these predictions into a final relevance score;

The author diversity scorer deliberately lowers the scores of repeat authors' content to keep the feed diverse;

And the out-of-network (OON) scorer specifically adjusts the scores of content mined from the global pool, balancing the ratio of in-network to out-of-network content.
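The author-diversity adjustment can be sketched as a progressive discount: each later appearance of an already-seen author gets its score multiplied down. The decay factor here is an illustrative assumption, not the repository's actual value:

```python
def apply_author_diversity(scored, decay=0.5):
    """scored: list of (post_id, author_id, score) tuples, best-first.
    The n-th repeat of an author is multiplied by decay**n, so a feed
    is not dominated by one prolific account."""
    seen_counts = {}
    adjusted = []
    for post_id, author_id, score in scored:
        n = seen_counts.get(author_id, 0)
        adjusted.append((post_id, author_id, score * (decay ** n)))
        seen_counts[author_id] = n + 1
    return adjusted
```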

Step 6: Screening

The algorithm then sorts all content by score from high to low and selects the top K candidates for the final stage.

Step 7: Final Verification and Pushing

Even high-scoring content must pass a final round of post-selection checks: the system performs one last compliance and validity verification, and only content that passes is pushed to the user's feed - the last checkpoint before presentation.
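The seven stages above can be strung together as a toy end-to-end pipeline. Everything here - the data model, the stand-in scoring heuristic, the simplified filters - is an illustrative assumption, not the repository's actual code:

```python
from dataclasses import dataclass

@dataclass
class Post:
    post_id: int
    author_id: int
    score: float = 0.0

def for_you_pipeline(user_id, followed, all_posts, seen, k=2):
    # Step 1: user context (here, just the arguments themselves)
    # Step 2: in-network + out-of-network candidate sourcing
    in_network = [p for p in all_posts if p.author_id in followed]
    out_of_network = [p for p in all_posts if p.author_id not in followed]
    candidates = in_network + out_of_network
    # Step 3: hydration omitted in this toy (posts already carry all fields)
    # Step 4: pre-filtering: drop already-seen posts and the user's own posts
    candidates = [p for p in candidates
                  if p.post_id not in seen and p.author_id != user_id]
    # Step 5: scoring (a toy heuristic standing in for the four scorers)
    for p in candidates:
        p.score = 1.0 if p.author_id in followed else 0.5
    # Step 6: top-K selection by descending score
    top = sorted(candidates, key=lambda p: p.score, reverse=True)[:k]
    # Step 7: final validity check (toy rule: score must be positive)
    return [p for p in top if p.score > 0]
```

The point of the sketch is the shape of the flow, not the scoring logic: each stage takes the previous stage's output, which is what makes the stages independently replaceable.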

In its underlying design, the X platform's recommendation system reflects five core decisions that set it apart from traditional recommendation algorithms:

The system completely abandons hand-engineered features, relying on the Grok-based Transformer model to learn content-user relevance autonomously from the user's interaction-behavior sequence rather than on manually defined relevance features; this greatly reduces the complexity of the data pipeline and serving infrastructure;

In the ranking stage, candidates are scored in isolation: during inference, candidates do not affect one another, and each is scored only against the user's context, so a single post's score is independent of the rest of the batch - making results more stable and cacheable;

Both the retrieval and ranking stages implement embedding-vector lookups with multiple hash functions, improving runtime efficiency;

Unlike traditional models that predict a single "relevance" score, this model simultaneously predicts the probabilities of multiple user actions toward the content, giving a more comprehensive scoring signal;

In addition, the system builds a composable pipeline architecture on the candidate-pipeline framework: it separates pipeline execution and monitoring from business logic, supports parallel execution of independent steps and graceful error handling, and makes it easy to add new content sources, hydration rules, filters, and scorers, keeping the algorithm flexible and scalable.
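The multi-hash embedding lookup mentioned above can be sketched as follows: instead of dedicating one table row per ID, each ID is hashed by several salted hash functions into a small shared table, and the selected rows are summed. The table sizes and hashing scheme here are assumptions for illustration, not the repository's actual implementation:

```python
import numpy as np

NUM_HASHES = 4    # number of independent hash functions
TABLE_SIZE = 1024 # rows per table: far fewer than the number of IDs
DIM = 16          # embedding dimension

rng = np.random.default_rng(0)
# One small table per hash function (in training these are learned).
tables = rng.normal(size=(NUM_HASHES, TABLE_SIZE, DIM))

def embed(item_id: int) -> np.ndarray:
    """Look up item_id via NUM_HASHES salted hashes and sum the rows.
    Collisions in one table are disambiguated by the other tables."""
    vec = np.zeros(DIM)
    for h in range(NUM_HASHES):
        idx = hash((h, item_id)) % TABLE_SIZE  # h salts the hash function
        vec += tables[h, idx]
    return vec
```

The design choice: a single hash table would force colliding IDs to share one vector, while multiple salted hashes make it unlikely that two IDs collide in every table, at a fraction of the memory of a full per-ID embedding matrix.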

Musk: "No Other Social Media Company Has Done This"

When open-sourcing this time, Musk said bluntly: "We know this algorithm is clumsy and needs significant improvement, but at least you can see the process of our efforts to improve it in real-time and transparently."

He also emphasized that "No other social media company has done this."

Musk's move of open-sourcing the X platform's recommendation algorithm has sparked a lot of discussions.

For ordinary developers, the value of this open-sourcing goes far beyond reading the code. The benefits are obvious, as user AbundanceVsWar commented:

“It's important because when the system that allocates attention is opaque, abundance is impossible.

When people don't understand how influence is distributed, attention seems zero-sum, manipulated, and full of politics. Just this perception itself can lead to conflicts. Open-sourcing the recommendation algorithm turns attention from a mysterious resource into an understandable system. And understandability changes people's behavior.

Indeed, at first, transparency may make the "game" easier to exploit. But this is not a flaw; it's just a stage. A closed system freezes power, while an open system exposes vulnerabilities, adapts to changes, and continuously improves. Over time, the balance of the system will shift from anger and tribalism to optimization and contribution.

This is how to reduce artificial scarcity. The method is not to de-moralize content but to make the rules visible so that value can expand instead of making attention a contested object.”