Just now, Elon Musk open-sourced the Grok-based X recommendation algorithm: Transformer takes over the sorting of hundreds of millions of items.
Nearly three years later, Musk open-sources X's recommendation algorithm again
Just now, the X engineering team posted on X, announcing the official open-sourcing of X's recommendation algorithm. According to the introduction, this open-source library contains the core recommendation system that powers the "For You" feed on X. It combines in-network content (from accounts you follow) with out-of-network content (discovered through machine learning-based retrieval) and ranks all content using a Grok-based Transformer model. That is to say, this algorithm adopts the same Transformer architecture as Grok.
Open-source address: https://x.com/XEng/status/2013471689087086804
X's recommendation algorithm is responsible for generating the "For You Feed" content that users see on the main interface. It gets candidate posts from two main sources:
- Accounts you follow (In-Network / Thunder)
- Other posts discovered on the platform (Out-of-Network / Phoenix)
These candidate contents are then processed uniformly, filtered, and sorted by relevance.
So, what is the core architecture and operating logic of the algorithm?
The algorithm first fetches candidate content from two types of sources:
- Content from followed accounts: Posts published by accounts you actively follow.
- Non-followed content: Posts that the system retrieves from the entire content library and that you might be interested in.
The goal of this stage is simple: find potentially relevant posts.
The system automatically removes low-quality, duplicate, violating, or inappropriate content. For example:
- Content from blocked accounts
- Topics that the user has clearly shown no interest in
- Illegal, outdated, or invalid posts
This ensures that only valuable candidate content is processed during the final sorting.
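The two-source retrieval and filtering stages described above can be sketched as a small pipeline. The function names, the `Post` shape, and the hard-coded sample data below are illustrative assumptions, not the repository's actual API:

```python
from dataclasses import dataclass

@dataclass
class Post:
    post_id: int
    author_id: int
    score: float = 0.0

def fetch_in_network(user_id: int) -> list[Post]:
    # Posts from accounts the user follows (the "Thunder" / in-network source).
    return [Post(post_id=1, author_id=42), Post(post_id=2, author_id=7)]

def fetch_out_of_network(user_id: int) -> list[Post]:
    # ML-retrieved posts from the wider platform (the "Phoenix" / out-of-network source).
    return [Post(post_id=3, author_id=99)]

def passes_filters(post: Post, blocked_authors: set[int]) -> bool:
    # Drop posts from blocked accounts; the real system also filters
    # duplicates, policy violations, and stale or invalid posts.
    return post.author_id not in blocked_authors

def build_candidates(user_id: int, blocked_authors: set[int]) -> list[Post]:
    # Merge both sources, then filter before the ranking stage.
    candidates = fetch_in_network(user_id) + fetch_out_of_network(user_id)
    return [p for p in candidates if passes_filters(p, blocked_authors)]
```

The surviving candidates are then handed to the ranking model described next.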
The core of the open-sourced algorithm is that the system scores each candidate post with a Grok-based Transformer model, the same deep learning architecture used by large language models. Based on the user's historical behavior (likes, replies, retweets, clicks, etc.), the model predicts the probability of each engagement action, and these per-action probabilities are then combined into a weighted overall score. The higher a post's score, the more likely it is to be recommended to the user.
This design largely abandons the traditional practice of manually engineering features, replacing it with end-to-end learning of user interests.
This is not the first time Musk has open-sourced X's recommendation algorithm.
As early as March 31, 2023, fulfilling a promise he made when acquiring Twitter, Musk officially open-sourced part of Twitter's source code, including the algorithm for recommending tweets in users' timelines. On the day of release, the project received more than 10k stars on GitHub.
At that time, Musk said on Twitter that what was released this time was "most of the recommendation algorithm," and the rest of the algorithms would be gradually opened. He also mentioned that he hoped that "independent third parties could determine with reasonable accuracy the content that Twitter might show to users."
In the Space discussion about the algorithm release, he said that this open-source plan aimed to make Twitter "the most transparent system on the Internet" and make it as robust as the most well-known and successful open-source project, Linux. "The overall goal is to let the users who continue to support Twitter enjoy it to the fullest."
Nearly three years have passed since that first release. As one of the tech world's biggest influencers, Musk gave this new round of open-sourcing plenty of advance publicity.
On January 11, Musk posted on X that he would open-source the new X algorithm, including all the code determining which organic and advertising content to recommend to users, within 7 days.
This process will be repeated every 4 weeks, accompanied by detailed developer instructions to help users understand what changes have occurred.
Today, his promise has been fulfilled again.
Why does Musk want to open-source it?
When Elon Musk mentioned "open source" again, the outside world's first reaction was not technological idealism but real pressure.
Over the past year, X has repeatedly drawn controversy over its content distribution mechanism, facing widespread criticism that the algorithm favors and amplifies right-wing views. Critics argue this tendency is not an isolated case but systematic. A research report released last year pointed out a clear bias in how X's recommendation system disseminates political content.
Meanwhile, some extreme cases have further amplified outside doubts. Last year, unmoderated video of the assassination of US right-wing activist Charlie Kirk spread rapidly on X, causing a public uproar. Critics argued this exposed not only the failure of the platform's moderation mechanisms but also, once again, the algorithm's implicit power over what gets amplified and what does not.
In this context, it is hard to simply interpret Musk's sudden emphasis on algorithm transparency as a pure technological decision.
What do netizens think?
After the release, some users on X summarized the recommendation mechanism in the following 5 points:
- Reply to your comments. The algorithm reportedly weights "replies + author responses" roughly 75 times more than likes; ignoring comments seriously hurts your reach.
- Links reduce reach. Put links in your profile or a pinned post, never in the body of a post.
- Dwell time is crucial. If users scroll past, you have lost them; videos and posts earn attention because they make users stop.
- Stick to your niche. SimClusters are real: drift outside your niche (crypto, tech, etc.) and you lose distribution.
- Getting blocked or muted significantly reduces your score. Be controversial, but not annoying.
In short: Communicate with your audience, build relationships, and keep users in the app. It's actually very simple.
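The weighting claims in that summary can be sketched numerically. Note that the 75x multiplier and the author-reply bonus below are the netizens' claims and an illustrative guess respectively, not values confirmed by the open-sourced code:

```python
# Claimed reply-vs-like multiplier from the netizen summary (unverified).
REPLY_WEIGHT = 75.0
LIKE_WEIGHT = 1.0

def engagement_score(likes: int, replies: int, author_replied: bool) -> float:
    score = LIKE_WEIGHT * likes + REPLY_WEIGHT * replies
    if author_replied:
        score *= 2.0  # illustrative bonus for the author replying back; made up
    return score
```

Under this weighting, a post with just 2 replies (score 150) would outrank a post with 100 likes (score 100), which is the summary's core point about prioritizing conversation.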
Some netizens also noticed that although the architecture is open-sourced, key pieces are still withheld. As one put it, this release is essentially a framework without an engine. Specifically, what's missing?
- Missing scoring weights: the code confirms that positive engagement adds points and negative engagement subtracts them, but unlike the 2023 release, the specific values have been removed.
- Hidden model weights: the internal parameters of the model itself are not included.
- Unpublished training data: nothing is known about the data used to train the model, how user behavior was sampled, or how "good" and "bad" examples were constructed.
For ordinary X users, the open-sourcing of X's algorithm will not have much impact. However, higher transparency can explain why some posts get exposure while others go unnoticed and enable researchers to study how the platform ranks content.
Why is the recommendation system a battleground?
In most technical discussions, the recommendation system is treated as back-end plumbing: low-key, complex, and rarely in the spotlight. But break down how Internet giants actually operate and you will find that the recommendation system is not a peripheral module but infrastructure supporting the entire business model. For this reason, it can be called the "silent giant" of the Internet industry.
Public data has repeatedly confirmed this. Amazon once disclosed that about 35% of purchases on its platform come directly from the recommendation system; at Netflix, about 80% of viewing time is driven by the recommendation algorithm; at YouTube, about 70% of views come from recommendations, especially the home feed. Meta has never given an exact figure, but its technical team once mentioned that about 80% of the compute cycles in the company's internal clusters serve recommendation-related tasks.
What do these numbers mean? If the recommendation system is removed from these products, it is almost equivalent to removing the foundation. Take Meta for example. Advertising placement, user retention time, and business conversion are almost all based on the recommendation system. The recommendation system not only determines what users "see" but also directly determines how the platform "makes money."
However, this mission-critical system has long suffered from extremely high engineering complexity.
In the traditional recommendation system architecture, it is difficult to cover all scenarios with a unified model. Real-world production systems are often highly fragmented. Take companies like Meta, LinkedIn, and Netflix as examples. Behind a complete recommendation link, usually 30 or more dedicated models are running simultaneously: recall models, rough ranking models, fine ranking models, and re-ranking models, each optimized for different objective functions and business indicators. Behind each model, there is often one or more teams responsible for feature engineering, training, parameter tuning, going live, and continuous iteration.
The cost of this model is obvious: complex engineering, high maintenance costs, and difficult cross-task collaboration. If a single model could solve multiple recommendation problems, it would mean an order-of-magnitude reduction in complexity for the whole system. This has long been a desired but hard-to-achieve goal in the industry.
The emergence of large language models provides a new possible path for the recommendation system.
LLMs have proven in practice that they can be extremely powerful general models: they have strong transferability between different tasks, and their performance can continue to improve as the data scale and computing power expand. In contrast, traditional recommendation models are often "task-specific" and difficult to share capabilities across multiple scenarios.
More importantly, a single large model not only simplifies the engineering but also has the potential for "cross-learning." When the same model processes multiple recommendation tasks simultaneously, the signals between different tasks can complement each other. As the data scale grows, the model can evolve more easily as a whole. This is exactly the characteristic that the recommendation system has long desired but is difficult to achieve through traditional methods.
What has the LLM changed? In fact, it has changed everything from feature engineering to understanding ability.
From a methodological perspective, the biggest change that the LLM has brought to the recommendation system occurs in "feature engineering." In the traditional approach, engineers have to manually extract a large number of features such as user historical behavior (likes, comments, shares), content tags, and similar user preferences, and then explicitly tell the model to make judgments based on these features. The model itself does not understand the semantics of these signals but only learns mapping relationships in the numerical space.
After introducing a language model, this process becomes highly abstracted. Instead of specifying each signal one by one, you can describe the problem to the model directly: here is a user, here is a piece of content; the user has liked similar content in the past, and other users have responded positively to this content. Now judge whether this content should be recommended to this user.
The language model itself has the ability to understand. It can independently determine which information is an important signal and how to comprehensively use these signals to make a decision. In a sense, it is not just executing recommendation rules but "understanding the act of recommendation."
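The "describe the problem to the model" approach can be sketched as simple prompt construction. This is a toy illustration of the idea, not how X's system works internally; a production system would feed tokenized engagement features to a fine-tuned model rather than free text:

```python
def build_recommendation_prompt(user_history: list[str], candidate: str) -> str:
    # Phrase the ranking decision as a natural-language question,
    # mirroring the description above.
    history = "\n".join(f"- {item}" for item in user_history)
    return (
        "Here is a user and a piece of content.\n"
        f"The user previously liked:\n{history}\n"
        f"Candidate post: {candidate}\n"
        "Should this post be recommended to this user? Answer yes or no."
    )

prompt = build_recommendation_prompt(
    ["A thread on GPU kernels", "Benchmarks of open-source LLMs"],
    "A deep dive into Transformer-based ranking",
)
```

The model's ability to weigh the signals in that prompt, rather than relying on hand-specified feature mappings, is exactly the shift described above.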
This ability comes from the fact that the LLM has been exposed to a vast and diverse amount of data during the training phase, making it easier to capture subtle but important patterns. In contrast, traditional recommendation systems must rely on engineers to explicitly enumerate these patterns. Once a pattern is missed, the model cannot perceive it.
From a back-end perspective, this change is not unfamiliar. Just as GPT generates an answer from context when you ask it a question, it can also judge "whether I will be interested in this piece of content" from the information at hand. To some extent, the language model already has a natural capacity for "recommendation."
Reference links:
https://github.com/xai-org/x-algorithm
https://x.com/XEng/status/2013471689087086804
https://x.com/BlockFlow_News/status/2013510113873813