Musk has just open-sourced X's Grok-based recommendation algorithm: a transformer now handles the ranking of hundreds of millions of items.
After nearly three years, Elon Musk open-sources X's recommendation system again
X's engineering team announced in a post on X that it is officially open-sourcing X's recommendation system. According to the statement, the open-source library contains the core recommendation system behind the "For You" feed on X. It combines in-network content (from accounts you follow) with out-of-network content (discovered through machine-learning-based retrieval) and uses a Grok-based transformer model to score all of it. In other words, the algorithm uses the same transformer architecture as Grok.
Open-source announcement: https://x.com/XEng/status/2013471689087086804
X's recommendation system is responsible for generating the "For You" content that users see on the home page. It draws candidate posts from two main sources:
- Accounts you follow (In-Network / Thunder)
- Other posts discovered across the platform (Out-of-Network / Phoenix)
These candidates are then processed uniformly, filtered, and ranked by relevance.
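The two-source candidate step described above can be sketched as a simple merge with de-duplication. This is a minimal illustration, not the actual service code; the `Post` type and function names are hypothetical, with "Thunder" and "Phoenix" used only as labels for the two sources named in the article.

```python
from dataclasses import dataclass

@dataclass
class Post:
    post_id: int
    author_id: int
    text: str

def gather_candidates(followed_posts, discovered_posts):
    """Merge in-network ("Thunder") and out-of-network ("Phoenix")
    candidates, de-duplicating by post id so a post surfaced by both
    sources is scored only once."""
    seen = set()
    merged = []
    for post in list(followed_posts) + list(discovered_posts):
        if post.post_id not in seen:
            seen.add(post.post_id)
            merged.append(post)
    return merged
```

In a real system each source would be a separate service call; the point here is only that both pools feed one unified candidate list before filtering and ranking.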
What does the core architecture and the working principle of the algorithm look like now?
The algorithm first collects candidate content from two sources:
- In-network content: posts published by accounts you actively follow.
- Out-of-network content: posts the system finds across the whole content corpus that may interest you.
The goal of this phase is to "find potentially relevant posts".
The system automatically removes low-value, duplicate, offensive, or otherwise inappropriate content, for example:
- Content from blocked accounts
- Topics that the user is explicitly not interested in
- Illegal, outdated, or invalid posts
This ensures that only worthwhile candidates reach the final ranking stage.
The core of the newly open-sourced algorithm is that the system uses a Grok-based transformer model (akin to a large language model) to score each candidate post. Based on the user's behavioral history, the transformer predicts the probability of each engagement action (likes, replies, reposts, clicks, and so on). These action probabilities are then weighted and combined into an overall score: the higher a post's score, the more likely it is to be recommended to the user.
This design largely eliminates traditional manual feature engineering in favor of learning user interest end to end.
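The final scoring step described above, combining per-action probabilities into one ranking score, can be written as a weighted sum. The weights below are invented for illustration; as noted later in the article, the actual weight values were removed from the open-source release.

```python
def combine_action_scores(action_probs, weights):
    """Combine model-predicted engagement probabilities into one score.

    action_probs: action name -> predicted probability for this (user, post)
    weights:      action name -> hypothetical per-action weight
    """
    return sum(weights.get(action, 0.0) * p
               for action, p in action_probs.items())

# Illustrative only: these numbers are NOT the real production weights.
example_weights = {"like": 1.0, "reply": 75.0, "repost": 10.0}
```

Posts are then sorted by this score in descending order to produce the final feed.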
This is not the first time Elon Musk has made X's recommendation system open source.
As promised when he bought Twitter, on March 31, 2023 he officially open-sourced part of Twitter's source code, including the algorithm that recommends tweets in the user timeline. On the day of release, the GitHub project had already collected over 10,000 stars.
At the time, Musk stated on Twitter that the release covered "most of the recommendation system" and that the rest of the algorithm would follow gradually. He also said he hoped that "independent third parties would be able to determine with reasonable accuracy which content Twitter might show to users".
In a Spaces discussion about the release, he said the goal of the open-source project was to make Twitter "the most transparent system on the Internet" and as robust as the best-known and most successful open-source project, Linux. "The overall goal is for users who stick with Twitter to get as much out of it as possible."
Nearly three years have passed since that first release. As one of the tech industry's most-followed voices, Musk heavily promoted the current open-source release in advance.
On January 11, Musk wrote in a post on X that he would open-source the new X algorithm, including all the code that determines which organic and advertising content to recommend to users, within seven days.
This process will be repeated every four weeks and will be accompanied by detailed developer instructions to help users understand the changes.
Today, he has fulfilled his promise again.
Why does Musk make the recommendation system open source?
When Elon Musk invokes "open source" again, the outside world's first reaction is not technical idealism but real-world pressure.
Over the past 12 months, X has repeatedly come under criticism for its content-distribution mechanism. The platform is often accused of algorithmically favoring and promoting right-wing views, and this tendency is seen not as sporadic but as systemic. A research report last year found significant bias in how X's recommendation system distributes political content.
At the same time, some extreme cases have deepened outside doubts. Last year, an unvetted video showing the killing of the American right-wing activist Charlie Kirk spread rapidly on X and shocked the public. Critics took this not only as evidence of ineffective moderation but also as a further sign of the algorithm's hidden power to decide what gets amplified and what does not.
Against this background, it is difficult to interpret Musk's sudden emphasis on algorithm transparency as a purely technical decision.
What do users think?
After the open - source release of X's recommendation system, some users on the X platform summarized the following five points:
- Reply to comments. The algorithm reportedly weights "reply + author engagement" 75 times more than a like. Not replying to comments significantly hurts visibility.
- Links reduce visibility. Put links in your profile description or in a pinned post, never in the body of the post.
- Dwell time is crucial. If users scroll past quickly, you lose them; videos and posts that make users stop get high reach.
- Stay in your niche. "SimClusters" really exist: stray from your niche (crypto, tech, etc.) and you lose distribution.
- Blocks and mutes significantly reduce your score. Be controversial, but not annoying.
In short: Communicate with your audience, build relationships, and keep users in the app. It's actually quite simple.
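Taking the user-reported claims above at face value, a rough back-of-the-envelope score might look like the sketch below. To be clear, these multipliers are the community's interpretation, not confirmed values from the released code, and the link penalty factor is purely illustrative.

```python
# Hypothetical weights reflecting the user-reported claims above.
WEIGHTS = {
    "reply_with_author_engagement": 75.0,  # claimed 75x a like
    "like": 1.0,
}
LINK_PENALTY = 0.5  # illustrative; the real penalty (if any) is unknown

def heuristic_score(action_counts, has_link_in_body):
    """Toy score for comparing posts under the claimed heuristics."""
    score = sum(WEIGHTS.get(action, 0.0) * n
                for action, n in action_counts.items())
    if has_link_in_body:
        score *= LINK_PENALTY
    return score
```

Under these assumptions, two replies that the author engages with would outweigh a hundred likes, which is exactly the "communicate with your audience" takeaway.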
Some users have also noticed that although the architecture is open source, certain pieces were withheld; one user described the release as essentially a framework without an engine. What exactly is missing?
- Missing weighting parameters: the code confirms "bonus points for positive actions" and "deductions for negative actions", but unlike the 2023 release, the exact numbers have been removed.
- Hidden model weights: the model's internal parameters are not included.
- Unpublished training data: nothing is disclosed about the model's training data, how user behavior is collected, or how "good" and "bad" examples are constructed.
For normal X users, the open - source release of the algorithm will not have a big impact. But higher transparency can explain why some posts are visible while others are ignored, and it allows researchers to study how the platform evaluates content.
Why is the recommendation system an important battleground?
In most technical discussions, recommendation systems are treated as backend plumbing: complex, but rarely in the spotlight. Yet if you actually break down the business of the Internet giants, you find that the recommendation system is not a marginal module but the infrastructure of the entire business model, the "silent giant" of the Internet industry.
Public figures repeatedly confirm this. Amazon has said that about 35% of purchases on its platform come directly from its recommendation system. Netflix is even more striking: about 80% of viewing time is driven by recommendation algorithms. YouTube is similar, with about 70% of views coming from recommendations, especially the feed. And although Meta has never given an exact percentage, its engineering teams have said that about 80% of the computing power in its internal clusters goes to recommendation workloads.
What do these numbers mean? Removing the recommendation system from these products would be like removing the foundation. Take Meta: ad placement, user dwell time, and monetization all rest on the recommendation system. It determines not only what users see but also, directly, how the platform makes money.
Nevertheless, this vital system has long had the problem of high technical complexity.
In traditional recommendation architectures, it is hard to serve every scenario with a single model, so production systems end up highly fragmented. At companies like Meta, LinkedIn, and Netflix, 30 or more specialized models often run behind one complete recommendation pipeline: recall models, coarse-ranking models, fine-ranking models, re-ranking models, each optimized for different objectives and business metrics. Behind each model typically stands one or more teams responsible for feature development, training, tuning, deployment, and ongoing iteration.
The costs of this setup are obvious: high technical complexity, high maintenance burden, and difficult coordination across tasks. If the question "can a single model solve multiple recommendation problems?" could be answered affirmatively, it would mean a drastic reduction in complexity for the whole system. This is a goal the industry has long pursued but struggled to reach.
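The multi-stage cascade described above can be sketched as a chain of stages, each narrowing or re-ordering the candidate set. The stage functions below are toy stand-ins, included only to make the recall → coarse-rank → fine-rank → re-rank structure concrete.

```python
def run_pipeline(user_id, corpus, stages):
    """Pass candidates through an ordered cascade of stages.

    Each stage takes (user_id, candidates) and returns a smaller or
    re-ordered candidate list, mimicking a recall -> coarse-ranking ->
    fine-ranking -> re-ranking production pipeline."""
    candidates = list(corpus)
    for stage in stages:
        candidates = stage(user_id, candidates)
    return candidates

# Toy stages over integer "post ids": recall keeps even ids, coarse
# ranking keeps the 4 smallest, fine ranking sorts by score (here, id,
# descending), and re-ranking keeps the top 2 for the final feed.
recall = lambda u, c: [x for x in c if x % 2 == 0]
coarse = lambda u, c: sorted(c)[:4]
fine   = lambda u, c: sorted(c, reverse=True)
rerank = lambda u, c: c[:2]
```

Each real stage would be its own model with its own team and objective, which is precisely the fragmentation that a single end-to-end model promises to collapse.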
The emergence of large language models has opened up a new possible path for the recommendation system.
In practice, LLMs have proven that they can be extremely powerful universal models: they can be well transferred between different tasks, and their performance can be continuously improved with the increase in data volume and computing power. In contrast, traditional recommendation models are often "task - specific" and have difficulty sharing their capabilities between different scenarios.
More importantly, a single large model not only reduces technical complexity but also offers the potential for "cross - learning". When the same model processes multiple recommendation tasks simultaneously, the signals between different tasks can complement each other. With the increase in data volume, the model can be more easily improved overall. This is the property that the recommendation system has long pursued but has so far had difficulty achieving through traditional methods.
What have LLMs actually changed? Above all, feature development and content understanding.
Methodologically, LLMs have changed recommendation systems most at one core step: feature engineering.
In traditional recommendation systems, engineers first have to hand-craft a large number of signals: user click history, dwell time, similar users' preferences, content tags, and so on. Then,