
Elon Musk has completely open-sourced the X platform, but I can't replicate it.

Alphabet AI · 2026-05-16 19:56
Elon Musk may be in Beijing, but his target is Europe.

Ever since Elon Musk open-sourced X, people have complained: "Elon Musk, you're not being fair. You promised to open-source it but didn't do it completely. Even if we get the code, we can't build our own X platform."

But now it's possible. The X open-source repository has received the biggest update in its history, and you can really download it and build your own X platform.

Elon Musk first made the X recommendation algorithm code public on March 31, 2023. At that time, the platform was still called Twitter. The company published the twitter/the-algorithm and twitter/the-algorithm-ml repositories on GitHub, disclosing part of the recommendation logic behind the For You timeline.

But that release was more like a "code transparency demonstration." Outsiders could see how the recommendation system basically worked, but couldn't get key parts such as the training data, the model weights, and the advertising recommendation system.

And this time, Elon Musk is serious.

It's true that X is not the largest social platform in the world. It has 570 million monthly active users. X's estimated revenue in 2026 is about $2.9 billion, a 43% drop from the $5.08 billion before Musk's acquisition. Before the acquisition, advertising accounted for as much as 90% of X's total revenue; after the acquisition, it accounts for less than 70%.

But it is still one of the most important social platforms in the world, a complete production system that processes 1.2 billion pieces of content and serves 500 million users every day. Top global AI companies like Anthropic and OpenAI use X as their primary information distribution platform.

Less than 24 hours after Elon Musk posted the announcement on X, the open-source GitHub repository had passed 20,000 stars.

Elon Musk said in the open-source statement, "We know this algorithm is stupid and needs significant improvement, but at least you can see us working in real-time and transparently to make it better. No other social media company does this."

The recommendation algorithm is the core business secret of social media, the underlying logic that determines what users "see, believe, and buy."

Before this, no major platform was willing to put this set of logic on the table completely.

Elon Musk did.

01

What are the specific contents of the open source?

The core of the open-sourced X algorithm this time is a Grok-based transformer recommendation system.

The architecture of the whole system is not complicated, and the design idea is very clear: obtain candidate content from two sources, rank all candidates together with a single machine learning model, and finally filter out inappropriate content before pushing the result to users.

The two content sources are Thunder and Phoenix Retrieval.
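The three stages described above (gather candidates, rank them with one model, filter) can be sketched in a few lines. This is a minimal illustration, not X's code: `build_feed`, the shape of a post record, and the `score` and `is_allowed` callbacks are all hypothetical names chosen for the example.

```python
from typing import Callable, Iterable

Post = dict  # hypothetical post record, e.g. {"id": "p1", ...}


def build_feed(
    sources: Iterable[Callable[[str], list]],
    score: Callable[[str, Post], float],
    is_allowed: Callable[[str, Post], bool],
    user_id: str,
    limit: int = 10,
) -> list:
    """Gather candidates from every source, rank them together, then filter."""
    # 1. Candidate gathering: merge all sources, de-duplicating by post id.
    candidates = {}
    for source in sources:
        for post in source(user_id):
            candidates.setdefault(post["id"], post)

    # 2. Unified ranking: one scoring function for every candidate,
    #    regardless of which source it came from.
    ranked = sorted(
        candidates.values(), key=lambda p: score(user_id, p), reverse=True
    )

    # 3. Filtering: drop anything that should not be shown to this user.
    visible = [p for p in ranked if is_allowed(user_id, p)]
    return visible[:limit]
```

The point of the shape is that the ranking step does not care where a candidate came from: both sources feed one pool, and a single scorer orders it.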

Thunder is responsible for "in-network content," that is, the posts published by the accounts you follow. It is an in-memory database that tracks the latest posts of all users in real time, with sub-millisecond response times.

When you refresh the news feed, Thunder will immediately pull out the latest content posted by the people you follow.
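To make the idea of an in-memory store of recent posts concrete, here is a toy sketch. It is not Thunder's actual design: the class name `RecentPostStore`, the per-author cap, and the assumption that posts arrive in timestamp order are all made up for illustration.

```python
import heapq
from collections import defaultdict, deque
from itertools import islice


class RecentPostStore:
    """Toy in-memory index: author -> that author's most recent posts."""

    def __init__(self, per_author_limit: int = 100):
        # Each author gets a bounded deque, newest post at the left.
        # When the deque is full, the oldest post falls off the right.
        self._posts = defaultdict(lambda: deque(maxlen=per_author_limit))

    def add_post(self, author: str, post_id: str, timestamp: float) -> None:
        # Assumes posts are added in increasing timestamp order per author.
        self._posts[author].appendleft((timestamp, post_id))

    def latest_from(self, followed: list, limit: int = 20) -> list:
        """Return the newest post ids across all followed authors."""
        streams = [self._posts[a] for a in followed if a in self._posts]
        # Every per-author deque is already newest-first, so a k-way merge
        # yields a global newest-first order without re-sorting everything.
        merged = heapq.merge(*streams, reverse=True)
        return [post_id for _, post_id in islice(merged, limit)]
```

Because everything lives in memory and each author's list is already ordered, a refresh only has to merge a handful of short lists, which is the kind of work that can plausibly finish very quickly.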

Phoenix Retrieval is responsible for "out-of-network content," that is, posts from accounts you don't follow but that the system thinks you may be interested in.

It uses machine learning to do similarity search and finds posts related to your past interaction content from the global corpus. This is the most critical part of the recommendation system, which determines whether you will see the popular content of unfamiliar accounts in the news feed.
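The article does not say how the similarity search is implemented, so the following is only a generic sketch of the technique: represent posts as vectors, summarize a user's interaction history as the average of those vectors, and return the closest posts from authors the user does not follow. The function names, the averaging step, and the brute-force scan are assumptions for illustration; a production system would use learned embeddings and an approximate nearest-neighbor index.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mean_vector(vectors):
    """Element-wise average of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def retrieve_out_of_network(history, corpus, followed, k=2):
    """Return ids of the k posts most similar to the user's history.

    history:  embeddings of posts the user interacted with (hypothetical)
    corpus:   post_id -> (author, embedding)
    followed: set of authors the user already follows (excluded here)
    """
    user_vec = mean_vector(history)
    scored = [
        (cosine(user_vec, emb), post_id)
        for post_id, (author, emb) in corpus.items()
        if author not in followed
    ]
    scored.sort(reverse=True)
    return [post_id for _, post_id in scored[:k]]
```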

After the candidate content from the two sources is aggregated, it enters the unified ranking stage. The core of this stage is Phoenix Scorer, a Grok-based Transformer model.

This model doesn't predict "relevance," but predicts the specific actions you may take for each piece of content, such as the probability of liking, retweeting, replying, clicking, reporting, and blocking.

Each action has a weight. Positive actions (liking, retweeting) have positive weights, and negative actions (reporting, blocking) have negative weights. The final score is the weighted sum of all predicted probabilities.

Content with a high score is ranked higher, and content with a low score is ranked lower.
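The weighted sum is simple enough to write out. In the sketch below the action names follow the article, but the numeric weights are invented for illustration; the article does not give X's actual values.

```python
# Hypothetical weights: positive actions add to the score, negative ones
# subtract. These numbers are made up; only the signs follow the article.
ACTION_WEIGHTS = {
    "like": 1.0,
    "retweet": 2.0,
    "reply": 3.0,
    "click": 0.5,
    "report": -10.0,
    "block": -8.0,
}


def final_score(predicted, weights=ACTION_WEIGHTS):
    """Weighted sum of predicted action probabilities for one post."""
    return sum(weights[action] * p for action, p in predicted.items())


def rank(candidates, weights=ACTION_WEIGHTS):
    """Order post ids by final score, highest first.

    candidates: post_id -> {action: predicted probability}
    """
    return sorted(
        candidates,
        key=lambda post_id: final_score(candidates[post_id], weights),
        reverse=True,
    )
```

With these made-up weights, a post that is slightly more likely to be liked but also has a 10% chance of being reported ends up below a calmer post, because the negative weight on reporting outweighs the extra engagement.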

That's all.

X specifically emphasized in the open-source documentation: We have completely eliminated all manual feature engineering and most heuristic rules.

The Grok-based Transformer takes on all the heavy work. It understands your interaction history, such as what you liked, replied to, and shared, and then automatically determines what content is relevant to you based on this.

This means that growth tactics that relied on keyword stacking and tag matching no longer work. The system now pays more attention to semantic understanding and can analyze the actual value of the content and the real needs of users in depth.

It's open-sourced, but not completely.

First, the model weights are not fully open.

The GitHub repository does contain a pre-trained mini Phoenix model, with 256-dimensional embeddings, 4 attention heads, and 2 Transformer layers, packaged into a 3 GB archive and distributed through Git LFS. This model allows developers to run the end-to-end inference process directly without training anything themselves.
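To get a feel for how small those three numbers are, the sketch below does some back-of-the-envelope arithmetic. Only the embedding size, head count, and layer count come from the article. The feed-forward width of 4x the embedding size is a common convention assumed here, biases and normalization parameters are ignored, and the count covers the Transformer blocks only, not embedding tables or anything else in the archive.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MiniConfig:
    embed_dim: int = 256     # from the article
    num_heads: int = 4       # from the article
    num_layers: int = 2      # from the article
    ffn_multiplier: int = 4  # assumption: not stated in the article

    @property
    def head_dim(self) -> int:
        # Each attention head works on an equal slice of the embedding.
        return self.embed_dim // self.num_heads

    def block_weight_count(self) -> int:
        """Rough weight count for the Transformer blocks alone."""
        d = self.embed_dim
        attention = 4 * d * d                        # Q, K, V, output projections
        feed_forward = 2 * d * (self.ffn_multiplier * d)  # up and down projections
        return self.num_layers * (attention + feed_forward)


config = MiniConfig()
print(config.head_dim)              # 64
print(config.block_weight_count())  # 1572864
```

Under these assumptions the Transformer blocks hold roughly 1.6 million weights, which is tiny by the standards of modern language models.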

But this is just a "mini version." The Phoenix model actually used by X in the production environment is much larger in scale, with the number of parameters, layers, and embedding dimensions not in the same order of magnitude. This open-sourced mini model is more like a teaching sample for you to understand how the system works, not the one X is really using.

It's like a small teaching engine that can let you understand the principle of the engine and can really run, but it's not the real engine that X uses to refresh the For You news feed for hundreds of millions of users every day.

The real production model is probably larger, more complex, with more training data, more parameter tuning, and more knowledge of user behavior. So its accuracy in recommendation, speed of response, and ability to handle real traffic are not in the same order of magnitude as this mini model.

Second, the training data is not made public.

Half of the core competitiveness of the recommendation system lies in the model, and the other half lies in the data. X processes 1.2 billion pieces of content every day and has accumulated a huge amount of user behavior data, such as who liked what, who blocked whom, what content someone viewed at what time, and for how long.

These data are the fundamental reason why the Phoenix model can accurately predict user behavior.

But these data cannot be open-sourced. On the one hand, it's a privacy issue, and on the other hand, it's a business secret.

Without these data, even if you get the complete model architecture and code, you can't train a recommendation system as good as X.

Third, only the framework of the advertising system is open-sourced, and the strategy is not.

This release includes a new ads module that handles ad injection and placement, including brand safety tracking, and respects the boundaries of sensitive content. But the specific ad bidding logic, bidding strategy, and ROI optimization algorithm, which are directly related to X's revenue, are not fully disclosed.

Fourth, only part of the content understanding pipeline Grox, a Grok-based content understanding service in the X recommendation system, is open-sourced.

Grox is a newly added service that provides classifiers, embedders, and a task execution engine for content understanding tasks such as spam detection, post classification, and PTOS policy enforcement. But the details of how Grox specifically determines whether a piece of content is spam, how it identifies violating content, and how it enforces platform policies are not completely transparent.

So, although you can build a social platform similar to X based on what's open-sourced on GitHub, you can't build a recommendation system as good as X's.

You can get the complete system architecture, candidate recall logic, sorting framework, and filtering rules, and run the end-to-end inference process. If you have enough engineering capabilities, you can indeed build a similar recommendation system.

But you don't have X's data, X's production-level model, or the engineering optimization and scheduling strategies that X has accumulated over the past few years. So you can't replicate the X platform exactly.

02

Why open source?

As early as October 2022, when he acquired Twitter, Elon Musk publicly stated that "making the algorithm open source to increase trust" was one of his visions for the platform.

On March 31, 2023, Elon Musk fulfilled his first promise. The X platform, which was still called Twitter at that time, published the source code of part of the recommendation algorithm on GitHub, including the algorithm logic for tweet recommendation in the user timeline.

That release attracted great attention.

Developers saw the inner workings of the Twitter recommendation system for the first time and were able to confirm some long-circulating rumors. For example, certain accounts were indeed downranked by the algorithm, and certain content types were indeed given priority in recommendation.

Elon Musk said at that time that providing "code transparency" would be "incredibly embarrassing" at first, but would ultimately "lead to a rapid improvement in recommendation quality."

He also said, "Most importantly, we want to earn your trust."

But that release was incomplete. Most of the files in the GitHub repository were from the initial upload, and there were few subsequent updates. Many developers complained that the codebase was not continuously maintained, the documentation was not detailed enough, and many key modules were not made public.

Evidently, Elon Musk has learned his lesson this time.

What's more interesting is that Elon Musk was in Beijing when he posted about the algorithm update on X. But the real target of this release is Europe.

The X platform is facing increasingly strict regulatory scrutiny in Europe, and Elon Musk is using "transparency" and "openness" as weapons to fight against the regulatory pressure.

In July 2025, the French prosecutor's office launched an investigation into the X platform, suspecting that its algorithm had biases and fraudulent data extraction behavior.

The European Commission also issued a document retention order to X, requiring it to provide algorithm-related content. The focus of the investigation was the spread of false information, ineffective content moderation, and deficiencies in information transparency.

The X platform refused to cooperate with the investigation at that time, calling it a "politically motivated criminal investigation" that threatened users' freedom of speech.

Elon Musk even replied with a swear word under the European Commission's tweet.

But refusing to cooperate is obviously not a long-term solution, so Elon Musk open-sourced the algorithm.

Rather than passively accepting the review of regulatory agencies, it's better to actively make the code public so that developers, researchers, and regulators around the world can see X's recommendation logic.

In this way, X can claim to be "the most transparent social platform in the world," and any accusations of algorithmic bias and content manipulation can be responded to with "the code is open source, go and see for yourself."

Offense is the best defense.

Of course, open source also comes at a cost.

First, competitors can directly learn from X's architectural design and engineering practices. Now others can thoroughly study how X does recall, sorting, and diversity control.

If some of X's designs are indeed better than its competitors, these designs will soon be copied.

Second, open source exposes X's weaknesses.

Developers have already raised questions in the repository's GitHub issues: Why are some filtering rules designed the way they are? Why are the diversity control parameters set so conservatively? Why is the ad injection logic so simple and crude?

But Elon Musk believes that these costs are worth it.

The biggest problem X is facing now is not technology, but trust. Users don't trust X's content moderation, advertisers don't trust X's brand safety, and regulatory agencies don't trust X's algorithm fairness.

Open-sourcing the algorithm is the most direct way to rebuild trust.

It can't solve all problems, but at least it can prove that X is not operating in the dark, X's recommendation logic can be verified, and X is willing to accept public supervision.

In an era that increasingly emphasizes transparency and accountability, this attitude itself is a competitive advantage.

Elon Musk said in 2023 when he open-sourced the Twitter algorithm, "We want X to be the most transparent system on the Internet and make it as powerful as the most famous and successful open-source project, Linux."

It seems that he is serious.

Whether X can really become the "Linux of the social media world" remains to be seen.

But at least in terms of open source, Elon Musk is ahead of all major social platforms.

This article is from the WeChat official account "Alphabet AI", author: Miao Zheng. Republished by 36Kr with authorization.