
I used the new model from Meta's "Chinese all-star team" to recreate the Doubao app from a single screenshot.

Zhidongxi 2026-04-09 11:24
MSL submits its first report card.

Meta's "avocado" model is finally ripe!

Zhidongxi reported on April 9 that, nine months after its founding, Meta Superintelligence Labs (MSL) has released its first model, Muse Spark (internal code name: Avocado). It is a natively multimodal reasoning model that supports tool use, visual chain of thought, and multi-agent orchestration.

On the model evaluation platform Artificial Analysis, Muse Spark scores 52 on the Intelligence Index, up from Llama 4 Maverick's 18. That places it between Claude Sonnet 4.6 and Claude Opus 4.6 and puts it in the top tier.

We tried the model right away. We uploaded a screenshot of the Doubao app and asked Muse Spark to replicate it. Muse Spark's replies are quite colloquial, even with a "Doubao flavor," perhaps because the model is aimed mainly at consumers.

Muse Spark generates quickly and the results are good. It reproduced the Doubao page almost 1:1, images included.

Muse Spark also passed the bouncing-ball test. Some netizens remarked that, after more than a year, Meta has finally released a large model that can pass the hexagon bouncing-ball test, a historic moment worth recording.

Muse Spark is the first result delivered by Alexandr Wang, founder of Scale AI and Meta's Chief AI Officer, in the 10 months since he joined Meta.

This result didn't come easily. After Llama 4's high-profile setback, Meta drastically reorganized its AI team, and Yann LeCun, who had long been pessimistic about large language models, ultimately left.

Alexandr Wang said that Meta rebuilt its AI technology stack from scratch over the past nine months. The infrastructure, architecture, and data pipeline are all brand new, and Muse Spark is the result of that effort.

Many Chinese AI researchers who joined Meta reposted the announcement, including Shengjia Zhao, Shuchao Bi, Jiahui Yu, and Jason Wei. Notably, the MSL team has a high proportion of Chinese members: among the Meta researchers who reposted the new model, many, from leaders to rank-and-file staff, are Chinese.

According to the Top Chinese Innovation and Entrepreneurship Association, Meta has also added another Chinese researcher. Yi Wu, former Chief Scientist of the RL Laboratory at Ant Group, has joined MSL and reports directly to Nat Friedman, a Meta vice president and co-head of MSL.

Yi Wu (Source: Top Chinese Innovation and Entrepreneurship Association)

Muse Spark is the first model in MSL's Muse series, and more models in the series are planned. Muse Spark is now rolling out gradually to Meta's apps and the meta.ai website, though some users report that the model they are served is still Llama 3.

Meanwhile, the phrase "open source" does not appear even once in the accompanying blog post.

Try it: meta.ai

01. Strong in multimodality and healthcare, but still weak in agents and coding workflows

Judging by the benchmark results, Muse Spark is in the industry's top tier in multimodal perception, reasoning, healthcare, and agentic tasks. However, MSL acknowledges that the model still lags in long-horizon agentic systems and coding workflows.

Below are Muse Spark's complete benchmark results. Note that Meta presents the data in a way that invites suspicion of "chart fraud." At first glance, all of Muse Spark's scores are highlighted in blue, which suggests it leads across the board. In fact, of the 20 benchmarks in the chart, it achieves state of the art (SOTA) on only 4, or 20%.

On multimodal capabilities, Muse Spark is quite competitive, with no obvious generational gap relative to other leading US models; it is roughly on par with GPT-5.4. This is consistent with its positioning as a natively multimodal large model.

Because it will be deployed across many of Meta's social platforms for a broad base of individual users, Muse Spark also keeps pace in healthcare, a topic users ask about frequently. It achieved SOTA on both HealthBench Hard and MedXpertQA (multimodal), which suggests it has been optimized for this area.

Alongside Muse Spark, Meta also released "Contemplating mode." This mode orchestrates multiple agents that reason in parallel, enabling Muse Spark to compete with the extreme reasoning modes of frontier models such as Gemini Deep Think and GPT Pro.

With "Contemplating mode" enabled, Muse Spark improves on complex tasks. For example, it reached 58% accuracy on the Humanity's Last Exam (HLE) benchmark and 38% accuracy on the "Cutting-edge Scientific Research" benchmark.
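To make the idea of "multiple agents reasoning in parallel" concrete, here is a minimal Python sketch. Meta has not disclosed how Contemplating mode dispatches agents or combines their answers, so everything here is an assumption for illustration: `run_agent` is a hypothetical stand-in for a model call, and majority voting is a swapped-in aggregation rule, not Meta's method.

```python
import asyncio
from collections import Counter


async def run_agent(agent_id: int, question: str) -> str:
    """Hypothetical stand-in for one reasoning agent.

    A real system would call a model here; this toy version returns a
    fixed answer so the example runs without any external service.
    """
    await asyncio.sleep(0)  # yield control so the agents can interleave
    # Toy behavior: every fourth agent (ids 0, 4, 8, ...) dissents.
    return "41" if agent_id % 4 == 0 else "42"


async def contemplate(question: str, n_agents: int = 5) -> str:
    """Run n_agents concurrently and return the most common answer."""
    answers = await asyncio.gather(
        *(run_agent(i, question) for i in range(n_agents))
    )
    # Assumed aggregation rule: simple majority vote over the answers.
    winner, _count = Counter(answers).most_common(1)[0]
    return winner


if __name__ == "__main__":
    print(asyncio.run(contemplate("What is 6 * 7?")))
```

With five agents, three return "42" and two return "41", so the vote settles on "42". The point of the sketch is only the shape of the workflow: fan out the same question, collect independent answers, then reconcile them.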

02. Requires an order of magnitude less computing resources than Llama 4 and uses a new reinforcement learning technology stack

Beyond the benchmark scores, the new positioning and underlying technology of this model are also worthy of attention.

Meta says Muse Spark is the first step toward personal superintelligence. It is meant to understand the world its users live in, with multimodal capabilities and healthcare as the two current focus areas.

At the architecture level, Muse Spark integrates visual information across domains and tools, and it is strong at recognition and localization. Combined, these capabilities enable a range of interactive experiences.

For example, users can upload a screenshot of a game screen and ask Muse Spark to turn it into a real interactive game.

Or users can tell Muse Spark that they have high cholesterol and ask it to build a dynamic food recommendation page that draws on its multimodal capabilities and medical knowledge.

The demos Meta shared in the blog post cover only multimodality and healthcare. This may mean the Muse series is ultimately meant to serve Zuckerberg's vision of personal superintelligence rather than simply to push the ceiling of intelligence.

On the technical side, MSL has significantly improved compute efficiency. Compared with the previous model, Llama 4 Maverick, Muse Spark reaches the same performance with more than an order of magnitude less compute.

MSL also adopted a new technology stack for the reinforcement learning stage, which delivers stable and predictable performance gains in large-scale reinforcement learning.

03. Hands-on test: Accurately identifies food calories and designs a new product for Meta's AI glasses

After the release of Muse Spark, we conducted more practical tests.

Muse Spark's multimodal capabilities are indeed good. We uploaded a photo of a beer bottle and asked it to analyze the calories. Muse Spark immediately recognized the brand and size of the beer, and even correctly read the alcohol content, which was hard to make out with the naked eye in the original photo.

Its calorie figures come from web searches. It also expresses the calories in terms of everyday foods and estimates the amount of exercise needed to burn them off, which is quite practical.

We then asked Muse Spark to create a promotional webpage for Meta's AI glasses without giving it any references. During its thinking process, Muse Spark proactively called an AI image-generation model to create product images and then wrote the complete page code. The whole process took about 2 minutes, and the result is as follows:

The page is quite polished. Muse Spark went ahead and designed a new AI glasses product for Meta, equipped with Muse Spark itself. The model even boasted that this was a flagship official website rather than an ordinary landing page, built to the standard of the Apple Vision Pro launch event.

Muse Spark can also be used for shopping recommendations. We asked it to search for car windshield wipers, and within a few seconds it returned several options, along with the pros and cons of each product and a final purchase recommendation.

04. Conclusion: The avocado is ripe, but Meta's "personal superintelligence" still requires patience

As the debut model from Meta Superintelligence Labs, Muse Spark performs at a top-tier level, which is enough to raise expectations for later models in the Muse series.

However, the "personal superintelligence" that Zuckerberg wants is currently limited to relatively controllable scenarios such as healthcare Q&A, webpage replication