
The mysterious model HappyHorse has suddenly appeared and dominated the rankings. Has a "catfish" entered the video generation arena?

AI Value Officer · 2026-04-08 15:45
The HappyHorse model has topped the AI video rankings, sparking debate over the competition between open-source and closed-source models.

No product launch, no technical blog, no corporate endorsement: a text-to-video model named HappyHorse-1.0 quietly topped the AI Video Arena rankings on the authoritative AI evaluation platform Artificial Analysis, beating Seedance 2.0 on Elo score and leaving mainstream players such as Keling and Tianguang behind. Overnight, it set off a guessing contest in the tech circle.

The rankings on Artificial Analysis are not based on technical parameter evaluations but on Elo scores aggregated from blind tests by real users, reflecting how ordinary viewers actually perceive the output. That makes this leaderboard harder to dismiss than the usual benchmark lists, and it turns the question of who actually built this thing into one that cannot be ignored.

"Happy Horse" Quietly Tops the List, Triggering a Guessing Contest in the Tech Circle

Speculation on X came quickly. The first clue people noticed was the language order on the official website: Mandarin and Cantonese are listed ahead of English. For a product targeting global users, that order is unusual; a US-led team would almost certainly have put English first. It is all but certain that the team behind it is from China.

The name itself is another clue. 2026 is the Year of the Horse in the Chinese lunar calendar, and "HappyHorse" is a fairly obvious Year-of-the-Horse pun. Earlier this year, "Pony Alpha" played a similar trick. The list of suspects grew quickly: the founders of Tencent and Alibaba both have the surname Ma, which means "horse" in Chinese, so they were natural candidates; some bet on Xiaomi, reasoning that Lei Jun keeps a low profile and likes sudden moves; others thought it looked more like DeepSeek, which had once quietly launched a vision model and then quietly taken it offline. The speculation was lively, but none of it came with solid evidence.

What really locked in the target was a point-by-point comparison at the technical level. X user Vigo Zhao compared the public benchmark data of HappyHorse-1.0 against known models one by one and found a near-perfect match: daVinci-MagiHuman, the open-source model "Da Vinci Magic Human" released on GitHub in March.

Visual quality 4.80, text alignment 4.18, physical consistency 4.52, speech character error rate 14.60%: the two sets of figures match item by item. The official websites are nearly identical as well; the architecture descriptions, performance tables, and the presentation of the demo videos all appear to come from the same template. Both use a single-stream Transformer architecture, both generate audio and video jointly, and the supported language lists are exactly the same. That degree of overlap is hard to write off as coincidence.
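The check itself is easy to reproduce. Below is a minimal sketch of the kind of item-by-item comparison described above; the metric values are the figures quoted in this article, while the dictionary structure and key names are illustrative assumptions:

```python
# Illustrative only: compare two published benchmark tables item by item.
# The values are the figures quoted in the article; the keys are assumed names.
happyhorse = {
    "visual_quality": 4.80,
    "text_alignment": 4.18,
    "physical_consistency": 4.52,
    "speech_char_error_rate": 14.60,
}
davinci_magihuman = {
    "visual_quality": 4.80,
    "text_alignment": 4.18,
    "physical_consistency": 4.52,
    "speech_char_error_rate": 14.60,
}

for metric, value in happyhorse.items():
    other = davinci_magihuman.get(metric)
    match = other is not None and abs(value - other) < 1e-9
    print(f"{metric}: {value} vs {other} -> {'match' if match else 'differ'}")
```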

The conclusion most widely accepted in the tech community is that HappyHorse is an iteration that Sand.ai, one of the joint developers of daVinci-MagiHuman, built on top of the open-source model, with the core aim of probing the model's performance ceiling under real user preferences and paving the way for commercialization.

daVinci-MagiHuman was officially open-sourced on March 23, 2026, the product of a collaboration between two young teams. One is the Generative Artificial Intelligence Research Laboratory (GAIR) at the Shanghai Innovation Institute (SII), led by scholar Liu Pengfei; the other is Beijing-based Sand.ai (Sandai Technology), whose founder Cao Yue also has an academic background and whose focus is the autoregressive world model.

The model is a pure self-attention, single-stream Transformer with 15 billion parameters that places the tokens of all three modalities (text, video, and audio) into a single sequence for joint modeling. No one in the open-source community had previously done genuine audio-video joint pre-training from scratch; most prior systems stitch together separately trained single-modality components.
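To make the "single-stream" idea concrete, here is a toy PyTorch sketch of the sequence layout described above: one token stream shared by all three modalities, processed by one self-attention stack. All module names, vocabulary sizes, and dimensions are illustrative assumptions, not the actual daVinci-MagiHuman code:

```python
import torch
import torch.nn as nn

class SingleStreamAVModel(nn.Module):
    """Toy single-stream Transformer: text, video, and audio tokens
    share one sequence and one self-attention stack (illustrative only)."""

    def __init__(self, d_model=512, n_heads=8, n_layers=4,
                 text_vocab=32000, video_vocab=8192, audio_vocab=4096):
        super().__init__()
        # Separate embedding tables per modality, projected into a shared space.
        self.text_emb = nn.Embedding(text_vocab, d_model)
        self.video_emb = nn.Embedding(video_vocab, d_model)
        self.audio_emb = nn.Embedding(audio_vocab, d_model)
        # Learned modality tags so the model knows which tokens are which.
        self.modality_emb = nn.Embedding(3, d_model)  # 0=text, 1=video, 2=audio
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, n_layers)

    def forward(self, text_ids, video_ids, audio_ids):
        # Embed each modality, then concatenate into ONE sequence.
        parts, tags = [], []
        for i, (emb, ids) in enumerate([(self.text_emb, text_ids),
                                        (self.video_emb, video_ids),
                                        (self.audio_emb, audio_ids)]):
            parts.append(emb(ids))
            tags.append(torch.full(ids.shape, i, dtype=torch.long))
        x = torch.cat(parts, dim=1) + self.modality_emb(torch.cat(tags, dim=1))
        # Joint self-attention over the combined stream: every video token
        # can attend to every audio and text token, and vice versa.
        return self.backbone(x)

model = SingleStreamAVModel()
out = model(torch.randint(0, 32000, (1, 16)),  # text tokens
            torch.randint(0, 8192, (1, 64)),   # video patch tokens
            torch.randint(0, 4096, (1, 32)))   # audio tokens
print(out.shape)  # torch.Size([1, 112, 512])
```

The contrast with "splicing" approaches is that there is no separate audio branch or video branch to align after the fact; cross-modal consistency is learned inside the same attention stack.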

How Did an Open-Source Video Model Stage a Comeback in Two Weeks?

With the identity question settled, a harder one remains: daVinci-MagiHuman was only open-sourced at the end of March. How did HappyHorse-1.0 post a higher Elo score than Seedance 2.0 in just two weeks?

Judging from what the official website discloses, HappyHorse made no major changes to the underlying architecture. A reasonable guess is that the default generation strategy was tuned specifically for the evaluation scenario.

The Elo system is essentially an accumulation of user preferences. Slight improvements on perceptually sensitive items, such as whether a character's expression stays stable, whether audio and video stay aligned, and whether the picture is pleasing to the eye, make a clip more likely to be picked in a blind test. The model's capability ceiling stays the same, but its "evaluation performance" can be polished.
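For readers unfamiliar with how such leaderboards work, below is the standard Elo update rule used in head-to-head arenas, plus a toy simulation. The K-factor, starting ratings, and the assumed 55% preference rate are illustrative, not Artificial Analysis's actual parameters:

```python
import random

def elo_update(r_a, r_b, score_a, k=32.0):
    """Standard Elo update after one pairwise vote.
    score_a is 1.0 if model A's clip is preferred, 0.0 otherwise."""
    expected_a = 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))
    r_a += k * (score_a - expected_a)
    r_b += k * ((1.0 - score_a) - (1.0 - expected_a))
    return r_a, r_b

# A modest, consistent edge on perceptual details compounds over many votes.
random.seed(0)
r_happyhorse, r_rival = 1000.0, 1000.0
for _ in range(500):  # 500 blind-test votes
    win = 1.0 if random.random() < 0.55 else 0.0  # assumed 55% preference rate
    r_happyhorse, r_rival = elo_update(r_happyhorse, r_rival, win)
print(round(r_happyhorse), round(r_rival))  # a visible rating gap opens up
```

This is why small, consistently perceptible wins matter more to an Elo leaderboard than a higher capability ceiling that voters rarely get to see.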

In fact, portrait generation and voiced content account for more than 60% of the blind-test samples on Artificial Analysis, and daVinci-MagiHuman has focused on portrait performance since the training stage. That gives it a natural advantage in exactly this kind of scenario, and it is the core reason for its leading win rate in the blind tests. If the blind-test samples are dominated by portrait close-ups, models that excel at portraits gain a systematic edge, which says little about how they perform in harder settings such as multi-character scenes, complex camera movement, and long-sequence narrative.
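A quick back-of-envelope calculation shows how much sample composition alone can move an overall win rate; the per-category win rates below are purely hypothetical:

```python
# Hypothetical per-category win rates for a portrait-focused model.
portrait_share = 0.60  # article: portraits and voiced content exceed 60%
win_portrait = 0.70    # assumed strength in its home scenario
win_other = 0.40       # assumed weakness everywhere else

overall = portrait_share * win_portrait + (1 - portrait_share) * win_other
print(f"overall blind-test win rate: {overall:.0%}")  # 58%, despite losing elsewhere
```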

As a result, there is an obvious gap between the leaderboard numbers and hands-on experience, and the discussion on X has split into two camps. After testing, the skeptics argue that HappyHorse-1.0 still visibly trails Seedance 2.0 on character detail and motion coherence, and they question how representative the Elo score is in the first place.

The supporters, on the other hand, place high hopes on HappyHorse's potential. They want it to crack the industry pain point of picture-quality consistency across multi-shot sequences, a problem that current mainstream video models have not solved well. If daVinci-MagiHuman really makes a breakthrough there, that would matter far more than a leaderboard ranking.

The numbers should not mask the model's own limitations. Xiaohongshu blogger @JACK's AI World deployed and tested daVinci-MagiHuman as soon as it was released and found that it requires an H100 to run; ordinary consumer-grade graphics cards basically cannot handle it. The community is researching quantization solutions, but local deployment will remain out of reach for individual users in the short term.
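As a rough illustration of what such community quantization work involves, here is generic PyTorch dynamic quantization applied to a toy stand-in module. This is not a recipe verified against daVinci-MagiHuman, and a 15-billion-parameter video model would need far more involved techniques:

```python
import torch
import torch.nn as nn

# Toy stand-in for a large model's feed-forward blocks; the real model is far bigger.
model = nn.Sequential(nn.Linear(4096, 4096), nn.GELU(), nn.Linear(4096, 4096))

def param_mb(m):
    return sum(p.numel() * p.element_size() for p in m.parameters()) / 1e6

print(f"fp32 weights: {param_mb(model):.0f} MB")

# Dynamic quantization stores Linear weights as int8 instead of fp32,
# cutting their memory footprint roughly 4x at some quality cost.
quantized = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

x = torch.randn(1, 4096)
print(quantized(x).shape)  # the quantized model still runs the same forward pass
```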

In terms of scenarios, it is currently strongest at single-character content. Once multiple people appear or the scene grows complex, quality drops, and that is not something parameter tuning can fix; it follows directly from the portrait-focused design. Generated clips generally run about 10 seconds, longer outputs tend to fall apart, and high-definition output still requires a super-resolution pass afterwards.

@JACK's AI World concluded that daVinci-MagiHuman's overall usability is not yet on par with LTX 2.3, and that it will only become suitable for everyday use once the community finishes the quantization work.

Has the Video Generation Track Gained a Real "Catfish"?

Of course, topping the list once does not mean much by itself. HappyHorse still needs fuller testing on stability, response speed under high concurrency, cross-scene consistency, character-control accuracy, and generalization beyond the evaluation set. These are the core indicators that determine whether a model can truly enter creators' workflows.

But if we look at the bigger industry landscape, the signal conveyed by this incident is already clear enough.

Open-source video models are nothing new, but a visible quality gap has always separated them from closed-source models. In scenarios where products must be delivered to customers, the generation quality of open-source models has long failed to cross the threshold from "usable" to "deliverable". The pricing power of closed-source products like Keling and Seedance rests, to a large extent, on that gap.

What makes this episode significant is that a product built on an open-source model has, for the first time, directly matched the mainstream closed-source competitors on a blind-test leaderboard grounded in real user perception. However much it was tuned for the evaluation scenario, for closed-source vendors whose pricing power rests on that gap, this is at the very least a signal worth taking seriously.

For developers, the inflection point is more concrete. In vertical scenarios such as portraits, digital humans, and virtual anchors, once an open-source base crosses the "deliverable" quality threshold, the cost structure of self-hosted deployment changes substantially. It is not just that API call costs get compressed; more importantly, data, models, and the inference pipeline all come under the developer's own control, bringing a depth of customization and a level of privacy compliance that closed-source offerings can hardly match.

HappyHorse-1.0 will not shake the market position of Seedance 2.0 or Keling in the short term. But once the perception takes hold that open-source models can match closed-source ones on quality, subsequent quantization, vertical fine-tuning, and inference acceleration will be pushed forward by the community at an iteration speed much faster than closed-source products can manage.

In this Year of the Horse, perhaps what really deserves attention is not which horse runs the fastest, but that the track itself is getting wider.

This article is from the WeChat official account "AI Value Officer", author: Xing Ye, editor: Mei Qi. Republished by 36Kr with authorization.