Alibaba's HappyHorse 1.1 Launches: Sending the Chinese National Football Team to the World Cup

Capabilities improve across five dimensions.

ZDONGXI reported on June 22 that Alibaba today released HappyHorse 1.1, the latest generation of its video generation model. According to Alibaba, the new model improves on HappyHorse 1.0 in dimensions such as dynamic expressiveness, subject consistency, instruction following, visual texture, and audio capabilities.

The technical specifications of HappyHorse 1.1 are the same as those of HappyHorse 1.0. A single generation runs 3 to 15 seconds and supports 720p and 1080p resolutions with freely chosen aspect ratios.

Alibaba showed several official generation results for HappyHorse 1.1. In tasks such as dancing, which test a model's dynamic expressiveness and motion coherence, the movements generated by HappyHorse 1.1 are smooth and natural, without the slow-motion effect and afterimages that affected many earlier video generation models. Human anatomy looks normal, and each character's appearance stays consistent.

For stylized content, HappyHorse 1.1 holds the style of traditional Chinese painting well in the official examples, with no style drift.

Immediately after the release, ZDONGXI ran a hands-on test of HappyHorse 1.1 and compared its performance with HappyHorse 1.0 and the recently released Seedance 2.0 Mini.

The test results show that HappyHorse 1.1 does improve on the previous generation. In particular, the "greasy" look of its footage has largely been resolved. However, in some edge cases and in tasks with multiple reference subjects, the realism and physical plausibility of its output still leave room for improvement.

HappyHorse 1.1 is now available on the Alibaba Cloud Bailian Platform and the HappyHorse official website. Taking text-to-video as an example, generating 720p video costs 0.9 yuan per second (0.54 yuan after discount), the same as HappyHorse 1.0; generating 1080p video costs 1.2 yuan per second (0.72 yuan after discount), which is 25% lower than HappyHorse 1.0.
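The per-second pricing above is simple arithmetic, and a short sketch makes the numbers concrete. The Python snippet below is illustrative only: the function and constant names are hypothetical, it assumes billing is strictly linear in whole seconds, and it does not call any Alibaba API. Prices are stored in fen (1 yuan = 100 fen) so the arithmetic stays in integers. The implied HappyHorse 1.0 price is an inference from the "25% lower" figure, assuming that figure applies to the list price.

```python
# Text-to-video prices quoted in the article, in fen per second (1 yuan = 100 fen).
PRICES_FEN_PER_SECOND = {
    "720p": {"list": 90, "discounted": 54},
    "1080p": {"list": 120, "discounted": 72},
}

# Single-generation duration range stated in the article.
MIN_SECONDS, MAX_SECONDS = 3, 15


def clip_cost_yuan(resolution: str, seconds: int, discounted: bool = True) -> float:
    """Estimate the cost of one clip in yuan, assuming linear per-second billing."""
    if resolution not in PRICES_FEN_PER_SECOND:
        raise ValueError(f"unsupported resolution: {resolution}")
    if not MIN_SECONDS <= seconds <= MAX_SECONDS:
        raise ValueError(f"duration must be {MIN_SECONDS} to {MAX_SECONDS} seconds")
    tier = "discounted" if discounted else "list"
    return PRICES_FEN_PER_SECOND[resolution][tier] * seconds / 100


def discount_percent(resolution: str) -> int:
    """Size of the discount relative to the list price, in whole percent."""
    price = PRICES_FEN_PER_SECOND[resolution]
    return (price["list"] - price["discounted"]) * 100 // price["list"]


# Inference, not a quoted figure: if 120 fen is 25% below the old 1080p price,
# the old price was 120 / 0.75 = 160 fen (1.6 yuan) per second.
IMPLIED_V1_0_1080P_LIST_FEN = 120 * 100 // 75

print(clip_cost_yuan("1080p", 10))   # 7.2 yuan for a 10-second discounted clip
print(clip_cost_yuan("720p", 10))    # 5.4 yuan
print(discount_percent("1080p"))     # 40
print(IMPLIED_V1_0_1080P_LIST_FEN)   # 160
```

Under these assumptions, both resolutions carry the same 40% discount, and a maximum-length 15-second 1080p clip would cost 10.8 yuan at the discounted rate.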

Experience link: www.happyhorse.cn

API access: bailian.console.aliyun.com

We ran our hands-on test across five dimensions. On dynamic expressiveness, Alibaba acknowledged that HappyHorse 1.0 had problems such as sluggish movements that lacked rhythm. Version 1.1 optimizes motion modeling and temporal consistency, improving the coherence and sense of force of the movements.

We tested HappyHorse 1.1 with a motorcycle-riding case. The motion in the generated video runs at normal speed and follows basic physical laws, although the glow of the motorcycle's taillight does not quite match reality. When the shot zooms in to a close-up, the scenery reflected in the motorcycle's windshield is also plausible.

On the same task, the video generated by HappyHorse 1.0 suffered from a slow-motion effect. In addition, the motorcycle was traveling in the wrong direction, and the reflection in the helmet did not match the actual content of the scene.

On subject consistency, HappyHorse 1.1 supports 9 character reference images as simultaneous input, allowing flexible combinations of product details, brand elements, characters, and scenes. For popular formats such as multi-shot and N-grid image references, HappyHorse 1.1 also understands reference images better.

We uploaded three reference images depicting a specific person leaving a job and asked both HappyHorse 1.1 and HappyHorse 1.0 to generate a 10-second video. HappyHorse 1.1 accurately reproduced the person's facial features and clothing. Across two shots, the scene and character details remained stable and consistent, even in the corners of the frame.

The video generated by HappyHorse 1.0 largely maintained subject consistency but contained more physics glitches. By contrast, the HappyHorse 1.1 video had few flaws.

On instruction following, we compared HappyHorse 1.1 with Seedance 2.0 Mini. The prompt was as follows:

In a modern-style coffee shop, gravity suddenly disappears. Customers, tables and chairs, books, and various items slowly float into the air. The barista floats and continues to make coffee. After the liquid coffee overflows from the cup, it forms countless floating liquid balls. An orange cat slowly swims through the space like swimming in water. The camera continuously rotates and moves freely to show the entire weightless environment. All floating objects must follow the real laws of inertia and momentum, and the movement of the liquid needs to conform to the physical characteristics of fluids. The overall presentation has extremely high realism and complex physical simulation capabilities.

Both HappyHorse 1.1 and Seedance 2.0 Mini reproduce the details one by one in the order given in the prompt. However, in this surreal scene, the final videos from both models have obvious problems. HappyHorse 1.1 has more continuity errors: the characters' expressions are dull, and a chair suddenly emerges from the floor.

The Seedance 2.0 Mini result does not depict the floating liquid in a physically plausible way, but its characters' expressions fit the overall style better.

On visual texture, we asked HappyHorse 1.1 to generate a video of the Chinese national team scoring a goal in the World Cup final. In a scene with this many people, HappyHorse 1.1 shows fewer "greasy" and over-sharpened artifacts in its rendering of the main characters. However, the faces of people in the background are somewhat blurred, and the realism and sense of motion are slightly lacking.

Finally, on audio capabilities, we compared HappyHorse 1.1 with HappyHorse 1.0 using an instrument-playing scene. In this particular scene, HappyHorse 1.1 shows no obvious improvement over HappyHorse 1.0, and the changes in the playing motions do not match the changes in the audio.

Conclusion: The improvement meets the expectations of a minor-version iteration

Based on this hands-on test, the HappyHorse 1.1 upgrade largely meets our expectations for a minor-version iteration. It makes solid optimizations for the real problems exposed in the previous generation and delivers clear improvements in motion performance, character fidelity, and overall visual quality.

At the same time, this generation of the model is cheaper to use, which shows that Alibaba is weighing cost-effectiveness alongside model quality. As video generation models continue to evolve toward longer durations, stronger controllability, higher realism, lower cost, and real-time interaction, we expect to see the technology deployed at scale in more scenarios.

This article is from the WeChat official account "ZDONGXI" (ID: zhidxcom). The author is Chen Junda, and the editor is Xinyuan. It is published by 36Kr with authorization.

The views in this article are solely those of the author. The 36Kr platform only provides information storage services.
