With annual revenue exceeding 100 million yuan, this dark horse in multi-modal generative AI is embarking on a new journey.
Nearly two years have passed since OpenAI released its text-to-video model Sora, and AIGC companies in China and the United States are now in very different positions. On one side are Sora 2, which is costly and has never been rolled out at scale, and the Sora App, whose retention rate is close to zero. On the other side are Chinese companies that, rooted in a vast application market, are hitting their stride and seeing commercialization take off across the board.
According to recent information obtained by Intelligence Emergence, Zhixiang Future, a generative AI startup focusing on visual multi-modality, had annual revenue of over 100 million yuan in 2025. Its consumer product, vivago.ai, also recently hit a peak in downloads, gaining nearly 10 million new users in January alone. It ranked among the top 10 in the "Video Playback & Editing" category on Google Play in more than 100 countries and regions, demonstrating strong commercial potential.
Since its establishment, Zhixiang Future has successively released the HiDream-I1 large image generation model and the HiDream-E1 interactive editing model. In April 2025, these models were fully open-sourced and topped the international authoritative AI evaluation list, Artificial Analysis, within 24 hours of the open-source release.
This company, which originated in Hefei, has found a perfect balance between generation quality and efficiency through its self-developed large model with over 10 billion parameters and the globally innovative diffusion autoregressive architecture. Currently, its products are widely used in fields such as cultural and creative industries, film and television, and advertising.
Intelligence Emergence exclusively learned that Zhixiang Future has accelerated its financing process: the Series B financing has entered the closing stage, and the Term Sheet for the next round has been secured in advance. A core insider close to the company revealed that both rounds of financing are in the hundreds of millions of yuan range. In the increasingly competitive AI visual generation track, Zhixiang Future continues to attract heavy investment from leading capital with its hardcore technical strength and clear commercialization path.
With multi-modal applications accelerating, what other commercial potential does Zhixiang Future hold?
The Most Industrialized Scientists, the Most Practical Romance
From its very beginning, Zhixiang Future has found a practical form of romance. Its founder, Mei Tao, is a foreign academician of the Canadian Academy of Engineering and previously worked at Microsoft for 12 years. He has published more than 300 papers in the fields of multimedia analysis and computer vision and has won 15 international best paper awards.
However, Mei Tao's experience is not limited to academia. In 2018, he joined JD.com and served as the deputy dean of the JD Exploratory Research Institute. This career experience allowed him to see the path from technology to commercialization.
When deciding to establish Zhixiang Future, Mei Tao had a clear vision. On one hand, multi-modality was considered the most likely path to AGI, a view that has since become an industry consensus. On the other hand, in terms of commercial prospects, multi-modality has broader potential than pure language models. "Currently, 50% to 60% of global AIGC revenue comes from applications related to images and videos, higher than that of pure text models. When we made the decision to start the business in 2023, multi-modality companies like Midjourney had already proven their strong commercialization capabilities through SaaS tools, clearly verifying the market fit of their products," Mei Tao told 36Kr in mid-2025.
This is precisely Mei Tao's home turf, as he has deep expertise in computer vision (CV) and multi-modality.
However, for Chinese startups entering the field at that time, Sora loomed like a mountain in front of them. Given its high-fidelity rendering of the physical world and striking results, the industry was eager to see whether Chinese startups could produce comparable work.
A race thus began. Just half a year after the release of Sora, Zhixiang Future released its self-developed multi-modal large model. In April 2025, Zhixiang Future open-sourced both the large image generation model HiDream-I1 and the interactive editing model HiDream-E1 at once, closing the loop from dialogue to image creation. HiDream-I1 topped the authoritative Artificial Analysis list within 24 hours, becoming the first self-developed Chinese generative AI model to enter the global first tier and setting new industry records in three dimensions: image quality, semantic understanding, and artistic expression.
However, many entrepreneurs later looked back and concluded that Sora was not especially advanced in terms of architectural innovation. Mei Tao also felt that Sora's overall capabilities were in line with expectations at the time. In the six months that followed, as startups like Zhixiang Future entered the field, OpenAI lost its significant advantage in video generation. From the perspective of product implementation in particular, there is little difference among products, whether overseas or domestic.
Meanwhile, in exploring multi-modal architectural paradigms, Zhixiang Future is at the forefront. The company was the first to develop separate models for generation and understanding, and then planned a single integrated model for both, which is regarded as the best path toward the physical world.
Zhixiang Future has consistently worked to break through industry challenges. In 2025, with the open-sourcing of its latest model and the release of products like vivago 2.0, Mei Tao told 36Kr that the DiT (Diffusion Transformer) architecture uses the powerful capabilities of the Transformer to process video data, enabling AI models to efficiently model spatio-temporal relationships and flexibly generate videos of different resolutions. This is an important advancement. However, for the generative AI field as a whole, the realistic rendering of complex physical phenomena remains an unresolved problem: dynamic details that humans perceive intuitively, such as the trajectory of splashing water droplets and the mechanical feedback of object collisions, are still at the exploratory stage of "looking similar but lacking the essence," and visual inconsistencies often occur in such scenes.
Zhixiang Future has found an excellent balance between generation quality and running speed through the Sparse DiT architecture. Through adversarial distillation, it has then not only increased inference efficiency but also greatly enhanced the detail and aesthetics of its images. This has ultimately contributed to several creative achievements of Zhixiang Future's HiDream-I1 model.
Blaze a New Trail in Algorithms and Solve the Last-Mile Problem
Unlike large companies, whose logic centers on base models and parameter counts, small companies pay more attention to innovation and implementation. In Mei Tao's view, this is where Zhixiang Future's value lies: solving the last-mile implementation problem of AI.
He once told 36Kr, "From the first day of our startup, we have felt a keen sense of crisis and have been thinking about how to find Product-Market Fit (PMF). We started early and moved quickly on commercialization. Although we haven't raised the most funds, we have thought through every penny we spend and every person we recruit."
In its early days, Zhixiang Future formulated a "1+3+N" layout, that is, one core multi-modal large model driving three major products: a creative tool platform, an interactive marketing content tool, and a one-stop video creation agent. As of now, its services have reached over 20 million individual users and over 40,000 enterprise users globally.
After clarifying the positioning, the core lies in how to deliver well and serve customers to make AI truly generate value.
Mei Tao told 36Kr that Zhixiang Future has the most comprehensive multi-modal copyrighted corpus in China, including hundreds of thousands of hours of copyrighted video material and tens of thousands of authorized IPs. The corpus not only covers 70% of domestic film and television data but has also been used to create hundreds of millions of AIGC derivative materials, which are now widely used in scenarios such as film and television, cultural tourism, and marketing.
"At the Microsoft Research Institute, we often said that it might take a hundred engineers to turn a technology into a product, and another hundred solution experts or business developers to sell the product well. This shows how big the gap is. At that time, I thought I must find a place to bridge this chain."
It is precisely this full-chain ability from technology to implementation that has made Zhixiang Future popular with investors since its inception.
In 2024, Zhixiang Future completed a Series A financing of hundreds of millions of yuan, led by Hefei Industrial Investment Group, with participation from institutions such as the Anhui Artificial Intelligence Mother Fund. At the end of 2025, JD.com Group increased its investment in Zhixiang Future as a strategic investor. The vast business scenarios behind JD.com, including logistics, retail, health, and industry, are excellent testing grounds and fertile soil for applying multi-modal AI technology.
Subsequently, an insider revealed that Zhixiang Future had started the preparations for Series B financing in full swing and planned to complete the closing in early 2026.
36Kr recently learned that Zhixiang Future has successfully obtained the Term Sheet for the next round. Existing shareholders continue to back the company, and new shareholders include industrial capital, listed companies with potential for in-depth business cooperation, and well-known investment institutions. Currently, the Series B financing amount has reached hundreds of millions of yuan.
Yuan Guoliang, the CEO of Shanghai Dunhong Asset, said when evaluating Zhixiang Future, "We firmly believe that video generation technology, as a new-generation productivity tool, will fully empower all industries. Especially in the e-commerce field, video has become the core medium connecting products and consumers. HiDream has initially verified its application value and commercialization potential in the e-commerce scenario through its products, demonstrating that the team not only understands technology but also the industry. At the same time, we believe that its technical architecture and evolution direction have the potential to expand into a more general and cognitively deep world model, which represents a leap in underlying capabilities. We look forward to exploring the long-term path of technology-industry integration with the team and helping to promote multi-modal generation to become a universal and intelligent industry infrastructure."
The Best Target with Both Commercialization Strength and Architectural Innovation
2025 was the first year of the explosion of multi-modal generative AI in China. With the increasing maturity of AIGC technology, productivity and creativity have been significantly improved, driving the application market to grow explosively. According to IDC data, the global generative AI market size is expected to have a compound annual growth rate of up to 63.8% in the next five years and will reach $284.2 billion by 2028, accounting for 35% of total AI investment. Zhixiang Future has benefited from this trend with its strong technical strength and its focus on industrial implementation. The company's commercialization has been rapid. 36Kr learned that Zhixiang Future's annual revenue in 2025 exceeded 100 million yuan.
Zhixiang Future owes such rapid results in the highly competitive multi-modal generation field to its distinctive thinking about business models and its strong capacity for underlying innovation. It can be said that Zhixiang Future is one of the few companies in the industry that focuses on both commercialization and technological innovation.
In the three years since its establishment, Zhixiang Future has experienced different business models. In 2023, it adopted the MaaS model, selling models and APIs, similar to the PaaS model in cloud computing. In 2024, it switched to the SaaS model, mainly selling tools for users to produce content on the Zhixiang Future platform.
Now, it has upgraded its model to RaaS, a business model focused on delivering results and user value. Under this model, the company provides tools, content materials, and a limited quota of video production and placement for only a small basic fee, and earns most of its income as commissions on the customer's increase in GMV. Mei Tao believes that this customer-value model is relatively clear: the customer's investment carries almost zero risk, and the incremental benefits are shared.
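The fee structure described above, a small basic fee plus a commission on the customer's GMV increase, can be illustrated with a minimal sketch. The function name, the baseline-GMV comparison, and every number below are hypothetical assumptions for illustration only; the article does not disclose Zhixiang Future's actual rates or how the GMV increase is measured.

```python
def raas_fee(base_fee, baseline_gmv, actual_gmv, commission_rate):
    """Hypothetical RaaS bill: a small fixed fee plus a share of GMV uplift.

    All parameters are illustrative assumptions, not figures from the article.
    """
    # Commission applies only to the increase over the baseline;
    # if GMV does not grow, the customer pays the basic fee alone.
    uplift = max(actual_gmv - baseline_gmv, 0.0)
    return base_fee + commission_rate * uplift


# Illustrative numbers only (currency units are arbitrary).
fee_with_growth = raas_fee(base_fee=1_000.0, baseline_gmv=500_000.0,
                           actual_gmv=650_000.0, commission_rate=0.05)
fee_without_growth = raas_fee(base_fee=1_000.0, baseline_gmv=500_000.0,
                              actual_gmv=480_000.0, commission_rate=0.05)

print(f"fee when GMV grows:   {fee_with_growth:,.2f}")
print(f"fee when GMV is flat: {fee_without_growth:,.2f}")
```

In this sketch the customer's downside is capped at the basic fee, which is one way to read the "almost zero-risk" description, while the provider's income scales with the uplift it helps create.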
As the startup gradually gets on track, Mei Tao also said that he has found a balance between commercial returns and capability improvement. On one hand, the company is continuously stepping up its research on vertical base models. A more powerful and advanced underlying architecture will lay a better foundation for model capabilities. In addition to independent R&D, Zhixiang Future is also embracing a broader ecosystem through open source to increase its chances of success. On the other hand, it still focuses on solving the last-mile problem, delving into users' actual scenario needs, integrating more vertical data from industries such as education, e-commerce, and cultural tourism, and fine-tuning its models to solve real industry problems.
Intelligence Emergence also learned that Zhixiang Future is currently researching a new-generation multi-modal generation architecture driven by multi-modal reasoning and equipped with infinite memory, which is expected to significantly improve the model's reasoning ability and achieve a higher level of horizontal scale-up across multiple tasks.
Now, with technology, market, and policy moving in step, the industry is also realizing that AI-generated videos are no longer just toys for geeks but productivity tools that can directly generate cash flow. Since last year, popular AIGC videos such as the "Cat and Dog Sports Meeting" and "Cutting Glass-like Fruits with a Knife" have gone viral on social platforms, attracting more and more creators. AI video generation has become a common choice for both leading players and ordinary consumer-end users, ultimately accelerating the commercialization of the video generation track.
According to data from the international research institution Fortune Business Insights, the global scale of AI video generation was approximately $620 million in 2024 and is expected to reach $2.56 billion in 2032, with a compound growth rate of 20% from 2025 to 2032.
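As a quick sanity check on these figures, the sketch below computes the compound annual growth rate implied by the two endpoints. It assumes the 2024 figure is the base and that growth compounds over the eight years to 2032; the research firm's own base year and rounding may differ, so this is an approximation rather than a reproduction of its method.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate between two values over `years` years."""
    return (end_value / start_value) ** (1 / years) - 1


def project(start_value, rate, years):
    """Value after compounding `start_value` at `rate` for `years` years."""
    return start_value * (1 + rate) ** years


market_2024 = 0.62   # billions of USD, as quoted in the article
market_2032 = 2.56   # billions of USD, as quoted in the article
years = 2032 - 2024  # assumption: eight compounding periods

implied_rate = cagr(market_2024, market_2032, years)
projected_at_20pct = project(market_2024, 0.20, years)

print(f"implied CAGR: {implied_rate:.1%}")
print(f"2032 size at a flat 20% rate: ${projected_at_20pct:.2f}B")
```

Under these assumptions the implied rate comes out a little below 20%, and compounding at exactly 20% lands slightly above $2.56 billion, so the quoted endpoints and growth rate are consistent to within rounding.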
At present, AIGC has become the mainstream choice in marketing and certain content fields. A more promising prospect is that once models can reliably solve the problems of character consistency and long-term coherence, AIGC will ignite the market in high-end applications such as film, television, and gaming. And once models overcome the problem of consistency between understanding and generation, they will truly understand the physical world and generate more realistic and controllable content and details. That will be the real breakout moment for the video generation track. In this race, Zhixiang Future is at the forefront.