With revenue exceeding 100 million yuan, this dark horse in multi-modal generative AI is embarking on a new journey.
Nearly two years have passed since OpenAI released its text-to-video model Sora, yet AIGC companies in China and the United States are showing vastly different development trajectories. On one hand, there's Sora 2, which has high costs and struggles to achieve large-scale adoption, and the Sora App with almost zero user retention. On the other hand, Chinese companies, deeply rooted in a vast application landscape, are hitting their stride and experiencing a full-fledged commercial explosion.
According to recent reports from Intelligent Emergence, Zhixiang Future, a generative AI startup focusing on visual multi-modalities, exceeded 100 million yuan in total revenue in 2025. Its consumer product, vivago.ai, also recently reached a peak in downloads. In January alone, it gained nearly 10 million new users and entered the top 10 of the "Video Playback & Editing" category on Google Play in over 100 countries and regions worldwide, demonstrating significant commercial potential.
Since its establishment, Zhixiang Future has released the HiDream-I1 large image generation model and the HiDream-E1 interactive editing model, both of which were fully open-sourced in April 2025. Within 24 hours of the open-source release, HiDream-I1 topped the leaderboard of Artificial Analysis, an authoritative international AI evaluation platform.
This company, originating from Hefei, has found the perfect balance between generation quality and efficiency through its self-developed large model with over 10 billion parameters and a globally innovative diffusion autoregressive architecture. Currently, its products are widely used in cultural and creative industries, film and television, advertising, and other fields.
Intelligent Emergence exclusively learned that Zhixiang Future is accelerating its financing process. The Series B financing is in the closing stage, and the Term Sheet for the next round has already been secured. A core insider close to the company revealed that both rounds of financing are in the hundreds of millions of yuan range. In the increasingly competitive AI visual generation market, Zhixiang Future continues to attract heavy investment from leading capital due to its strong technical capabilities and clear commercialization path.
With the accelerated implementation of multi-modal applications, what other commercial potential does Zhixiang Future hold?
The Most Industrialized Scientists, the Most Pragmatic Romance
From its inception, Zhixiang Future has embraced a pragmatic form of romance. Its founder, Mei Tao, is a foreign academician of the Canadian Academy of Engineering and has previously worked at Microsoft for 12 years. He has published over 300 papers in the fields of multimedia analysis and computer vision and has won 15 international best paper awards.
However, Mei Tao's experience extends beyond academia. In 2018, he joined JD.com as the deputy dean of the JD Exploration Research Institute. This role allowed him to see the path from technology to commercialization.
When deciding to establish Zhixiang Future, Mei Tao had a clear vision. On one hand, multimodality is the most likely path to AGI, a view that has since become an industry consensus. On the other hand, in terms of commercial prospects, multimodality offers more room than pure language models. "Currently, 50% - 60% of the global AIGC revenue comes from image and video-related applications, higher than that of pure text models. When we made the decision to start the business in 2023, multi-modal companies like Midjourney had already proven their strong commercialization capabilities through SaaS tools, clearly validating the market fit of their products," Mei Tao told 36Kr in mid-2025.
This is precisely Mei Tao's home turf: he has deep expertise in computer vision (CV) and multi-modal AI.
However, for innovative Chinese companies at that time, Sora was a formidable obstacle. Given its high-fidelity rendering of the physical world and its stunning visual effects, the industry was eager to see whether Chinese startups could produce comparable results.
A race began. Just six months after the release of Sora, Zhixiang Future launched its self-developed multi-modal large model. In April 2025, it open-sourced the HiDream-I1 large image generation model and the HiDream-E1 interactive editing model, closing the loop from dialogue to image creation. HiDream-I1 topped the authoritative list, Artificial Analysis, within 24 hours, becoming the first self-developed Chinese generative AI model to enter the global first-tier and setting new industry records in three dimensions: image quality, semantic understanding, and artistic expression.
Afterward, many entrepreneurs reflected that Sora was somewhat behind in terms of architectural innovation. Mei Tao also felt that Sora's overall functionality was in line with expectations. Over the following six months, as startups like Zhixiang Future entered the field, OpenAI lost its significant advantage in video generation. From the perspective of product implementation in particular, there is now little difference between overseas and domestic products.
Meanwhile, in exploring the multi-modal architecture paradigm, Zhixiang Future is even leading the way. The company was the first to develop separate models for generation and understanding, and it then planned to unify understanding and generation in a single model, an approach considered the best path to modeling the physical world.
Zhixiang Future has also been working on the industry's open challenges. In 2025, with the open-sourcing of its latest model and the release of products like vivago 2.0, Mei Tao told 36Kr that the DiT (Diffusion Transformer) architecture applies the modeling power of the Transformer to video data, enabling AI models to efficiently model spatio-temporal relationships and flexibly generate videos at different resolutions. This is an important advance. However, for the generative AI field as a whole, realistically reproducing complex physical phenomena remains an unsolved problem. Dynamic details that humans sense intuitively, such as the trajectory of splashing water droplets or the mechanical feedback of colliding objects, are still rendered in a way that "looks similar but lacks the essence," often producing visual incongruities in such scenes.
Through its Sparse DiT architecture, Zhixiang Future has found an excellent balance between generation quality and running speed. Using adversarial distillation, it has significantly enhanced image detail and aesthetics while increasing inference efficiency. These advances ultimately led to the HiDream-I1 model's several innovative achievements.
Blaze a New Trail in Algorithms and Solve the Last-Mile Problem
Unlike large companies, which compete on base models and parameter counts, small companies focus more on innovation and implementation. In Mei Tao's view, this is where Zhixiang Future's value lies: solving the last-mile problem of AI implementation.
He once told 36Kr, "From the first day of our startup, we have been very aware of the sense of crisis and have been thinking about how to find the Product-Market Fit (PMF). We started and progressed relatively early in commercialization. Although we haven't raised the most funds, we have a clear plan for every penny we spend and every person we hire."
In its early days, Zhixiang Future established a "1+3+N" layout, which consists of one core multi-modal large model driving three major products: a creative tool platform, an interactive marketing content tool, and a one-stop video creation agent. As of now, its services cover over 20 million individual users and over 40,000 enterprise users globally.
After determining the positioning, the core is how to ensure delivery and serve customers well, enabling AI to truly generate value.
Mei Tao told 36Kr that Zhixiang Future has the most comprehensive multi-modal copyright corpus in China, hundreds of thousands of hours of copyrighted video materials, and tens of thousands of authorized IPs. It not only covers 70% of domestic film and television data but also has formed hundreds of millions of AIGC secondary creation materials, which are currently widely used in scenarios such as film and television, cultural tourism, and marketing.
"At the Microsoft Research Institute, we often said that it might take a hundred engineers to turn a technology into a product, and another hundred solution experts or business development (BD) personnel to sell the product well. This shows how big the gap is in between. At that time, I thought I must find a place to bridge this gap."
It is precisely this full-chain ability from technology to implementation that has made Zhixiang Future popular among investors since its inception.
In 2024, Zhixiang Future completed a Series A financing of hundreds of millions of yuan, led by Hefei Industrial Investment Group, with participation from institutions such as the Anhui Artificial Intelligence Mother Fund. At the end of 2025, JD.com Group increased its investment in Zhixiang Future as a strategic investor. The vast business scenarios behind JD.com, including logistics, retail, health, and industry, are the perfect testing ground and application soil for multi-modal AI technology.
Subsequently, an insider revealed that Zhixiang Future has been actively preparing for the Series B financing and plans to complete the closing in early 2026.
36Kr recently learned that Zhixiang Future has successfully obtained the Term Sheet for the next round. Existing shareholders continue to support the company, and new shareholders include industrial capital, listed companies with in-depth business cooperation potential, and well-known investment institutions. Currently, the Series B financing amount has reached hundreds of millions of yuan.
Yuan Guoliang, the CEO of Shanghai Dunhong Asset, said in his evaluation of Zhixiang Future, "We firmly believe that video generation technology, as a new generation of productivity tools, will empower all industries. Especially in the e-commerce field, video has become the core medium connecting products and consumers. HiDream has initially verified its application value and commercial potential in the e-commerce scenario through its products, demonstrating that the team not only understands technology but also the industry. At the same time, we believe that its technical architecture and evolution direction have the potential to expand into a more general and cognitively deep world model, which represents a leap in underlying capabilities. We look forward to exploring the long-term path of technology and industry integration with the team and helping to promote multi-modal generation to become a universal and intelligent industry infrastructure."
The Best Candidate with Both Commercialization Strength and Architectural Innovation
2025 was the year when Chinese multi-modal generative AI exploded. With the increasing maturity of AIGC technology, productivity and creativity have been significantly improved, driving the explosive growth of the application market. According to IDC data, the global generative AI market is expected to have a compound annual growth rate of 63.8% in the next five years, reaching $284.2 billion by 2028, accounting for 35% of total AI investment. Zhixiang Future has benefited from this trend due to its strong technical capabilities and industrial implementation mindset. The company's commercialization process has been rapid. 36Kr learned that Zhixiang Future's annual revenue in 2025 exceeded 100 million yuan.
Zhixiang Future achieved these results quickly in the highly competitive multi-modal generation field thanks to its distinctive thinking about business models and its strong underlying innovation capabilities. It is one of the few companies in the industry that focuses on both commercialization and technological innovation.
In the three years since its establishment, Zhixiang Future has moved through several business models. In 2023, the model was MaaS (Model as a Service): selling models and APIs, similar to the PaaS model in cloud computing. In 2024, the model was SaaS (Software as a Service): mainly selling tools that let users create content on Zhixiang Future's platform.
Now, it has upgraded to RaaS (Results as a Service), a user-value-oriented business model built around delivering results. The package includes tools, content materials, and limited video production and placement for only a small base fee; the company earns mainly through commissions on customers' incremental GMV. Mei Tao believes the customer value in this model is relatively clear: customers invest at almost zero risk and share the incremental benefits.
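The fee structure described here, a small base fee plus a commission on incremental GMV, can be sketched as a simple calculation. This is only an illustration: the function name, the commission rate, the baseline definition, and all figures below are hypothetical and are not drawn from Zhixiang Future's actual pricing.

```python
def raas_fee(base_fee: float, baseline_gmv: float,
             actual_gmv: float, commission_rate: float) -> float:
    """Hypothetical RaaS charge: base fee plus commission on GMV uplift.

    All parameters are illustrative assumptions, not the company's terms.
    """
    # Commission applies only to growth above the agreed baseline,
    # so a customer whose GMV does not grow pays just the base fee.
    uplift = max(0.0, actual_gmv - baseline_gmv)
    return base_fee + uplift * commission_rate


# Illustrative figures only (arbitrary currency units).
fee_with_growth = raas_fee(base_fee=1_000.0, baseline_gmv=200_000.0,
                           actual_gmv=260_000.0, commission_rate=0.05)
fee_without_growth = raas_fee(base_fee=1_000.0, baseline_gmv=200_000.0,
                              actual_gmv=180_000.0, commission_rate=0.05)

print(f"Fee with GMV growth:    {fee_with_growth:,.2f}")
print(f"Fee without GMV growth: {fee_without_growth:,.2f}")
```

Under these assumed numbers, the customer's downside is capped at the base fee, which is the "almost zero-risk" property the model is meant to offer.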
As the startup progresses smoothly, Mei Tao also said that he has found the balance between business returns and ability improvement. On one hand, he is continuously increasing investment in research on vertical basic models. A more powerful and advanced underlying architecture will surely lay a better foundation for the model's capabilities. In addition to independent research and development, Zhixiang Future also embraces a broader ecosystem through open-source, increasing the possibility of success. On the other hand, it still focuses on solving the last-mile problem, delving into the actual scenario needs of users, integrating more vertical data in industries such as education, e-commerce, and cultural tourism, and conducting fine-tuning to truly solve industry problems.
Intelligent Emergence also learned that Zhixiang Future is currently developing a new generation of multi-modal generation architecture driven by multi-modal reasoning and equipped with infinite memory. The company expects this to significantly improve the model's reasoning ability and allow it to scale horizontally across more tasks.
Now, with technology, market, and policy moving in step, the industry is realizing that AI video is no longer just a toy for geeks but a productivity tool that can directly generate cash flow. Since last year, popular AIGC videos such as "Cat and Dog Sports Meeting" and "Cutting Glass-like Fruits with a Knife" have gone viral on social platforms, attracting more and more creators. AI video generation has become a common choice for both leading players and ordinary consumer users, ultimately accelerating the commercialization of the video generation market.
According to the international research firm Fortune Business Insights, the global AI video generation market was worth approximately $620 million in 2024 and is expected to reach $2.56 billion by 2032, a compound annual growth rate of 20% from 2025 to 2032.
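As a quick sanity check on these figures, the growth rate can be recomputed from the quoted endpoints. The sketch below uses only the numbers cited above; the article gives no 2025 market size, so the 2025 value here is derived from the quoted 20% rate rather than taken from the report.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate between two values `years` apart."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1 / years) - 1


def implied_start(end_value: float, rate: float, years: int) -> float:
    """Starting value that compounds to `end_value` at `rate` over `years`."""
    return end_value / (1 + rate) ** years


# Figures quoted in the article, in millions of US dollars.
size_2024 = 620.0
size_2032 = 2560.0

# Growth measured from the 2024 figure (8 compounding years).
rate_from_2024 = cagr(size_2024, size_2032, years=2032 - 2024)

# 2025 base implied by the quoted 20% rate (7 compounding years).
# This value is derived for illustration, not sourced from the report.
base_2025 = implied_start(size_2032, rate=0.20, years=2032 - 2025)

print(f"CAGR 2024-2032: {rate_from_2024:.1%}")
print(f"Implied 2025 market size: ${base_2025:.0f}M")
```

Measured from the 2024 figure, the rate works out to roughly 19.4%, and the quoted 20% rate for 2025 to 2032 implies a 2025 market of about $714 million, so the cited numbers are mutually consistent.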
Currently, AIGC has become the mainstream choice in marketing and certain content fields. The more promising prospect is that once models can reliably solve character consistency and long-range coherence, AIGC will ignite the market for high-end applications such as film, television, and games. Once models close the gap between understanding and generation, they will truly understand the physical world and generate more realistic, controllable content and details. That will be the real breakout moment for the video generation market. In this race, Zhixiang Future is leading the way.