No more holding back! Tencent releases more than 10 intelligent agents at once and open-sources models in batches | Latest Frontline
Author | Deng Yongyi
Editor | Su Jianxun
In the past, Tencent kept a low profile on large models. When it comes to AI applications, however, it has finally stopped holding back.
On July 26th, the World Artificial Intelligence Conference (WAIC) opened in Shanghai. Tencent's message at WAIC was clear: it wants AI agents to become the “digital friends” of its 1.4 billion users, forming a “circle of friends” of agents that covers users' daily lives.
Tencent also released a range of new products, from self-developed large models to agents, in what amounts to an “AI all-in-one package”, including:
- On the To B side: The new Hunyuan World Model 1.0 was released, which can be applied in VR, game development, scene editing, physical simulation, and more; several small-scale Hunyuan models will be open-sourced next.
- On the To C side: More than 10 agents were released, mainly focusing on life, study, and work scenarios, including a travel planning agent.
- At the platform level: An agent development platform, the embodied intelligence open platform Tairos, an AIGC content generation platform, an edge-side large model platform, and an AI education platform were released.
“Current AI is evolving from short-term memory to long-term memory,” said Wu Yunsheng, vice president of Tencent Cloud and head of both Tencent Cloud Intelligence and Tencent YouTu Laboratory. For a long time, large models could only remember relatively short contexts, which was not enough for complex tasks.
Tencent's heavy investment in agents is also an exploration of the path of technological evolution, with multi-agent collaboration as one example. Wu Yunsheng said that AI technology is evolving from text and image Q&A to full multi-modal interaction (video, image, audio, etc.), and that seamless interaction across all modalities will be essential in the future. More complex tasks can be completed only when different agents take on different specialties and collaborate with one another.
△Source: Tencent
In 2023, when Tencent Cloud had just released the Hunyuan large model family, it was still telling the story of “industry-specific large models”: it targeted 10 major industries such as finance, government affairs, and telecom operators, and launched more than 50 solutions at once using the “industry-specific large model” approach.
But the narrative has now changed, expanding rapidly from language models to multi-modal and embodied intelligence.
At this WAIC, Tencent also presented its progress in embodied intelligence in a concentrated way for the first time. Tencent's Robotics X Laboratory and Futian Laboratory jointly released the “Embodied Intelligence Open Platform Tairos”.
It is the first embodied intelligence software platform in China to provide large models, development tools, and data services in a modular way. It is plug-and-play and open to the robotics industry, supplying key software capabilities for robot hardware developers and application developers.
Focus on Both Models and To B/To C Applications
On the model side, the centerpiece of Tencent's release this time is the Hunyuan 3D World Model 1.0, which was announced as fully open-sourced.
If the technological evolution path of large language models (LLMs) has gradually become clear, moving from scaling up to a second half dominated by reinforcement learning, then multi-modality is still in its early days, with many difficulties in technology selection, high-quality data, and model engineering.
Multi-modality is the focus of competition among companies this year, and the world model, which emerged in December 2024, is an important branch of multi-modality.
To put it simply, Tencent's Hunyuan 3D World Model 1.0 integrates panoramic vision generation and hierarchical 3D reconstruction technology, supports both text and image input, and generates high-quality, navigable 3D scenes in diverse styles.
△Source: Tencent
In the past, 3D modeling and rendering were huge projects that took professional modeling teams several weeks to complete. Now, a scene can be generated within a few minutes from just a piece of text or an image.
Data is one of the difficulties in training a world model. Guo Chunchao, head of Tencent's Hunyuan 3D, said in an interview with media including 36Kr that 3D assets currently rely mainly on manual production by artists or modelers, so they number only in the tens of millions. Compared with the billions or tens of billions of images available, there is an order-of-magnitude gap, and such data is objectively difficult to obtain.
Regarding future development priorities, Guo Chunchao said that the Hunyuan World Model has two goals. The first is to raise the quality of 3D asset generation to a higher commercial level. 3D asset generation has currently reached an intermediate level, but there is still a gap compared with top-level work. By improving generation quality and generalization ability, the team hopes to better meet the needs of industries such as gaming, autonomous driving, XR, animation, and film and television, reducing costs and shortening production cycles.
The second goal is to improve the scene generation and interaction model and build a more complete world model that truly simulates physical laws. This will be the focus this year, with the aim of reaching a higher level of maturity next year.
At the beginning of this year, the experience of DeepSeek R1 proved that in a brand-new technological field, seizing the technological initiative and doing respectable work can bring huge market returns.
Since then, companies have accelerated their open-source pace. In addition to the Hunyuan 3D World Model 1.0, which was open-sourced upon release, Hunyuan will open-source a series of small models at the end of the month, including 0.5B, 1.8B, 4B, and 7B hybrid inference models, which are lighter and easier to deploy.
Thanks to its accumulated experience in content fields such as gaming and social media, Tencent is already among the first-tier players in China in multi-modality. Tencent has now given the public an open-source base with performance close to that of commercial models, making it convenient for the community to customize it for their own business and usage scenarios.
According to public data released by Tencent, the numbers of derivative models based on Tencent's image and video models have reached 1,400 and 1,600 respectively. Community downloads of the Hunyuan 3D series models have exceeded 2.3 million, making it the most popular open-source 3D model globally.
In addition to the world model, Tencent Hunyuan also disclosed a series of open-source plans, including edge-side hybrid inference language models, multi-modal understanding models, and game vision models.
For example, the soon-to-be open-sourced Hunyuan-large-vision is a multi-modal understanding model that ranked first in China on the LMArena Vision leaderboard. “Hunyuan GameCraft”, an interactive game video generation framework optimized for game scenarios, will also be open-sourced in the near future.
Implementation, Always the Key
Tencent has always been pragmatic in its large-model strategy. At this WAIC, Tencent's theme is “Make ‘useful AI’ a universal productivity”.
Tencent has embedded the capabilities of agents into multiple To B and To C applications, covering scenarios such as life, work, study, and entertainment.
In the study scenario, QQ Browser's QBot provides functions such as AI search, AI browsing, AI office work, AI learning, and AI writing. The ima AI Workbench can assist with daily study and work tasks and accumulate the results over time into a personal intelligent knowledge base. It also supports joining knowledge bases shared by others for precise Q&A.
Another example is the travel planning agent. It can generate a travel guide with one click based on the visitor's needs and allows the generated guide to be edited and personalized at any time. Orders can also be placed seamlessly through the built-in mini-program, so that a single input can carry out multiple instructions.
△Source: Tencent
In entertainment creation, QQ Music has launched “AI songwriting” and “AI singing” functions to help users create or “sing” high-quality songs. Previously, QQ Music launched the world's first AI singer, “AI Li Hong”.
Tencent not only develops agents itself but also provides supporting “creation tools” for agents. For example, its two major agent development platforms, the “Tencent Cloud Agent Development Platform” and “Tencent Yuanqi”, can greatly lower the threshold for building and using AI agents and help enterprise customers and creators build their own agents.
Previously, industry-specific large models, much like enterprise private cloud deployments, were highly customized, and the market had long questioned their “high implementation cost” and “difficult implementation”. As the capabilities of large models have continued to improve over the past two years, agents have moved into the spotlight.
With the emergence of agents, what is the significance of industry-specific large models? Wu Yunsheng told 36Kr that agents and industry-specific large models are largely complementary. With industry-specific large models, enterprise customers can co-create with Tencent Cloud, embedding industry know-how into the models so that these capabilities can be reused, while agents can solve smaller-scale problems in enterprise customers' front-end scenarios through protocols such as MCP.
“Agents can magnify the value of large models and are an important form to solve the implementation problems in the industry,” Wu Yunsheng told 36Kr.