Yang Hongxia, Former Head of Large Models at Alibaba and ByteDance, Starts a Company: Pre-Training Large Models Is Not a Compute Race for a Few Top Players | Exclusive from Intelligent Emergence
Interview | Zhou Xinyu, Deng Yongyi
Text | Zhou Xinyu
Editor | Su Jianxun
Yang Hongxia, who worked on large models at Alibaba and ByteDance for nearly seven years, has a marked appetite for a challenge.
In her early days at Alibaba, she moved from the company's search-and-recommendation business to large-model research, a direction few favored at the time.
Later, at the DAMO Academy, she led a group of core talents in China's large-model field, including Lin Junyang (current head of Tongyi Qianwen) and Zhou Chang (former head of the Tongyi Qianwen large model), to build the M6 large model, the predecessor of Tongyi Qianwen.
In July 2024, when Yang Hongxia left ByteDance to start her own company, it was reported that she would continue to work on model technology.
The halo of "core figure of Alibaba's and ByteDance's large-model efforts" did not drown out the pessimistic voices in the market at the time: it is too late to enter the game, and how could a startup compete with the big companies?
One year and three months later, Yang Hongxia has returned to the large-model track with her new AI company, InfiX.ai.
In early October, "Intelligent Emergence" spoke online with Yang Hongxia, who is based in Hong Kong, about her venture so far.
Instead of Beijing, Shanghai, Guangzhou, or Shenzhen, where startup resources are concentrated, she joined the Hong Kong Polytechnic University. In her view, going to Hong Kong was a highly cost-effective decision:
Hong Kong offers generous funding and computing-power subsidies for AI-related industry-university-research projects, and thanks to the city's world-leading talent density, the company quickly assembled a team of 40.
For various reasons, Yang Hongxia asked to discuss only technology in the interview and not to disclose details about commercialization.
But the technology alone offers a glimpse of InfiX.ai's ambition: this time, Yang Hongxia wants not only to compete with the top models on the market, but to overturn the paradigms by which large models are trained and deployed.
Today's mainstream top models, GPT included, are "centralized" efforts led by large institutions. As Yang Hongxia put it, "(Centralized models) require a heavy concentration of data, talent, and computing power."
What InfiX.ai wants to do is exactly the opposite: decentralize the pre-training of large models so that small and medium-sized enterprises, research institutions, and even individuals can take part.
The core reason: back in mid-2023, while still at ByteDance, Yang Hongxia found that "centralized" models, good as they are at general-domain problems, could not truly be put to work.
For example, many data-sensitive enterprises need to deploy models locally. The mainstream industry solution is to post-train a "centralized" model (through fine-tuning, reinforcement learning, and the like) on the enterprise's data.
But Yang Hongxia stressed to us: "Knowledge is injected into a model only during pre-training. Post-training supplies rules." Pre-training is like an eight-year medical doctorate; post-training is like a clinical internship.
As a result, models post-trained on enterprise data still hallucinate heavily in real business.
Her experience building "centralized" models gave Yang Hongxia two founding judgments for her startup:
First, for large models to land in practice, they cannot rely on a handful of giant institutions alone; they must be pre-trained on the data of many enterprises.
Second, for enterprises to pre-train at all, the resources consumed must come down.
Based on these two judgments, InfiX.ai recently open-sourced what it calls the world's first FP8 training "full package" (covering pre-training, supervised fine-tuning, and reinforcement learning), a model fusion technology, and a medical multimodal large model and multi-agent system trained on top of it:
- Low-bit model training framework InfiR2 FP8:
Compared with FP16/BF16, the compute precisions commonly used in the industry, InfiR2 FP8 speeds up training and cuts GPU-memory consumption while sacrificing almost no model performance. A minimal sketch of the FP8 training pattern follows the figures below.
△ Performance of InfiR2-1.5B-FP8 versus the BF16 baseline on reasoning evaluation sets; the two are nearly identical. Source: provided by the company
△ Test results for GPU-memory usage, computing latency, and system throughput. Compared with FP16/BF16, InfiR2 FP8 raises end-to-end training speed by up to 22%, cuts peak GPU-memory usage by up to 14%, and lifts end-to-end throughput by up to 19%. Source: provided by the company
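The article does not detail InfiR2's internals, so the following is only a minimal sketch of what FP8 mixed-precision training looks like in general, written with NVIDIA Transformer Engine, a public library for FP8 matmuls on FP8-capable GPUs such as the H100. The layer sizes and training loop are illustrative placeholders, not InfiX.ai's code:

```python
# Sketch of generic FP8 mixed-precision training (NOT InfiR2's implementation).
# Pattern: run matmuls in FP8 with per-tensor scaling, keep optimizer state
# and master weights in higher precision. Requires an FP8-capable GPU.
import torch
import transformer_engine.pytorch as te
from transformer_engine.common import recipe

# Hybrid format: E4M3 for forward activations/weights, E5M2 for gradients.
fp8_recipe = recipe.DelayedScaling(margin=0, fp8_format=recipe.Format.HYBRID)

model = torch.nn.Sequential(
    te.Linear(4096, 4096, bias=True),  # FP8-capable drop-in for nn.Linear
    te.Linear(4096, 4096, bias=True),
).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

x = torch.randn(8, 4096, device="cuda")
target = torch.randn(8, 4096, device="cuda")

for step in range(10):
    optimizer.zero_grad(set_to_none=True)
    # Matmuls inside this context run in FP8; scaling factors are tracked
    # per tensor so small dynamic ranges do not underflow.
    with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
        out = model(x)
    loss = torch.nn.functional.mse_loss(out, target)
    loss.backward()   # gradients flow back through the FP8 kernels
    optimizer.step()  # optimizer state and master weights stay in FP32
```

This is where the speed and memory gains come from: the expensive matrix multiplications run at 8-bit precision, while the parts that are sensitive to rounding error stay in higher precision.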
- Model fusion technology InfiFusion:
Domain "expert models" of different sizes and architectures, pre-trained by enterprises and institutions in different fields, can be combined through model fusion into one large model that integrates knowledge across domains.
The technology avoids the resource waste of training models repeatedly. A minimal sketch of the simplest fusion flavor follows below.
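InfiFusion's actual algorithm is not described in the article, and fusing models of different sizes and structures requires more than arithmetic on weights. As a rough illustration of the simplest fusion family only, here is weight-space interpolation of two same-architecture checkpoints (a "model soup"); the file names are hypothetical:

```python
# Minimal weight-space fusion of two checkpoints that share one architecture.
# This is a simplification: heterogeneous fusion (different sizes/structures,
# as the article attributes to InfiFusion) typically works on output
# distributions via distillation, which this sketch does not cover.
import torch

def fuse_state_dicts(sd_a, sd_b, alpha=0.5):
    """Interpolate two state dicts: alpha * A + (1 - alpha) * B."""
    assert sd_a.keys() == sd_b.keys(), "plain averaging needs identical shapes"
    return {k: alpha * sd_a[k] + (1 - alpha) * sd_b[k] for k in sd_a}

# Hypothetical domain experts with the same backbone (placeholder files).
expert_med = torch.load("expert_medical.pt", map_location="cpu")
expert_fin = torch.load("expert_finance.pt", map_location="cpu")

fused = fuse_state_dicts(expert_med, expert_fin, alpha=0.5)
torch.save(fused, "fused_generalist.pt")  # sketch only; real checkpoints may
                                          # need dtype/buffer special-casing
```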
- Medical multimodal large-model training framework InfiMed:
The framework lets small-parameter models trained on modest data and compute show strong reasoning ability across a range of medical tasks. A sketch of the verifiable-reward pattern it relies on follows below.
△ Performance of InfiMed-RL-3B on seven benchmarks. Trained with RLVR (reinforcement learning with verifiable rewards) on a small set of 36K examples, the medical model InfiMed-RL-3B significantly outperforms Google's similarly sized medical model MedGemma-4B-IT across seven major medical benchmarks. Source: provided by the company
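The reward used to train InfiMed-RL-3B is not spelled out in the article; what follows is only a generic sketch of the RLVR pattern, in which a rollout is scored by checking the model's final answer against a verifiable label (here, a hypothetical multiple-choice medical question):

```python
# Generic RLVR-style reward: parse the model's final answer and compare it
# against a ground-truth label, yielding a binary, automatically checkable
# reward. The actual InfiMed-RL-3B reward is not described in the article.
import re

def verifiable_reward(model_output: str, gold_choice: str) -> float:
    """Return 1.0 if the stated final choice matches the gold label."""
    # Look for a pattern like "Answer: B" at the end of the reasoning chain.
    match = re.search(r"Answer:\s*([A-E])", model_output, flags=re.IGNORECASE)
    if match is None:
        return 0.0  # unparseable rollouts earn no reward
    return 1.0 if match.group(1).upper() == gold_choice.upper() else 0.0

# Toy rollouts (not from the InfiMed evaluation set):
print(verifiable_reward("...so the lesion is benign. Answer: C", "C"))  # 1.0
print(verifiable_reward("The answer is unclear.", "C"))                 # 0.0
```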
- Multi-agent system InfiAgent:
The system automatically decomposes complex tasks and assigns the pieces to multiple agents in place of a human, achieving automatic task planning and scheduling and lowering the threshold and cost of building agent systems. A toy sketch of this control flow follows below.
△ Results of InfiAgent on multiple standard benchmarks. On complex tasks requiring multi-step reasoning (such as DROP), InfiAgent beats the best baseline by 3.6%. Source: provided by the company
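InfiAgent's planner is model-driven; the toy sketch below only illustrates the planner/worker control flow the article describes, with a rule-based decomposition standing in for the model (all names are placeholders):

```python
# Toy planner/worker pattern: a planner decomposes a complex task into
# subtasks and dispatches each to an agent. A real system would have an LLM
# produce the decomposition and match subtasks to agent skills.
from dataclasses import dataclass, field

@dataclass
class Agent:
    name: str
    skill: str
    def run(self, subtask: str) -> str:
        # Placeholder: a real agent would call an LLM with a role prompt.
        return f"[{self.name}] completed: {subtask}"

@dataclass
class Planner:
    agents: list = field(default_factory=list)

    def decompose(self, task: str) -> list:
        # Placeholder rule-based decomposition; normally model-generated.
        return [f"{task} :: step {i}" for i in range(1, 4)]

    def schedule(self, task: str) -> list:
        subtasks = self.decompose(task)
        # Round-robin assignment; real schedulers match subtask to skill.
        return [self.agents[i % len(self.agents)].run(s)
                for i, s in enumerate(subtasks)]

planner = Planner(agents=[Agent("retriever", "search"),
                          Agent("reasoner", "multi-step reasoning"),
                          Agent("writer", "report generation")])
for result in planner.schedule("summarize patient cohort outcomes"):
    print(result)
```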
Furthermore, through model fusion, these domain-knowledge-injected models can be combined into still more capable ones: a large model that knows more, without repeating pre-training.
In this venture, Yang Hongxia has not only taken on medicine, already a hard nut to crack, but narrowed the focus to its most stubborn corner: cancer.
She told "Intelligent Emergence": "We must choose some very challenging fields to make the model's capabilities truly distinguishable, and prove that our model is the best in that field."
In the early days of Yang Hongxia's startup, "decentralization" and "model fusion" were still marginal narratives in China's model race, which still believed that brute force works miracles. She recalls having to explain at length to partners and investors.
In the United States, though, a "decentralization" wave was gathering. In February 2025, former OpenAI CTO Mira Murati founded a new company, Thinking Machines Lab (hereafter "TML"), with the vision of making model training affordable for individual developers and startups.
"I really didn't expect that a company with no business in production could raise $2 billion in seed funding at a $12 billion valuation just by announcing it would do this."
The news convinced Yang Hongxia that "decentralization" would become mainstream: "You can imagine how determined people in the Bay Area are about this."
By the second financing round, she found the skeptical voices had thinned markedly. It took InfiX.ai only two weeks from proposing a capital increase to closing the round, and Yang Hongxia told us the round is now oversubscribed.
In the picture she paints, every company and institution will one day have its own expert large model. Specialist models in different fields can be fused; so can models trained in China and Europe, merging knowledge across borders. Model fusion will yield global, domain-based large models.
"Artificial general intelligence (AGI) should not be a computing-power contest confined to a few top players," Yang Hongxia summed up. "The future will be a collaboration of everyone."
What follows is "Intelligent Emergence's" conversation with Yang Hongxia, lightly edited:
"Centralization" brings technological breakthroughs, "decentralization" brings implementation
Intelligent Emergence: Briefly, why do we need "decentralized" model training?
Yang Hongxia: I see a big gap in model deployment today. When we talk to advanced, specialized fields, small and medium-sized enterprises, hospitals, and government agencies, everyone wants to use generative AI, yet for a long time they have been unable to. The core reason is that today's centralized large models lack the domain data these users need.
Let me stress again that knowledge is injected into a model only during pre-training. In post-training, the model merely receives rules that tell it how to tackle complex tasks.
So for a model to be deployed locally at an enterprise or institution, continual pre-training is unavoidable, because the vast store of local private data and knowledge held by hospitals, enterprises, and institutions cannot be scraped from the Internet.
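For the unfamiliar: continual pre-training means resuming the same language-modeling objective on new domain text. A minimal sketch with the Hugging Face transformers library, using a placeholder open checkpoint and a placeholder local corpus file (neither is from the article), looks like this:

```python
# Minimal continual pre-training sketch: resume causal-LM training of an
# open checkpoint on a private domain corpus. Model name and file path are
# illustrative placeholders.
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)
from datasets import load_dataset

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-1.5B")
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-1.5B")

# Private, non-public text (e.g., de-identified hospital records).
corpus = load_dataset("text", data_files={"train": "local_domain_corpus.txt"})
tokenized = corpus["train"].map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=1024),
    batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="cpt-out",
                           per_device_train_batch_size=2,
                           num_train_epochs=1, learning_rate=2e-5),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()  # same LM objective as pre-training, new domain data
```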
At the same time, it is very hard for different enterprises or institutions to share data with one another, so the existing model paradigm cannot achieve global, industry-wide coverage.
I believe that in the future, every company will need a large model as its framework. So the first thing we want is to make that framework the cheapest, easiest to use, and lowest-threshold available, so that every enterprise or institution can have its own locally deployed model.
The second thing is to globalize the models within a given field through model fusion. For example, by fusing the specialist medical models of different hospitals, we can obtain a foundation model for the medical field.
So "decentralization" means that, field by field, we pool everyone's capabilities to jointly build large domain models.
Intelligent Emergence: You previously did "centralized" model training at ByteDance and Alibaba. When did you start paying attention to "decentralization"?
Yang Hongxia: We started to have this idea in mid-2023.
Let me give a simple example of where the industry stood then. When a scenario carries very heavy traffic, such as search, recommendation, or advertising, you cannot call a centralized large model with 1.6 trillion parameters on every request; the serving throughput simply cannot keep up.
At the end of 2021, Google's CEO announced that all of Google search would be rebuilt on a BERT base (BERT is a large model Google released in 2018), which was unprecedented.
Yet even BERT's largest version, BERT-Large, has only 340 million parameters. Under extremely high traffic, industry simply cannot call a hundreds-of-billions-parameter model at every moment.
Since mid-2023 we have made many attempts and proved that within a vertical field, small models with 3 billion, 7 billion, or 13 billion parameters can outperform a centralized large model with 1.6 trillion parameters.
By mid-2024 we had verified that this conclusion holds. Domain models getting smaller is definitely the trend.
Intelligent Emergence: In mid-2023 you were still at ByteDance. At that time, was there any consensus on "decentralized" model training, at ByteDance or across the industry?
Yang Hongxia: Then, and even now, most people and most large companies still focus mainly on building centralized models to reach artificial general intelligence (AGI).
Relatively speaking, the centralized path faces far fewer technical challenges. As long as you clean the data thoroughly enough, have enough money to hire people, build robust and stable AI training infrastructure, and have enough computing power, you can reliably push the model's capabilities up.
Missions differ, too. Large companies naturally hope to break through to AGI, and that is something I genuinely want to see.
But even today, the number of people in each big company who can truly do core large-model R&D is very, very small; a great many are still cleaning data, to say nothing of institutions outside the big companies.
Experts in various fields, doctors for instance, are actually very interested in large models. But when they call the API of any open-source model directly, the results are poor and riddled with hallucinations.
Intelligent Emergence: When you were at Alibaba and ByteDance, did you believe in "centralization"? It is the very opposite of the "decentralization" you are pursuing now.
Yang Hongxia: I definitely believed in it, and I still believe in it now.
Because centralization pools all the resources, it removes some of the technical hurdles and will certainly keep delivering major technological breakthroughs.
But decentralization is what will carry the technology into wide use across fields. So I think both paths are correct.
Intelligent Emergence: In mid-2024, what progress convinced you the decentralized approach was right?
Yang Hongxia: By early 2024 we had already verified that within a vertical field, small models can beat large ones.
Few people noticed at the time. Now it has become a consensus; MIT Technology Review, for example, listed small language models among its ten breakthrough technologies for 2025.
Once you have verified this, you naturally think of directly fusing models from different fields rather than retraining them.