Yang Hongxia, former large-model lead at Alibaba and ByteDance, starts up: large-model pre-training is not a compute race among a few top players | Exclusive from Intelligent Emergence
Interviewers | Zhou Xinyu, Deng Yongyi
Writer | Zhou Xinyu
Editor | Su Jianxun
Yang Hongxia, who worked on large models at Alibaba and ByteDance for nearly seven years, has a marked appetite for a challenge.
In her early days at Alibaba, she moved from the search and recommendation business into large-model research, a direction few favored at the time.
Later, at DAMO Academy, she led core talents in China's large-model field, such as Lin Junyang (current head of Tongyi Qianwen) and Zhou Chang (former head of Tongyi Qianwen's large model), to build M6, the predecessor of Tongyi Qianwen.
In July 2024, when Yang Hongxia left ByteDance to start her own business, it was reported that she would continue working on model-related technologies.
The halo of "core figure in Alibaba's and ByteDance's large-model efforts" did not drown out the pessimism in the market at the time: it's too late to enter the game; how can a startup compete with the big companies?
One year and three months later, Yang Hongxia has returned to the large-model track with her new AI company, InfiX.ai.
In early October, "Intelligent Emergence" spoke online with Yang Hongxia, who was in Hong Kong, about her startup's progress.
Instead of Beijing, Shanghai, Guangzhou, or Shenzhen, where entrepreneurial resources are concentrated, she chose to join the Hong Kong Polytechnic University. In Yang Hongxia's view, going to Hong Kong was a very cost-effective decision:
Hong Kong offers generous funding and computing-power subsidies for industry-university-research projects related to artificial intelligence, and its globally leading talent density allowed the company to quickly assemble a team of 40.
For various reasons, Yang Hongxia preferred to discuss only technology in the interview and not disclose details of commercialization.
But the technology alone offers a glimpse of InfiX.ai's grand blueprint: this time, Yang Hongxia wants not only to compete with the top models on the market but also to overhaul how large models are trained and deployed.
Today's mainstream top models, including GPT, are "centralized," dominated by large institutions. As Yang Hongxia put it, "(Centralized models) require a heavy concentration of data, people, and computing power."
What InfiX.ai wants to do is exactly the opposite: decentralize large-model pre-training so that small and medium-sized enterprises, research institutions, and even individuals can take part.
The core reason dates to mid-2023, when Yang Hongxia, then still at ByteDance, found that "centralized" models, however good at general-domain problems, struggle to land in real deployments.
For example, many data-sensitive enterprises need to deploy models locally. The mainstream industry solution is to post-train "centralized" models (through fine-tuning, reinforcement learning, and the like) on the enterprise's own data.
However, Yang Hongxia stressed to us: "Knowledge is injected into a model only during pre-training. Post-training supplies rules." Pre-training is like the eight years of a medical doctorate; post-training is the clinical residency.
As a result, models post-trained on enterprise data still hallucinate heavily in real business settings.
Yang Hongxia's experience building "centralized" models led to two founding judgments for her startup:
First, for large models to land, they cannot rely on a few giant institutions alone; they must be pre-trained on the data of many enterprises.
Second, for enterprises to run pre-training themselves, its resource cost must come down.
Based on these two judgments, InfiX.ai recently open-sourced the world's first FP8 training "full package" (covering pre-training, supervised fine-tuning, and reinforcement learning), a model fusion technology, and a medical multimodal large model and multi-agent system trained on top of them.
- Low-bit model training framework InfiR2 FP8:
Compared with FP16/BF16, the computing precisions commonly used in the industry, InfiR2 FP8 speeds up training and reduces GPU memory consumption while keeping model performance essentially unchanged. (A minimal sketch of the underlying FP8 idea follows the benchmark figures below.)
△ Performance comparison of InfiR2-1.5B-FP8 against the BF16 baseline on reasoning evaluation sets; the two are nearly identical. Source: provided by the company
△ Measurements of GPU memory usage, computational latency, and system throughput. Compared with FP16/BF16, InfiR2 FP8 raises end-to-end training speed by up to 22%, cuts peak GPU memory by up to 14%, and lifts end-to-end throughput by up to 19%. Source: provided by the company
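The article does not disclose InfiR2's internal recipe, but the core trick of FP8 training can be shown in a few lines: store tensors in an 8-bit floating format with a per-tensor scale, and cast back to higher precision where needed. Below is a minimal sketch, assuming PyTorch 2.1+ with its float8_e4m3fn dtype; it is illustrative only, not InfiR2's implementation.

```python
# Minimal FP8 quantize/dequantize sketch; NOT InfiR2's actual implementation.
import torch

def quantize_fp8(t: torch.Tensor):
    """Scale a tensor into float8_e4m3fn's representable range and cast."""
    fp8_max = torch.finfo(torch.float8_e4m3fn).max   # 448.0 for e4m3
    scale = t.abs().max().clamp(min=1e-12) / fp8_max
    return (t / scale).to(torch.float8_e4m3fn), scale

def dequantize_fp8(t_fp8: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    """Cast back to bfloat16 and undo the scaling."""
    return t_fp8.to(torch.bfloat16) * scale

w = torch.randn(4096, 4096)              # fp32 weight matrix: 64 MiB
w_fp8, scale = quantize_fp8(w)           # 1 byte per element: 16 MiB
w_back = dequantize_fp8(w_fp8, scale)    # what an FP8-capable matmul would consume
print(w_fp8.element_size(), "byte/elem vs", w.element_size())
print("max abs quantization error:", (w - w_back.float()).abs().max().item())
```

Storing activations, gradients, and weights at one byte per element instead of two (BF16) or four (FP32) is where memory and throughput gains of the kind benchmarked above come from; production frameworks add finer-grained scaling and native FP8 matmul kernels on top of this idea.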
- Model fusion technology InfiFusion:
Domain "expert models" of different sizes and architectures, pre-trained by enterprises and institutions in different fields, can be fused into large models that integrate knowledge across those fields.
This technology avoids the resource waste of training models repeatedly. (A naive same-architecture baseline is sketched below.)
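InfiFusion targets the hard case of fusing models with different sizes and structures, and the article does not detail how. As a point of reference, the simplest baseline for fusing models that share one architecture is plain weight averaging ("model souping"); here is a minimal sketch, with all file names hypothetical:

```python
# Naive same-architecture model fusion by weight averaging ("model soup").
# A baseline sketch only; InfiFusion targets the harder cross-architecture case.
import torch

def fuse_state_dicts(state_dicts, weights=None):
    """Return the weighted average of several same-shape state dicts."""
    n = len(state_dicts)
    weights = weights or [1.0 / n] * n
    fused = {}
    for key in state_dicts[0]:
        fused[key] = sum(w * sd[key].float() for w, sd in zip(weights, state_dicts))
    return fused

# Hypothetical usage with two domain experts sharing one architecture:
# med_sd = torch.load("expert_medical.pt")
# law_sd = torch.load("expert_legal.pt")
# model.load_state_dict(fuse_state_dicts([med_sd, law_sd], weights=[0.6, 0.4]))
```

The appeal of fusion over retraining is exactly what the article claims: the expensive pre-training compute already spent on each expert is reused rather than repeated.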
- Medical multimodal large-model training framework InfiMed:
The framework lets small-parameter models trained on small-scale data and computing power show strong reasoning ability across multiple medical tasks. (A sketch of a verifiable reward, the core ingredient of RLVR, follows the benchmark figure below.)
△ Performance comparison of InfiMed-RL-3B on 7 benchmarks. Trained with only 36K examples of RLVR (reinforcement learning with verifiable rewards) data, the medical model InfiMed-RL-3B significantly outperforms Google's comparably sized medical model MedGemma-4B-IT across seven major medical benchmarks. Source: provided by the company
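RLVR replaces a learned reward model with a reward that can be checked programmatically, which is part of what makes such a small training set viable. Below is a minimal sketch of a verifiable reward for multiple-choice medical QA; the task format is an illustrative guess, not InfiMed's actual reward design.

```python
# Sketch of a verifiable reward for RLVR on multiple-choice medical QA.
# Illustrative only; InfiMed's actual reward design is not public in this article.
import re

def verifiable_reward(model_output: str, gold_choice: str) -> float:
    """Reward 1.0 iff the model's final answer matches the gold choice."""
    match = re.search(r"[Aa]nswer\s*[:=]?\s*([A-D])", model_output)
    if match is None:
        return 0.0                       # unparseable output earns no reward
    return 1.0 if match.group(1).upper() == gold_choice.upper() else 0.0

print(verifiable_reward("Reasoning... Answer: C", "C"))  # 1.0
print(verifiable_reward("I think it's B", "C"))          # 0.0 (no parseable tag)
```

Because the reward is computed by rule rather than by another model, it cannot be "gamed" the way a learned reward model can, and every training example carries a clean learning signal.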
- Multi-agent system InfiAgent:
The system can stand in for humans by automatically decomposing complex tasks and assigning them to multiple agents, handling task planning and scheduling automatically and lowering the development threshold and cost of agent systems. (A minimal planner/worker sketch follows the benchmark figure below.)
△ Results of InfiAgent on multiple standard benchmarks. On complex tasks that require multi-step reasoning (such as DROP), InfiAgent leads the best baseline by 3.6%. Source: provided by the company
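The article describes InfiAgent only at the level of automatic decomposition, planning, and scheduling. Here is a minimal planner/worker sketch of that pattern; every name is hypothetical, and the per-agent LLM call is stubbed out.

```python
# Minimal planner/worker sketch of automatic task decomposition.
# All names are hypothetical; this is NOT InfiAgent's actual API.
from dataclasses import dataclass, field

@dataclass
class Agent:
    name: str
    skills: set = field(default_factory=set)

    def run(self, subtask: str) -> str:
        return f"{self.name} finished: {subtask}"   # stand-in for an LLM call

def plan(task: str) -> list[tuple[str, str]]:
    """Decompose a task into (required_skill, subtask) pairs. Stub planner."""
    return [("retrieve", f"gather evidence for '{task}'"),
            ("reason",   f"multi-step reasoning over evidence for '{task}'"),
            ("write",    f"draft the final answer for '{task}'")]

def schedule(task: str, agents: list[Agent]) -> list[str]:
    """Assign each subtask to the first agent advertising the needed skill."""
    results = []
    for skill, subtask in plan(task):
        worker = next(a for a in agents if skill in a.skills)
        results.append(worker.run(subtask))
    return results

agents = [Agent("Searcher", {"retrieve"}), Agent("Reasoner", {"reason"}),
          Agent("Writer", {"write"})]
for line in schedule("compare two cancer treatment plans", agents):
    print(line)
```

In a real system, the stub planner would itself be a model call and the scheduler would handle failures and dependencies, but the division of labor (plan, route by skill, execute) is the pattern the article describes.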
Furthermore, through model fusion, these models injected with domain knowledge can be combined into more capable ones: large models with broader knowledge, obtained without repeating pre-training.
In this venture, Yang Hongxia has not only taken on the notoriously difficult medical field but narrowed the scope further to its most challenging part: cancer.
She told "Intelligent Emergence": "We must pick some especially challenging fields so the model's capabilities become truly distinguishable, and prove that our model is the best in that field."
In the early days of Yang Hongxia's startup, "decentralization" and "model fusion" were still marginal narratives on the domestic model track, which still believed in brute-force scaling miracles. She recalled having to do a great deal of explaining to partners and investors back then.
In the United States, however, the "decentralization" wave was gathering. In February 2025, former OpenAI CTO Mira Murati founded a new company, Thinking Machines Lab (hereinafter "TML"), with the vision of making model training affordable for individual developers and startups.
"I really didn't expect that a company with no actual business in production could raise $2 billion in seed financing at a $12 billion valuation just by announcing it would do this."
The news convinced Yang Hongxia that "decentralization" will become mainstream. "You can imagine how determined the people in the Bay Area are about this."
By the second round of financing, she found the skeptical voices had noticeably thinned. It took InfiX.ai only two weeks from proposing a capital increase to closing the round, and Yang Hongxia told us the round is now oversubscribed.
In the picture she paints, every company and institution will eventually have its own expert large model. Specialist models from different fields can be fused, and models trained in China and Europe can likewise be fused across borders; model fusion, she believes, will produce global domain-based large models.
"Artificial general intelligence (AGI) should not be a computing-power contest confined to a few top players," Yang Hongxia summarized. "In the future, it will be a collaboration open to everyone."
Below is the conversation between "Intelligent Emergence" and Yang Hongxia, lightly edited:
"Centralization" brings technological breakthroughs, "decentralization" brings implementation
Intelligent Emergence: Briefly, why do we need "decentralized" model training?
Yang Hongxia: I see a big gap in how current models reach deployment. When we talk to high-end specialized sectors, small and medium-sized enterprises, hospitals, and government agencies, everyone wants to use generative AI but can't. The core reason is that today's centralized large models lack the domain data these users need.
It bears emphasizing that knowledge is injected into a model only during pre-training. In post-training, the model merely receives rules that tell it how to solve complex tasks.
Therefore, for a model to be deployed locally at an enterprise or institution, continued pre-training must be initiated, because the vast local private data and knowledge of hospitals, enterprises, and institutions cannot be found on the Internet.
At the same time, different enterprises and institutions can hardly share their data with one another, which means models under the existing paradigm can never cover the whole world and every industry.
I believe that in the future every company will need a large model as a foundation. So the first thing we want is to make that foundation the cheapest and easiest to use, with the lowest entry threshold, so every enterprise or institution can have its own locally deployed model.
The second thing is to make domain models global through model fusion. For example, by fusing the specialist medical models of different hospitals, we can obtain a foundation model for the medical field.
So "decentralization" means that, in each field, we pool everyone's capabilities to build a large domain model together.
Intelligent Emergence: You previously did "centralized" model training at ByteDance and Alibaba. When did you start paying attention to "decentralization"?
Yang Hongxia: The idea first came to us in mid-2023.
Take a simple example from the industry at the time. When a scenario carries very heavy traffic, such as search, recommendation, or advertising, you cannot call a centralized 1.6-trillion-parameter model on every request; the serving throughput simply cannot bear it.
At the end of 2021, Google's CEO announced that Google would rebuild its entire search engine on the BERT base (a large model Google released in 2018), which was unprecedented. At the time, the largest BERT variant, BERT-Large, had only 340 million parameters. So industry cannot call hundred-billion-parameter models on every request under high traffic.
Since mid-2023 we have run many experiments proving that, in a vertical field, small models with 3 billion, 7 billion, or 13 billion parameters can outperform a centralized large model with 1.6 trillion parameters.
By mid-2024, we had verified that this conclusion must be right: domain models getting smaller is definitely the future trend.
Intelligent Emergence: In mid-2023 you were still at ByteDance. Did the industry, including ByteDance, have a consensus then on "decentralized" model training?
Yang Hongxia: At the time, and even now, most people and most large companies still focus on building centralized models to reach artificial general intelligence (AGI).
Relatively speaking, the centralized route carries fewer technical challenges. As long as you clean the data well enough, have enough money to hire people, build robust and stable AI training infrastructure, and have enough computing power, you can definitely push model capability up.
Everyone has a different mission. Large companies naturally hope to break through to AGI, which is something I genuinely want to see as well.
But even today, the number of people inside large companies who can really do core large-model R&D is very small; a great many are still doing data cleaning, to say nothing of institutions outside the big companies.
Experts in various fields, such as doctors, are in fact very interested in large models. But when they directly call the API of any open-source model, the results are poor and riddled with hallucinations.
Intelligent Emergence: When you were at Alibaba and ByteDance, did you believe in "centralization"? It is the opposite of the "decentralization" you are pursuing now.
Yang Hongxia: I certainly believed in it, and I still do.
Because centralization aggregates all the resources, it removes some technical obstacles and will certainly keep producing major technological breakthroughs.
But decentralization is what will spread the technology into every field. So I think both paths are right.
Intelligent Emergence: What progress in mid-2024 convinced you that the decentralized approach was right?
Yang Hongxia: By early 2024 we had already verified that, in a vertical field, small models can beat large ones. Few people noticed at the time, but it has since become a consensus; MIT Technology Review, for example, listed small language models among its top ten breakthrough technologies for 2025.
Once you verify this, you naturally think: by directly fusing models from different fields instead of retraining, you can obtain a large model with more knowledge.
Around that time, Llion Jones, founder of Sakana AI and one of the authors of the Transformer paper, had already produced related work. Their team has a star-studded lineup