
GMI Cloud: Going global is the best way for AI companies to release production capacity and gain new life | WISE 2025

Qiu Xiaofen · 2025-12-08 18:43
The core challenges for AI applications going global are the timeliness, scalability, and stability of model inference services.

From November 27th to 28th, the 36Kr WISE2025 Business King Conference, hailed as the "annual technology and business trendsetter," was held at the Conduction Space in the 798 Art District, Beijing.

This year's WISE is no longer a traditional industry summit but an immersive experience centered around "technology-themed hit short dramas."

From AI reshaping the boundaries of hardware to embodied intelligence opening the door to the real world; from brand globalization in the wave of going global to traditional industries fitting themselves with "cyber prosthetics": what we recreate here is not only the trends, but also the insights honed through countless business practices.

In the following content, we will dissect the real logic behind these "hit dramas" frame by frame and witness the unique business landscape of 2025 together.

At this conference, Qian Yujing, the VP of Engineering at GMI Cloud, delivered a speech titled "Upgrading the Efficiency of AI Applications Going Global: Solving the Computing Power Dilemma and Evolving the Inference Architecture."

GMI Cloud is a North American AI Native Cloud service provider and one of the first six Reference Cloud Partners of NVIDIA.

Qian Yujing believes that, for global users, AI applications have diversified to the point of being all-encompassing, and that going global has become the best way for Chinese companies to release production capacity and gain new vitality.

Currently, China's AI going-global push is undergoing a paradigm shift: from the one-way technology output of the past to a transformation centered on the globalization of computing power, demand, and value. Behind this lies a global resonance of value.

Qian Yujing

The following is the transcript of Qian Yujing's speech, edited by 36Kr:

Good afternoon, everyone!

My name is Yujing. I'm the VP of Engineering at GMI Cloud, responsible for all engineering projects. Today, I'd like to share how to upgrade the efficiency of AI applications going global: by solving the computing power dilemma and evolving the inference architecture.

GMI Cloud is a relatively new company, so I'll take a little time to give you a brief introduction.

We are a company focused on AI infrastructure for going global. We are one of the first six Reference Cloud Partners of NVIDIA, and our main focus is our AI hardware and the upper-layer inference architecture.

Currently, GMI Cloud has three major product lines: underlying computing hardware, cluster management, and inference services at the MaaS layer. Together they provide AI enterprise customers with the capabilities they need across three different dimensions.

We have built our own data centers in multiple locations around the world (East Asia, South Asia, North America, Europe, Canada). Recently, we invested 500 million US dollars to build an AI Factory with NVIDIA in Asia, with a GB300 cluster on the scale of ten thousand cards. In China, we mainly serve enterprises taking AI global, focusing on overseas markets and helping them succeed there.

Now, let's get to the point. Apart from business model requirements, what going-global trends has GMI Cloud noticed in 2025?

At this moment, some people think there is a large bubble in AI, while others believe in AI and think that AI applications will experience exponential growth. From the perspective of a computing power provider or service provider, the trend we can observe is that the AI market is indeed growing exponentially.

Although different enterprises and analysts have different analyses of the market in the second half of 2025 or 2026, the overall direction is still upward. We can see that the monthly active users of Chinese overseas AI applications are still rising steadily this year.

Global users, especially those in North America, have developed the habit of actively embracing AI. The use of AI applications has become all-pervasive, and over 90% of knowledge workers in the United States are already very proficient in using AI tools.

As we all know, the domestic paid software market is highly homogeneous and has high customer acquisition costs. That is to say, the threshold for doing SaaS in China is very high.

However, in the Middle East and Latin America, a surprising statistic shows that AI application penetration has already reached a relatively high level. This means user education in overseas markets is largely complete, which leaves a significant demand gap for Chinese companies going global to fill. Therefore, going global is the best way to release production capacity and gain new vitality.

Of course, many domestic enterprises have also noticed this trend. In the past two years, many domestic enterprises have been exporting AI services overseas, which has led to an exponential surge in the demand for AI inference. This is something we, as a computing power provider, can clearly perceive.

We've summarized that during the process of AI going global, there are several core challenges related to inference, such as the timeliness, scalability, and stability of services.

We know that one characteristic of AI products is that success can arrive suddenly and unexpectedly. For AI enterprises going global, it is often impossible to expand capacity the way traditional software does, because every token requires GPU support; for global expansion in particular, this is a significant challenge.

In addition, another challenge is that the entire AI technology stack iterates extremely fast. From January to May this year, with the explosion of multi-node system inference, token prices dropped from a relatively high level to rock bottom.

Enterprises often have to keep up with the technology using their own resources, so they struggle with how to keep pace with current technological developments.

As a provider, we've noticed these needs and challenges. What has GMI Cloud done this year?

First, as a computing power service provider, we need to build our own data centers. We are now working with NVIDIA on a project called AI Factory, which Jensen Huang revealed in April. It will use the latest large-scale machines such as the GB200 and GB300 to greatly increase cluster throughput. We are one of the few NVIDIA Cloud Partners (NCPs) in Asia to launch an AI Factory project, and it is on the scale of a ten-thousand-card cluster.

Then, we continue to iterate our cluster engine and inference engine, which sit at the middle layer and the upper layer respectively. The two engines target different customer groups: the cluster engine serves customers with solid engineering capabilities who want to build relatively complex applications, while the upper-layer inference engine is designed for enterprise customers focused on lightweight, end-user applications.

Our Cluster Engine is actually quite similar to traditional clouds, but as an AI-native cloud, it focuses more on the computing power of GPUs.

Our Cluster Engine is a standard IaaS layer, covering the underlying hardware, bare-metal servers in the middle, and cluster management on top. We also provide a large number of monitoring plugins to offer a familiar experience.

Many going-global enterprises may be used to large overseas clouds such as GCP and AWS, and we also support the GPU-related functions of these clouds. We have specialized InfiniBand networking technology that lets customers choose the cluster size they want for training.

Moreover, many customers have their own private clusters, which often run into expansion problems. Our Cluster Engine solves this well because we have built in a multi-cloud architecture: customers can switch between their own resources and those of the traditional large clouds to meet their scaling needs at peak times.

Now, let's talk about our Inference Engine. The Inference Engine is a simpler product, which is related to the popular concept of Serverless.

Our Inference Engine integrates the world's leading large-scale models, whether open-source or closed-source, and they are all supported on our platform. You only need an API to access all the latest and most powerful models globally.
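To make "one API for all models" concrete, here is a minimal sketch of calling a hosted model through a single HTTP endpoint. It assumes an OpenAI-style chat-completions interface; the base URL, model name, and credential variable are placeholders for illustration, not GMI Cloud's confirmed API.

```python
# A minimal sketch of calling a hosted model through a single HTTP API.
# The base URL, model name, and payload format below assume an OpenAI-style
# chat-completions interface and are placeholders, not confirmed GMI Cloud values.
import os
import requests

API_BASE = "https://api.example-inference.com/v1"   # placeholder endpoint
API_KEY = os.environ["INFERENCE_API_KEY"]           # placeholder credential

def chat(prompt: str, model: str = "example/llm-model") -> str:
    """Send one chat-style request and return the model's reply text."""
    resp = requests.post(
        f"{API_BASE}/chat/completions",
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
        },
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

if __name__ == "__main__":
    print(chat("Summarize the benefit of serverless inference in one sentence."))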

In addition, our GMI Cloud Inference Engine supports automatic scaling up and down across clusters and regions. Why do we do this? It is closely tied to going-global demand. We have found that many customers train their own models but cannot handle peak traffic once they go live. Also, when users access the service from different regions, the initial choice of cluster location can affect the overall product experience.

So Inference Engine 2.0 was designed specifically for this scenario: we help customers solve automatic scaling across regions and clusters.

Specifically, we've designed a three-layer architecture to schedule global resources. Broadly, all of the engine's workloads fall under two scheduling methods: one is queue-based, and the other is load-balancing-based.

The queue-based method is mainly suited to popular models such as video or voice models. The load-balancing-based method is mainly for well-known large language models. We choose the scheduling method according to the workload.

For example, is a workload more sensitive to latency or to cost? Based on these options, we schedule the workload to different regions and then distribute it to the underlying GPUs.
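To make the two scheduling modes concrete, below is an illustrative Python sketch that queues heavy video/voice jobs, load-balances LLM requests across GPU pools, and picks a region by latency or cost preference. The region names, pool names, and classes are hypothetical and are not GMI Cloud's internal implementation.

```python
# Illustrative sketch of the two scheduling modes described above: queue-based
# dispatch for heavy video/voice jobs and load-balanced routing for LLM requests,
# with a latency-vs-cost preference choosing the region. All names (regions,
# GPU pools, classes) are hypothetical, not GMI Cloud's internal implementation.
from dataclasses import dataclass
from collections import deque
import itertools

@dataclass
class Region:
    name: str
    latency_ms: float         # typical round-trip latency to the caller
    cost_per_gpu_hour: float  # relative price of capacity in this region

REGIONS = [
    Region("us-west", latency_ms=40, cost_per_gpu_hour=3.2),
    Region("asia-east", latency_ms=120, cost_per_gpu_hour=2.1),
]

def pick_region(prefer: str) -> Region:
    """Choose a region by the workload's stated sensitivity: 'latency' or 'cost'."""
    key = (lambda r: r.latency_ms) if prefer == "latency" else (lambda r: r.cost_per_gpu_hour)
    return min(REGIONS, key=key)

class Scheduler:
    def __init__(self) -> None:
        self.media_queue = deque()            # queue-based: video / voice jobs
        self.llm_backends = itertools.cycle(  # load-balanced: LLM requests
            ["gpu-pool-a", "gpu-pool-b"])

    def submit(self, workload_type: str, payload: str, prefer: str = "latency") -> str:
        region = pick_region(prefer)
        if workload_type in ("video", "voice"):
            # Heavy media jobs wait in a queue that GPU workers drain in order.
            self.media_queue.append((region.name, payload))
            return f"queued in {region.name} ({len(self.media_queue)} waiting)"
        # Lighter LLM calls are spread round-robin across available GPU pools.
        return f"routed to {next(self.llm_backends)} in {region.name}"

scheduler = Scheduler()
print(scheduler.submit("llm", "translate this sentence", prefer="latency"))
print(scheduler.submit("video", "render a 10-second clip", prefer="cost"))
```

In practice, a real scheduler would also track live GPU utilization and queue depth per region, but the split shown here (a queue for heavy media workloads, load balancing for LLM traffic, and a region chosen by latency or cost) mirrors the decision described above.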

In short, our inference engine's architecture has five core features:

1. Global deployment. You can solve global service deployment with our single platform.

2. We've solved the problem of the secondary scheduling architecture, which is also closely related to global deployment.

3. Elasticity. The biggest problem for all model and application companies going global is elastic scaling. Since early-stage traffic has pronounced peaks and valleys, and the initial target customer groups and regions are limited, elasticity is a necessity.

4. High-availability design. We can ensure that customers' workloads can be accessed at any time.

5. Unified management of all workloads.

The above five features are provided based on the customer needs we've observed.

Similar to the cluster engine mentioned earlier, the GMI Cloud Inference Engine also supports hybrid cloud. Whether you build your own cluster, use GMI Cloud's clusters, or have credits or workloads on public clouds, you can manage them all uniformly through our platform. You don't need to worry about resource fragmentation or utilization; these are handled by our top-level scheduling.

Here, allow me a brief plug. If you need to host your own models for going global, you can try our Inference Engine 2.0 product called Dedicated Endpoint, which provides standalone nodes.

You can try this product, choose which clusters and regions to deploy the nodes in, and select cheaper or more convenient nodes according to your needs.
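As a rough illustration of the choices just described (which model to host, which cluster and region to deploy to, and how far to let it scale), here is a hypothetical configuration sketch. The field names, values, and prices are invented for illustration and do not reflect the Dedicated Endpoint's actual schema or pricing.

```python
# Hypothetical configuration for a dedicated (standalone) inference endpoint.
# Field names, values, and prices are invented for illustration and do not
# reflect the actual Dedicated Endpoint schema or GMI Cloud pricing.
dedicated_endpoint = {
    "name": "my-finetuned-model-prod",
    "model_source": "s3://my-bucket/checkpoints/v3",  # customer-hosted weights
    "region": "asia-east",        # pick the region closest to your end users
    "cluster": "gpu-pool-cheap",  # or a faster pool if latency matters more
    "autoscaling": {
        "min_replicas": 1,            # keep one replica warm to avoid cold starts
        "max_replicas": 8,            # cap spend during traffic spikes
        "target_gpu_utilization": 0.7,
    },
}

def peak_hourly_cost(cfg: dict, gpu_hourly_usd: float = 2.5) -> float:
    """Rough ceiling on hourly cost if autoscaling reaches max_replicas."""
    return cfg["autoscaling"]["max_replicas"] * gpu_hourly_usd

print(f"Peak hourly cost ceiling: ${peak_hourly_cost(dedicated_endpoint):.2f}")
```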

Additionally, I'd like to give you a little preview. We're about to launch a product called "GMI Studio," a newly created creative experience product.

With this product, we've upgraded the original console, previously focused on model management and deployment, into a product for entrepreneurs and end users. Through GMI Studio, users can freely combine the latest AI models and their applications in a "drag-and-drop" way in the cloud, without a local environment or a complex inference framework.

Finally, let's envision 2026.

The paradigm upgrade of AI going global in 2026 is a shift from the old paradigm of one-way technology output to a new paradigm of global value resonance.

As the wave of AI going global heats up, the globalization of AI has been elevated to a new level. It breaks through the shallow understanding of "one-way technology output" and points to an underlying transformation of the global AI industry from "resource fragmentation" to "value circulation." It is no longer just the geographic expansion of AI applications but a "two-way empowerment ecosystem" formed by computing power, technology, and demand on a global scale.

At the computing power layer, global resources complement one another, and high-quality computing power accelerates model optimization. At the application layer, tokens have evolved from a simple measure of API calls into a composite value carrier for computing power settlement and ecosystem incentives. Global AI innovation co-exists across the world, and models, applications, scenarios, and computing power are forming a new positive value cycle.