
Four iterations in a single year, and wild GPU stacking becomes reality. Microsoft brings coolant inside its AI chips, tripling heat dissipation.

新智元2025-09-24 16:16
When chips run as hot as ovens, electricity costs soar, latency and stuttering set in, and services fail. Every imaginable vision of the future ends up held hostage by "heat". The trump card Microsoft is playing this time is to carve liquid-carrying "vessels" directly into the chip, so that coolant can reach the heat-generating core itself. Whether AI can keep advancing at full speed, whether prices can stay low, and whether the user experience stays smooth all hinge on this cooling revolution.

Everyone is worried: AI is getting more and more expensive. Will ChatGPT subscription fees go up every year?

What's worse are the increasingly frequent lags and outages when using AI.

The real culprit is overheating chips. As the computing power of each new generation of AI chips leaps ahead, conventional air cooling and cold plates can hardly keep up.

Just now, Microsoft CEO Satya Nadella expressed his view on X:

We are re-imagining the way chips are cooled to make future AI infrastructure more efficient and sustainable.

Behind these words lies a piece of "black technology" Microsoft has just announced: coolant is introduced directly into tiny channels inside the chip, boosting cooling performance by up to three times.

Its emergence could be the key to breaking the "heat barrier".

Coolant in the chip: Microsoft's solution with liquid "vessels"

AI is getting "hotter" all the time.

As models grow larger and computing demands leap upward, the underlying chips run hot like engines at full throttle.

Air cooling, liquid cooling, and cold plates, the cooling methods that once sufficed, have nearly reached their limits today.

In a recently published experiment, Microsoft introduced coolant directly into the chip: grooves thinner than a human hair were etched into the back of the silicon die, so that liquid flows through them like blood through vessels and removes heat right at the source.

This "microfluidic cooling", which sounds like science fiction, has achieved amazing results.

According to the data released by Microsoft, its cooling performance is up to three times higher than that of cold plates, and the temperature rise inside the GPU can be reduced by 65%.

For engineers, this means the same hardware can handle higher loads without being throttled, or even failing, due to excessive temperatures.
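To make the 65% figure concrete, here is a hypothetical illustration. The 40 °C baseline temperature rise is an assumed example value, not a Microsoft number; only the 65% reduction comes from the article:

```python
# Hypothetical example: suppose a GPU under a cold plate runs 40 °C
# above coolant temperature at full load (assumed baseline, for illustration).
baseline_rise_c = 40.0
reduction = 0.65  # 65% lower temperature rise inside the chip (Microsoft's figure)

microfluidic_rise_c = baseline_rise_c * (1 - reduction)
print(f"temperature rise with microfluidics: {microfluidic_rise_c:.0f} °C")
# → temperature rise with microfluidics: 14 °C
```

Under that assumption, the silicon would sit only 14 °C above the coolant instead of 40 °C, which is the headroom that lets the hardware run harder before throttling.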

It wasn't easy to get to this point.

The Microsoft team went through four iterations in a single year to arrive at a micro-channel design that neither clogs nor weakens the chip's structural strength.

Husam Alissa, the director of system technology at Microsoft Cloud Operations and Innovation, said directly:

Developing microfluidic cooling requires systems thinking: you have to understand how the silicon, the coolant, the server, and the entire data center interact.


To direct the coolant more precisely to the chip's hot spots, researchers used AI to develop a biomimetic structure that branches like leaf veins and is far more efficient than straight channels.

Microsoft also partnered with the Swiss startup Corintis to solve a series of engineering problems such as etching, packaging, and leak prevention.

The data is impressive, but ultimately it has to be validated with real loads.

Microsoft chose its own Teams as the test environment.

The number of meetings spikes at every full hour and half hour, putting servers under sudden high load. With conventional cooling solutions, one either has to provision more hardware or accept chips running hot for long stretches.

Microfluidic cooling offers another possibility: In critical situations, the hardware can safely "overclock" and handle peak loads.

Jim Kleewein, a Microsoft technology researcher, summarized it as follows:

Microfluidic cooling improves costs, reliability, speed, sustainability... almost all important indicators.

If coolant really does flow through "vessels" inside the chip, AI gains fresh headroom for its next expansion.

Can AI still keep up as chips get hotter?

The "fever" of AI is not a metaphor but a real physical phenomenon.

Each generation of compute chips consumes more power, climbing from a few hundred watts to several thousand, and the heat piles up like a snowball.

Data centers used to get by with air cooling and cold plates, but under today's peak AI loads these technologies are increasingly hitting their limits.

As Microsoft project leader Sashi Majety warned:

In five years, you'll be stuck if you're still relying mainly on cold-plate technology.

The reason behind this is not hard to understand.

Take the total energy consumption of data centers as an example. The International Energy Agency (IEA) predicts in a report that the world's electricity demand to meet the requirements of data centers will increase from about 460 TWh in 2024 to over 1000 TWh in 2030.

This means that the total electricity demand of data centers could double in about six years.
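As a rough sanity check on those figures, assuming simple exponential growth between the IEA's 2024 and 2030 estimates quoted above:

```python
# Sanity check on the quoted IEA projection:
# ~460 TWh in 2024 growing to ~1000 TWh in 2030.
start_twh = 460.0   # estimated data-center electricity demand, 2024
end_twh = 1000.0    # projected demand, 2030
years = 2030 - 2024

growth_factor = end_twh / start_twh
annual_rate = growth_factor ** (1 / years) - 1

print(f"total growth: {growth_factor:.2f}x over {years} years")
print(f"implied average annual growth: {annual_rate:.1%}")
# → total growth: 2.17x over 6 years
# → implied average annual growth: 13.8%
```

A 2.17x increase over six years works out to roughly 14% average annual growth, consistent with the claim that demand could double in about six years.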

Let's look at the data from the United States: According to a report by the Congressional Research Service (CRS), the electricity consumption of American data centers in 2023 was about 176 TWh, accounting for 4.4% of the total electricity consumption in the US that year.
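The two numbers in the CRS report can be cross-checked against each other: if 176 TWh was 4.4% of total US consumption, the implied national total is about 4,000 TWh, which is broadly in line with published US electricity figures for 2023:

```python
# Consistency check on the CRS figures quoted above.
us_dc_twh = 176.0   # US data-center electricity consumption, 2023
share = 0.044       # stated share of total US consumption (4.4%)

implied_total_twh = us_dc_twh / share
print(f"implied total US electricity consumption: {implied_total_twh:.0f} TWh")
# → implied total US electricity consumption: 4000 TWh
```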

If the expansion continues over the next few years, the cooling systems for infrastructure at this scale will also claim a significant share of the energy budget.

The problem with cold-plate cooling lies in thermal resistance and transfer losses.

Thermal resistance in the chip's packaging layers and interface materials significantly reduces heat-transfer efficiency.

As chip power density increases, these "intermediate layers" become an ever bigger obstacle, and heat often gets trapped inside the chip.

The result: to keep chips from throttling or being damaged by excessive temperatures, manufacturers must build in generous headroom at design time or cap the power input.

Even worse, to make the cold plates work better, data centers have to further lower the temperature of the coolant.

Cooling energy is itself a cost that cannot be ignored: in some large data centers, the cooling system accounts for a significant share of total energy consumption.

At this scale, efficiency matters more than anything.

Rani Borkar, vice president of hardware systems and infrastructure in Microsoft's Azure data center division, put it bluntly in an interview:

When you reach this scale, efficiency is very important.

These words cut to the heart of the industry: whoever improves efficiency first gains the upper hand in the next compute cycle.

From cost savings to fewer hiccups: the real-world impact of Microsoft's cooling technology

Microfluidic cooling sounds like an engineers' game, but it directly affects our everyday experience of using AI.

The electricity saved could cover your subscription fee

Training and operating large models is in itself an expensive game.

Running AI models, especially real-time inference in the cloud, is essentially a matter of burning electricity.

An industry study comparing the inference energy consumption of large language models of different sizes showed that the larger the model and the more frequent the inference, the higher the energy consumption.

If cooling efficiency doesn't improve, data centers must either expand the cooling system or scale back performance, and those costs are ultimately passed on to product prices.

Microsoft's announcement also noted that microfluidic cooling can reduce the temperature rise inside the chip by 65%, with cooling efficiency up to three times that of cold plates.

This means the same performance can be maintained at lower cost.

AI no longer has to be a power guzzler, and it can be greener too

The spread of AI is accompanied by a high electricity demand.

A report from MIT stated that the spread of generative AI puts pressure on resources such as electricity and water in data centers.

Data centers were once called "energy guzzlers". In some regions, their electricity demand can be equivalent to that of thousands of households.

If cooling technology becomes more efficient, the cooling system's share of energy use shrinks, reducing total energy consumption and CO2 emissions.

Interestingly, the Microsoft experiment showed that microfluidic cooling still works efficiently even at a coolant temperature of 70 °C.

This means the coolant doesn't have to be chilled to a very low temperature, as conventional cooling solutions require, saving a lot of energy from the outset.

For companies this is an ESG selling point; for users it means every AI interaction may carry a smaller environmental footprint.

From waiting in queues to instant images: the user-experience gains behind the coolant

Surely you've experienced this: a video call suddenly freezes, AI image generation crawls along at a snail's pace, or model inference slows dramatically.

Some of these problems stem from chips throttling after overheating, which delays processing.

For its test, Microsoft again chose its own Teams as the experimental subject.

Interestingly, the data traffic of Teams is not evenly distributed.

Most meetings start around...