Four design iterations in a single year have made the frantic GPU build-out viable: Microsoft gives its AI chips liquid "blood vessels," tripling heat dissipation.
Everyone has been worrying: as AI gets ever more expensive to run, will ChatGPT's subscription fee climb year after year?
Even more frustrating, AI services seem increasingly prone to lag and slowdowns.
The real culprit is overheating chips: as each new generation of AI silicon drastically boosts computing power, traditional air cooling and cold plates are struggling to keep up.
Just recently, Microsoft CEO Satya Nadella weighed in on X:
We are reimagining the way to cool chips to make future AI infrastructure more efficient and sustainable.
Behind this statement is a breakthrough Microsoft has just announced: sending coolant directly through tiny channels inside the chip, which can boost heat-dissipation efficiency by up to three times.
Its emergence may be the key to breaking the "thermal bottleneck".
Coolant Inside the Chip: Microsoft's Liquid Vessel Solution
AI is getting hotter and hotter.
As the model scale expands and the demand for computing power skyrockets, the chips behind it are like engines running at high temperatures.
Air cooling, liquid cooling, and cold plates, which used to be able to handle the heat, are now approaching their limits.
In a recently announced experiment, Microsoft sent coolant directly into the chip: grooves thinner than a human hair are etched into the back of the silicon die, letting liquid flow through like blood vessels and carry heat away right at the source.
This "microfluidic cooling", which sounds like science fiction, has quite astonishing experimental results.
According to the data disclosed by Microsoft, its heat dissipation efficiency is up to three times better than that of cold plates, and the temperature rise inside the GPU can be reduced by 65%.
For engineers, this means that the same hardware can handle a larger load without being forced to reduce the frequency or even crash due to overheating.
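To see what these numbers mean in practice, here is a quick illustrative calculation. Only the 65% figure comes from Microsoft; the coolant temperature and cold-plate temperature rise are assumptions picked for the example:

```python
# What a 65% reduction in internal temperature rise looks like.
# The 65% comes from Microsoft's announcement; everything else
# below is an illustrative assumption.

coolant_c = 40            # assumed coolant supply temperature, in °C
cold_plate_rise_c = 50    # assumed internal rise with a cold plate

microfluidic_rise_c = cold_plate_rise_c * (1 - 0.65)   # 17.5 °C

print(f"Cold plate:   {coolant_c + cold_plate_rise_c:.1f} °C in the die")
print(f"Microfluidic: {coolant_c + microfluidic_rise_c:.1f} °C in the die")
# 90.0 °C vs 57.5 °C: the same chip runs far cooler, leaving headroom
# to hold clock speeds instead of throttling.
```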
Reaching this stage was not easy.
The Microsoft team conducted four consecutive rounds of iterations in a year before finding a microchannel design that would not clog and could ensure the strength of the chip.
Husam Alissa, the system technology director of Microsoft Cloud Operations and Innovation, said bluntly:
Developing microfluidic cooling requires a systematic understanding of the interaction between the silicon wafer, coolant, server, and the entire data center.
To make the coolant cover the chip's hot spots more precisely, researchers used AI to design a bionic structure that branches like leaf veins and is far more efficient than straight channels.
Microsoft also collaborated with Swiss startup Corintis to solve a series of engineering problems such as etching, packaging, and leakage prevention.
The data is impressive, but it ultimately needs to be verified by real loads.
Microsoft chose its own Teams as the test scenario.
On the hour and half-hour, the number of meetings surges and the servers come under sudden high load. With traditional cooling solutions, you either have to add idle hardware or risk letting the chips run hot for long stretches.
Microfluidic cooling opens up another possibility: safely "overclocking" at critical moments to handle the peak demand.
Jim Kleewein, a Microsoft technical researcher, summarized:
Microfluidic cooling improves almost all key indicators, including cost, reliability, speed, and sustainability.
When coolant truly flows through the "blood vessels" of the chip, AI gains new confidence for its next stage of expansion.
Can AI Withstand the Increasingly Hot Chips?
AI's "fever" is not a metaphor but a real physical phenomenon.
Each generation of compute chips draws more power, climbing from a few hundred watts to over a thousand, and the heat piles up like a snowball.
In the past, data centers could rely on air cooling and cold plates, but under today's AI peak loads, these technologies are gradually falling short.
As Sashi Majety, the project leader at Microsoft, warned:
Within five years, if you still mainly rely on cold plate technology, you'll be stuck.
The reason behind this is not hard to understand.
Take data centers' overall energy consumption: the International Energy Agency (IEA) projects that electricity demand from global data centers will grow from about 460 TWh in 2024 to more than 1,000 TWh by 2030.
That is to say, the overall power demand of data centers may double in about six years.
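That rough doubling is easy to verify; here is a back-of-envelope sketch using only the two IEA figures above:

```python
import math

# IEA projection cited above: ~460 TWh (2024) -> 1,000+ TWh (2030).
start_twh, end_twh, years = 460, 1000, 6

# Implied compound annual growth rate
cagr = (end_twh / start_twh) ** (1 / years) - 1
print(f"Implied annual growth: {cagr:.1%}")          # ~13.8%

# Doubling time at that growth rate
doubling = math.log(2) / math.log(1 + cagr)
print(f"Doubling time: {doubling:.1f} years")        # ~5.4 years
```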
Looking at the data in the United States, according to a report by the Congressional Research Service (CRS), the electricity consumption of US data centers in 2023 was about 176 TWh, accounting for 4.4% of the total US electricity consumption that year.
If that trend continues over the next few years, the cooling systems supporting operations at this scale will also claim a huge share of the infrastructure energy budget.
The problem with cold plate cooling lies in thermal resistance and conduction loss.
The thermal resistance in the chip packaging layer and interface materials greatly reduces the efficiency of heat transfer.
As chip power density increases, these "intermediate layers" become an ever more serious obstruction, and heat is often trapped inside the chip.
As a result, to avoid frequency reduction or damage due to overheating, manufacturers have to leave sufficient margins in the design or limit performance output.
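This layered bottleneck can be captured with a simple series thermal-resistance model: the junction temperature is the coolant temperature plus power times the sum of the resistances in the heat path. The sketch below uses made-up resistance values purely for illustration, not Microsoft's measurements:

```python
# Series thermal-resistance model: junction temperature rises with
# every layer the heat must cross. All values below are illustrative
# assumptions, not measured figures.

def junction_temp(power_w, coolant_temp_c, resistances_k_per_w):
    """T_junction = T_coolant + P * sum of thermal resistances."""
    return coolant_temp_c + power_w * sum(resistances_k_per_w)

power = 1000          # W, a ~1 kW AI accelerator
coolant = 40          # °C coolant supply temperature

# Cold plate: heat crosses the die, TIM, lid, another TIM, the plate
cold_plate_stack = [0.010, 0.015, 0.008, 0.012, 0.010]  # K/W each

# Microfluidic: coolant inside the silicon bypasses most of the stack
microfluidic_stack = [0.010, 0.006]                     # K/W each

print(junction_temp(power, coolant, cold_plate_stack))   # 95.0 °C
print(junction_temp(power, coolant, microfluidic_stack)) # 56.0 °C
```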
Even worse, to make the cold plates work better, data centers have to lower the temperature of the coolant.
The energy this refrigeration requires is a non-negligible expense: in some large data centers, the chiller plant accounts for a significant share of overall energy consumption.
At this scale, the importance of efficiency is magnified to the extreme.
Rani Borkar, vice president of hardware systems and infrastructure at Microsoft Azure, put it bluntly in an interview:
When you reach that scale, efficiency is very important.
This statement reflects the voice of the entire industry: Whoever can improve efficiency first will take the initiative in the next computing power cycle.
From Saving Money to Eliminating Glitches: The Real Impact of Microsoft's Cooling Technology
It may sound like microfluidic cooling is just a game for engineers, but it actually affects everyone's experience of using AI.
The Electricity Saved Could Cover the Membership Fee
Training and running large models is an inherently money-burning game.
Running AI models, especially real - time inference in the cloud, essentially consumes electricity.
An industry study comparing the inference energy consumption of large language models at different scales found that the larger the model and the more frequent the inference, the greater the energy draw.
If the heat dissipation efficiency cannot be improved, data centers can only add cooling systems or run at a reduced frequency, and these costs will ultimately be reflected in the product pricing.
Microsoft's internal press release also mentioned that microfluidic cooling can reduce the temperature rise inside the chip by 65%, and the heat dissipation efficiency is up to three times higher than that of cold plates.
This means that in the same environment, performance can be maintained at a lower cost.
AI Can Be Greener Without Being a Power-Hungry Monster
The promotion of AI is accompanied by a huge demand for electricity.
An MIT report listed that the popularization of generative AI has put pressure on resources such as electricity and water in data centers.
Data centers were once compared to "energy - hungry monsters", and in some areas, their power demand can be equivalent to that of thousands of households.
With more efficient cooling, the refrigeration system would claim a smaller share of the load, cutting total energy consumption and carbon emissions.
Interestingly, Microsoft's experiment found that even when the coolant temperature is as high as 70°C, microfluidic cooling can still work efficiently.
This means the coolant doesn't need to be chilled to the low temperatures traditional solutions demand, saving a large amount of energy at the source.
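Why 70°C coolant matters becomes clearer with a rough energy comparison: coolant that hot can dump heat to ambient air with fan-driven dry coolers instead of compressor-driven chillers. The numbers below are illustrative assumptions, not Microsoft data:

```python
# Rough comparison of cooling energy: compressor chiller vs. dry cooler.
# All parameters are illustrative assumptions.

heat_to_reject_kw = 1000          # 1 MW of IT heat

# Chilled-water plant: compressor work = heat / COP
chiller_cop = 4.0                 # assumed chiller efficiency
chiller_power_kw = heat_to_reject_kw / chiller_cop   # 250 kW

# Warm (70 °C) coolant can shed heat to ambient air with fans alone;
# assume fan power is ~2% of the heat rejected
dry_cooler_power_kw = heat_to_reject_kw * 0.02       # 20 kW

print(f"Chiller:    {chiller_power_kw:.0f} kW of electricity")
print(f"Dry cooler: {dry_cooler_power_kw:.0f} kW of electricity")
```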
For enterprises, this is an ESG selling point; for users, it means a lighter environmental footprint every time they use AI.
From Queuing to Instant Image Generation: The Experience Upgrade Behind the Coolant
You've probably run into situations like this: a video call suddenly stutters, AI image generation crawls, or model inference lags half a beat behind.
Part of these problems stems from the fact that the chips are forced to reduce the frequency or delay processing after overheating.
In its tests, Microsoft again turned to its own Teams as the workload.
Interestingly, the traffic of Teams is not evenly distributed.
Most meetings start on the hour or half-hour, so in the few minutes around those marks, the servers orchestrating meetings are suddenly overwhelmed and load soars.
With traditional cooling, you either add a lot of extra hardware to absorb the short-term peak or risk letting the chips run hot for long stretches.
Microfluidic cooling offers another possibility: safely "overclock" during these peak periods so that the same hardware can handle the sudden surge in demand.
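What "safely overclocking during peaks" might look like in practice is a control loop that raises clock speed only while thermal headroom exists. This is a hypothetical sketch with invented names and thresholds, not Microsoft's actual scheduler:

```python
# Hypothetical thermal-headroom boost loop; clock values and
# thresholds are invented for illustration.

BASE_CLOCK_MHZ = 1500
BOOST_CLOCK_MHZ = 1900
TEMP_LIMIT_C = 85          # junction temperature ceiling

def pick_clock(load: float, junction_temp_c: float) -> int:
    """Boost only when demand is high AND the chip still has
    thermal headroom; otherwise stay at base clock."""
    if load > 0.9 and junction_temp_c < TEMP_LIMIT_C - 10:
        return BOOST_CLOCK_MHZ   # peak demand, plenty of headroom
    return BASE_CLOCK_MHZ        # normal load, or too hot to boost

# Better cooling lowers junction_temp_c at the same load, so the
# boost branch becomes reachable exactly when meetings spike.
print(pick_clock(load=0.95, junction_temp_c=60))  # 1900 (boost)
print(pick_clock(load=0.95, junction_temp_c=80))  # 1500 (no headroom)
```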
For users, the most obvious change is that meetings are no longer glitchy, the response is faster, and there's no need to worry about failures at critical moments.
Microsoft's Strategy: It's Not Just About Cooling, but Leading the Way into the Future
Making the science-fiction-sounding idea of "sending coolant into the chip" a reality is impressive enough on its own.
But for Microsoft, this is just the opening move in a larger game. The real goal is to control the gateway to future AI infrastructure.
Judging from its capital expenditure, its ambition is obvious.
Microsoft's fourth-quarter fiscal 2025 results show capital expenditure of $24.2 billion in a single quarter, most of it invested directly in cloud and AI infrastructure.
In addition, media reports say Microsoft plans to spend more than $30 billion in the coming quarter on expanding cloud and AI infrastructure.
This is not just "throwing money around"; it's laying the foundation for the computing power landscape in the next twenty years.
So far, Microsoft has launched two self-developed chips, Cobalt 100 and Maia, for general-purpose computing and AI acceleration respectively.
If microfluidic cooling is the means of solving the "heat" problem, then self-developed chips are about keeping control firmly in Microsoft's own hands: reducing dependence on NVIDIA and enabling deep coupling between the cooling, architecture, and software layers.
Microsoft hasn't stood still on networking, either.
For example, its work on hollow-core fiber has driven optical transmission loss to a record low (about 0.091 dB/km), regarded as a breakthrough in the fiber-optic field.
It may sound abstract to ordinary users, but in data centers, it means that server nodes can communicate faster with less energy consumption, and AI can respond more promptly.
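To put that attenuation figure in perspective, here is a rough calculation of how much optical power survives a long link. The 0.091 dB/km comes from the article; the 0.18 dB/km baseline is an assumed typical value for conventional single-mode fiber:

```python
# Fraction of optical power surviving a link, from attenuation in dB/km.
# 0.091 dB/km is the record cited above; 0.18 dB/km is an assumed
# typical figure for conventional single-mode fiber.

def surviving_fraction(loss_db_per_km: float, km: float) -> float:
    total_db = loss_db_per_km * km
    return 10 ** (-total_db / 10)

for name, loss in [("hollow-core", 0.091), ("conventional SMF", 0.18)]:
    print(f"{name}: {surviving_fraction(loss, 80):.1%} survives 80 km")
# hollow-core: ~18.7%, conventional SMF: ~3.6% -> fewer amplifiers
# and less energy per bit between data-center sites.
```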