"Choose Google, fall behind by a year!" An American AI leader "crashes" Google: Save money but lose time. Is NVIDIA overjoyed?
A plot straight out of an AI short drama has played out in real life.
While Google's annual developer conference was under way, a leading figure in the AI industry crashed the party. What kind of grudge was behind it?
During the Google I/O conference a few days ago, David Holz, the founder and CEO of Midjourney, the well-known text-to-image AI company, publicly and bluntly criticized Google's TPU, sparking extensive discussion in Silicon Valley AI circles.
Holz said on X that because Midjourney chose Google's TPU over NVIDIA's GPU as its core training infrastructure in its early years, its research progress was set back by a full year compared with what it could have achieved. "If I could go back in time, I would have completely adopted NVIDIA chips from the start."
The statement carries weight because it comes from a well-known AI company that works closely with Google. Midjourney was once a showcase customer that Google Cloud used to promote its own chips, and it cut its inference costs by two-thirds by using Google's TPU. The criticism amounts to the best endorsement NVIDIA could ask for.
Why did Holz show Google no courtesy and disrupt the event just as Google was releasing new chips?
A Public Review with Real Costs
Of course, Holz was not complaining casually. This was a public postmortem based on real costs, and it touches the central tension in today's AI infrastructure race: beyond raw hardware performance, the moat of the software ecosystem is the battlefield that decides the outcome.
To understand Holz's regret, we need to spell out the essential differences between Google's TPU and NVIDIA's GPU in a research setting.
The GPU is like a Swiss Army knife for general-purpose parallel computing. NVIDIA's CUDA platform has been around since 2007 and has accumulated nearly two decades of ecosystem depth, making it the de facto general-purpose platform of the AI industry.
PyTorch, the framework AI researchers use most, is deeply integrated with CUDA. Almost all open-source model weights on Hugging Face are released in GPU-ready form by default. The Nsight performance profilers, the NCCL communication library, and the TensorRT inference optimizer together form a complete research toolchain. AI researchers around the world start writing CUDA-backed code in school, and PyTorch is their native language.
The TPU represents a different ecosystem. It is an application-specific integrated circuit (ASIC) whose architecture is built around a systolic array and optimized specifically for deep-learning tensor operations, and it is highly efficient for large-scale, stable training jobs. However, it requires the JAX or TensorFlow framework, and its PyTorch support has long been incomplete. Community resources are scarce and the debugging tools are immature, so almost all troubleshooting has to rely on Google's own documentation.
Consider Midjourney's specific needs. Its image-generation research requires large numbers of custom-operator experiments, rapid prototype iteration, and the ability to pull in diffusion-model components from the Hugging Face ecosystem at any time. These tasks are effortless in a GPU + PyTorch environment but extremely difficult on the TPU.
As a simple example, a researcher may need only a few hours to test a new idea on a GPU. On the TPU, just configuring the environment and adapting the framework may take several days. Accumulated over time, that gap is what Holz meant by being "set back by a year."
Why Did They Choose Google's TPU in the First Place?
It should be emphasized, however, that Holz's criticism is aimed specifically at the research and training stages. For inference, the logic is completely different. That was also the direct motivation for moving to Google's TPU at the time: the TPU really does save money, and Midjourney did not have to compete with the giants for NVIDIA GPUs.
As early as 2023, Google Cloud officially announced that Midjourney had chosen it as its core infrastructure provider. Midjourney used Google's TPU v4/v5 with the JAX framework to train its fourth- and fifth-generation text-to-image models, while renting NVIDIA GPU clusters on Google Cloud to handle daily generation inference for hundreds of millions of global users.
Midjourney's choice was pragmatic. At the time, NVIDIA's H100 was in short supply, and as an independent AI company without a giant's backing, Midjourney simply could not get an allocation. Google's TPU offered ample computing power, and its cost-effectiveness for large-scale matrix workloads such as image processing (savings of up to 60%) looked very attractive on paper.
In the second quarter of 2025, Midjourney also migrated its main inference cluster from NVIDIA's A100/H100 to Google Cloud TPU v6e. Monthly inference spending dropped sharply from about $2.1 million to less than $700,000, a saving of more than $16.8 million a year, with a payback period of only 11 days.
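The savings figures can be checked with a few lines of arithmetic. The Python sketch below uses only the numbers quoted in the article. The one-off migration cost is not stated in the source, so the value printed for it is merely what an 11-day payback would imply under an assumed 30-day month, not a reported figure.

```python
# Back-of-the-envelope check of the inference savings quoted in the article.
MONTHLY_GPU_COST = 2_100_000  # USD per month on A100/H100 ("about $2.1 million")
MONTHLY_TPU_COST = 700_000    # USD per month on TPU v6e (the article's upper bound)
PAYBACK_DAYS = 11             # payback period quoted in the article
DAYS_PER_MONTH = 30           # assumption, not from the source

monthly_saving = MONTHLY_GPU_COST - MONTHLY_TPU_COST
annual_saving = monthly_saving * 12
saving_fraction = monthly_saving / MONTHLY_GPU_COST

# Payback period = one-off migration cost / daily saving, so the implied
# migration cost is the daily saving multiplied by the payback period.
daily_saving = monthly_saving / DAYS_PER_MONTH
implied_migration_cost = daily_saving * PAYBACK_DAYS

print(f"Monthly saving: ${monthly_saving:,}")
print(f"Annual saving: ${annual_saving:,}")
print(f"Share of cost saved: {saving_fraction:.1%}")
print(f"Implied one-off migration cost: ${implied_migration_cost:,.0f}")
```

The annual figure matches the $16.8 million quoted above, and the saved share works out to two-thirds of the original bill, consistent with the two-thirds reduction in inference cost cited earlier.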
In other words, the TPU's cost advantage in large-scale inference is real. What Holz actually regrets is the sequencing: he should have used NVIDIA hardware to refine the model during research and then moved to Google's TPU for inference to cut costs, rather than doing research on the TPU from the start and paying for it with a year of progress.
Is NVIDIA's Moat Its Ecosystem?
Holz's public criticism is, in essence, testimony to NVIDIA's ecosystem moat. That moat lies not in how much faster the H100 is than the TPU, but in the work habits of countless researchers, tens of thousands of open-source code libraries, and the inertia of an academic community that defaults to the GPU as its experimental platform.
In 2026, PyTorch still accounted for as much as 85% of research papers, and almost all cutting-edge research code targets NVIDIA hardware. Any team that wants to do research on the TPU therefore bears a hidden cost: cutting itself off from the mainstream community, giving up a large pool of ready-made tools and resources, and exploring alone in a relatively niche technology stack.
This is why most research labs still default to the GPU even though the TPU matches or even beats it on some metrics. Hardware performance can be caught up, but ecosystem depth cannot be built overnight. The moat Jensen Huang has built over nearly two decades is NVIDIA's most valuable asset.
Google clearly recognizes the problem. At this year's Google Cloud Next conference, it released the eighth-generation TPU and adopted a dual-chip strategy: TPU 8t (for training) and TPU 8i (for inference). This is the first time in TPU history that training and inference have been split into two dedicated chips with completely different architectures, a move aimed at the very problem Holz complained about.
The TPU 8t, codenamed Sunfish, is co-designed with Broadcom and targets large-scale pre-training. A SuperPod scales to 9,600 chips sharing 2 PB of HBM, and its training cost-effectiveness is 2.7 times higher than that of the previous generation, Ironwood. The TPU 8i, codenamed Zebrafish, is designed by MediaTek and specializes in inference and serving. It expands on-chip SRAM to 384 MB, doubles the chip interconnect bandwidth to 19.2 Tb/s, and introduces a new Boardfly network topology that cuts the maximum number of network hops in a 1,024-chip configuration from 16 to 7. Its cost-effectiveness for low-latency inference on large MoE models is 80% higher than Ironwood's. Both chips use TSMC's 2-nanometer process and are expected to enter mass production in 2027.
Google's dual-chip strategy amounts to an important strategic admission: training and inference have evolved into two fundamentally different workloads, and a single chip can no longer optimize both. This stands in sharp contrast to NVIDIA's "one GPU fits all" approach and is also Google's direct response to NVIDIA's Vera Rubin NVL72 and Amazon's Trainium3.
Google did not just release new hardware. In response to complaints like Holz's about the TPU ecosystem, it simultaneously launched TorchTPU, an engineering effort to let PyTorch run natively on the TPU, which is currently in preview.
According to Google's roadmap, TorchTPU will support PyTorch's eager mode, integrate deeply with vLLM and TorchTitan, and ultimately scale linearly to a full Pod. If TorchTPU truly matures, research teams committed to PyTorch will for the first time be able to use the TPU without rewriting their code.
However, TorchTPU is still a preview, not a general release. Whether a research workflow like Holz's ("change an architecture, adjust an operator, quickly verify an idea") will run as smoothly on the TPU 8t as on the H100 still needs extensive real-world validation. A door has been opened, but how smooth the road beyond it is will only become clear after mass production begins in 2027.
Why Does Claude Span Three Platforms?
If Midjourney has so many complaints about the TPU's training ecosystem, how does Anthropic, the AI industry's new leader, cope with using three platforms? After all, it trains and runs Claude on three sets of hardware: NVIDIA's GPU, Google's TPU, and Amazon's Trainium.
Anthropic started out as a follower within the AI industry's first tier, with far fewer financial resources than Google or OpenAI. It therefore accepted huge investments from Google and Amazon, and one of the conditions was using those two giants' TPU and Trainium chips.
Google and Amazon are both strategic investors in Anthropic. The two giants have invested nearly $10 billion in Anthropic over successive rounds. Add Microsoft's $5 billion investment, and it is as if the world's three largest cloud providers are jointly backing Anthropic.
These are not purely financial investments. A large share of the money flows back to Google and Amazon as revenue, because Anthropic is currently using more than one million Amazon Trainium chips to train and deploy the latest Claude models. It is also using hundreds of thousands of Google TPUs and plans to expand gradually to a million-TPU fleet.
Meanwhile, Google Cloud, AWS, and Microsoft Azure are the main distribution channels for Claude in the global B2B market. All three trillion-dollar giants want to sell Claude through their sales networks and take a revenue share. More importantly, they want to keep the compute traffic on their own servers and bundle it with their own cloud services.
This broad backing has made Anthropic the fastest-growing and best-resourced independent AI company in Silicon Valley history. Recently it also reached a compute leasing agreement with SpaceX, paying $15 billion a year for the capacity of the NVIDIA GPUs that Musk has stockpiled in a supercomputing center in Tennessee.
So, how does Anthropic manage to use the chips of the three platforms simultaneously?
Anthropic's official line is to match each workload with the most suitable chip. NVIDIA's GPU handles research experiments and rapid prototyping; Google's TPU and Amazon's Trainium carry the main large-scale training and inference workloads, and the two compute providers balance each other so that no single platform controls pricing.
The scale of Anthropic's cooperation with Amazon is particularly striking. Under their agreement, Anthropic will spend more than $100 billion on AWS over the next decade to secure up to 5 gigawatts of compute capacity, spanning Trainium2 through Trainium4.
Interestingly, when in-depth cooperation with Anthropic was announced at Google Cloud Next, Amazon stepped forward to publicly claim credit, stating that Claude's training was done entirely on Trainium. Project Rainier, a cluster currently running more than one million Trainium2 chips, is one of the largest AI training clusters in the world.
Meanwhile, shortly before Google Cloud Next, Anthropic announced an expanded partnership with Google and Broadcom that secures multiple gigawatts of next-generation TPU capacity, expected to come online in 2027.
What Enables Anthropic to Use Three Platforms?
One piece of technical DNA makes this path possible for Anthropic: several of the company's founders previously worked at Google Brain, and JAX is their native language. Anthropic has used JAX as its core training framework from the beginning. JAX is designed to be hardware-independent, so the same code can run on GPUs, TPUs, and even Trainium through the XLA compiler.
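The hardware independence described here can be illustrated with a minimal JAX sketch. The function `scaled_scores` and the array shapes are hypothetical, chosen only for illustration, and nothing here reflects Anthropic's actual training code. The script contains no device-specific code: `jax.jit` hands the traced computation to XLA, which compiles it for whichever backend JAX detects (CPU, GPU, or TPU). Running on Trainium would additionally require AWS's Neuron tooling, which this sketch does not cover.

```python
import jax
import jax.numpy as jnp


@jax.jit  # trace once, then let XLA compile for the available backend
def scaled_scores(q, k):
    """Softmax over scaled dot products (hypothetical example workload)."""
    logits = q @ k.T / jnp.sqrt(q.shape[-1])
    return jax.nn.softmax(logits, axis=-1)


# Deterministic random inputs; the shapes are arbitrary illustration values.
key_q, key_k = jax.random.split(jax.random.PRNGKey(0))
q = jax.random.normal(key_q, (4, 8))
k = jax.random.normal(key_k, (4, 8))

scores = scaled_scores(q, k)

# Nothing above names a device: the backend is whatever JAX detected.
print("backend:", jax.default_backend())
print("devices:", jax.devices())
print("scores shape:", scores.shape)
```

Moving this script from a GPU machine to a TPU VM changes only the printed backend, which is the portability property the paragraph above relies on.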
Anthropic's path is the exact opposite of Midjourney's: Midjourney first built its research workflow in the PyTorch + GPU ecosystem and then tried to migrate to the TPU, and the migration cost it a year.
The multi-platform strategy has its own costs, though. Every time Anthropic updates a model, it has to test on three architectures separately, and every bug has three potential sources. Deployment is three times as complex as on a single platform. That is the engineering bill the three-platform strategy must pay.
Google's TPU stack is built on JAX and the XLA compiler, while Amazon's Trainium stack is built on AWS's in-house Neuron SDK. This means Anthropic's core distributed training framework cannot be carried over directly.
Anthropic's engineering team therefore has to assign top low-level hardware optimization experts to closed-door joint development with Google's and Amazon's chip teams, rewriting and tuning complex operators and mixed-precision training code for three completely different low-level programming models. The cost in people and time is far higher than for OpenAI, which relies purely on the NVIDIA ecosystem.
Midjourney, by contrast, has a very small team and nothing like Anthropic's engineering corps of thousands that can work behind closed doors with the giants' chip teams on low-level code. Facing the niche JAX/XLA compilation environment on the TPU, a single strange hardware-level bug could force the whole team to stop and wrestle with low-level code.
In addition, from August to September 2025, Claude users reported a significant decline in quality. Anthropic's postmortem revealed three independent infrastructure bugs: a context-window routing error affected 16% of Sonnet 4 requests on a certain platform; a