Is CUDA's moat on the verge of falling? Claude flattens NVIDIA's advantage in 30 minutes, and AMD is overjoyed.
In roughly half an hour of hands-off programming, Claude Code migrated a complete CUDA backend directly to AMD ROCm.
Has AI ended the CUDA moat overnight?
Recently, a developer named johnnytshi shared a striking feat on Reddit:
Claude Code successfully ported a complete CUDA backend to AMD's ROCm in just 30 minutes.
Throughout the process, not a single line of code was written manually.
At this pace, the gap between the two ecosystems may simply close.
More importantly, the port did not rely on traditional intermediate conversion tools such as the HIPIFY translation layer at all; it was completed in a single pass straight from the CLI.
Even Anush E., AMD's vice president of software, was impressed, declaring that the future of GPU programming belongs to AI agents.
As soon as the news broke, the tech community lit up. Many exclaimed: NVIDIA's CUDA moat is about to fall...
What on earth is going on?
Claude tears apart CUDA in just 30 minutes
Claude Code runs on an agentic framework, which means it can "think for itself".
During the port it did not mechanically translate keywords; it genuinely understood the code, down to the underlying logic of individual kernel functions.
According to johnnytshi, the trickiest problem in the port, the data-layout differences between the two backends, was also solved by the AI, which kept the kernels' core computational logic consistent.
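The PR does not publish the exact transformation, but the flavor of a "data layout difference" can be sketched on the CPU. The function names below are illustrative, not code from the lc0 PR: the same tensor element lives at a different flat offset depending on whether the backend stores data channels-outermost (NCHW) or channels-innermost (NHWC), and a port must keep such indexing consistent everywhere.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical illustration (not from the lc0 PR): offset of element
// (n, c, h, w) in NCHW layout, i.e. channels outermost.
std::size_t idx_nchw(std::size_t n, std::size_t c, std::size_t h, std::size_t w,
                     std::size_t C, std::size_t H, std::size_t W) {
    return ((n * C + c) * H + h) * W + w;
}

// Offset of the same element in NHWC layout, i.e. channels innermost.
std::size_t idx_nhwc(std::size_t n, std::size_t c, std::size_t h, std::size_t w,
                     std::size_t C, std::size_t H, std::size_t W) {
    return ((n * H + h) * W + w) * C + c;
}
```

For the same logical element the two layouts yield different offsets, so a kernel moved from one backend to the other must either convert the data or rewrite its indexing, which is exactly the kind of detail a naive keyword translation misses.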
Remarkably, the entire CUDA backend landed on AMD ROCm in just 30 minutes, with no translation layer in between.
Another advantage, of course, is that there is no need to laboriously set up a translation toolchain such as HIPIFY; the whole job runs directly from the command line (CLI).
Now, the whole internet is flooded with the cry that the CUDA moat has been breached.
After all, NVIDIA's dominant position is largely built on the CUDA programming ecosystem, which has almost become an industry standard.
Countless AI frameworks, deep learning libraries, and scientific computing tools are deeply dependent on it.
Although AMD's ROCm is capable, it has long suffered from pain points such as ecosystem compatibility and high migration costs for developers.
Now Claude has lowered that barrier in a very short time. In the future, far more CUDA code may run easily on AMD GPUs.
Implementation details
On GitHub, johnnytshi also posted logs and explanations.
The PR implements a complete ROCm backend for AMD GPUs, bringing modern attention-based chess networks to RDNA 3.5 and other AMD architectures.
GitHub: https://github.com/LeelaChessZero/lc0/pull/2375
A complete ROCm backend was added in src/neural/backends/rocm/
The attention network architecture (multi-head self-attention, FFN, embedding layer) was implemented
rocBLAS was used for GEMM operations, and MIOpen was used for convolution operations
The NCHW layout was optimized for FP16 performance on RDNA 3.5
Three backend variants are provided: rocm (FP32), rocm-fp16 (FP16), rocm-auto (automatic detection)
MIOpen is a mandatory dependency (similar to cuDNN for CUDA)
The AMD GPU architecture is automatically detected through rocm_agent_enumerator
Compilation options: -Drocm=true -Damd_gfx=gfx1151 (or use automatic detection)
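As a rough sketch of how a rocm-auto variant might choose between the FP32 and FP16 paths based on the architecture string that rocm_agent_enumerator reports: the enum and helper names below are invented for illustration, and the actual selection logic in the PR may differ.

```cpp
#include <string>

// Hypothetical sketch (names not from the lc0 PR): pick a backend variant
// from the GPU architecture string, e.g. "gfx1151" for Strix Halo.
enum class RocmVariant { Fp32, Fp16 };

RocmVariant pick_variant(const std::string& gfx_arch) {
    // RDNA 3 / 3.5 parts report gfx11xx and have fast FP16 paths;
    // fall back to FP32 for anything else, as a conservative default.
    if (gfx_arch.rfind("gfx11", 0) == 0) return RocmVariant::Fp16;
    return RocmVariant::Fp32;
}
```

The point of such a dispatcher is that users never have to know their gfx target; the same pattern underlies the `-Damd_gfx=gfx1151` build option versus automatic detection.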
Performance description:
FP16 performance: >2000 nps on Strix Halo (Radeon 8060S, gfx1151)
Automatic batch-size tuning (min_batch = 64 on RDNA 3.5)
rocWMMA was tested, but rocBLAS has better performance
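The min_batch = 64 note above suggests batch sizes get rounded up to a hardware-friendly multiple. A minimal sketch of that idea, with an invented helper name (the PR's actual tuning code is not shown here):

```cpp
// Hypothetical sketch: round a requested batch size up to a multiple of
// the hardware's preferred minimum (min_batch = 64 on RDNA 3.5 per the
// PR notes), so GEMM tiles stay fully occupied.
int tuned_batch(int requested, int min_batch = 64) {
    if (requested <= 0) return min_batch;
    return ((requested + min_batch - 1) / min_batch) * min_batch;
}
```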
Verification (Strix Halo, Radeon 8060S, gfx1151):
Test models: 768x15x24h-t82-swa-7464000.pb.gz and maia-1900.pb.gz
Backend: rocm-fp16 functions normally and generates correct moves
Environment: ROCm 7.2.53150, MIOpen 3.5.1
Note: Only tested on RDNA 3.5; other AMD architectures have not been verified yet
The future of GPU programming belongs to AI agents
Of course, this demonstration also has limitations.
For simple or moderately complex kernels, Claude Code performs very well. But the real craft of kernel writing lies in deep, hardware-specific optimization.
Some argue that Claude Code still falls short there:
for complex kernels that have been aggressively tuned to a specific chip's cache hierarchy and memory-access patterns, AI still cannot fully replace human experts.
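To make "hardware-aware optimization" concrete, here is a CPU analogue rather than GPU kernel code: a tiled transpose reorders the loops so memory is touched in cache-sized blocks, the same access-pattern reasoning an expert applies to GPU shared memory and coalescing. The function is a generic textbook technique, not code from this port.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Cache-blocked transpose of an n x n row-major matrix: each (tile x tile)
// block is read and written while it still fits in cache, instead of
// striding across the whole matrix on every output row.
void transpose_tiled(const std::vector<float>& in, std::vector<float>& out,
                     std::size_t n, std::size_t tile = 32) {
    for (std::size_t ii = 0; ii < n; ii += tile)
        for (std::size_t jj = 0; jj < n; jj += tile)
            for (std::size_t i = ii; i < std::min(ii + tile, n); ++i)
                for (std::size_t j = jj; j < std::min(jj + tile, n); ++j)
                    out[j * n + i] = in[i * n + j];
}
```

The result is bit-identical to a naive transpose; only the traversal order changes. Choosing the tile size to match a particular cache (or a GPU's shared-memory capacity) is exactly the kind of per-hardware judgment the text says AI still struggles with.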
Even so, the signal released by this event is strong enough.
In recent months, the ZLUDA project and internal efforts at Microsoft have both tried to break CUDA's monopoly.
But most rely on rule-based mapping or intermediate layers, with limited automation and intelligence.
Agentic programming of the kind Claude Code represents skips those steps entirely, bridging the ecosystem gap through understanding plus autonomous decision-making.
As the vice president of AMD software said, the future of GPU programming belongs to AI agents.
100% AI programming across the board
Now, Claude Code has the entire Silicon Valley hooked ("Claude-pilled").
Two days ago, CEO Dario Amodei made another bold claim at Davos: software engineers are running out of time, and within the next 6-12 months AI will be able to replace them entirely!
Even the engineers at Anthropic no longer write code manually; it's all done by Claude.
Don't disbelieve it; it's true.
In a recent interview with Wired, Boris Cherny, the creator of Claude Code, admitted that "100% of his code is written by AI".
Perhaps the engineers at Anthropic never imagined that a "side project" would make Silicon Valley go so crazy.
Boris Cherny recalled, "When we released Claude Code a year ago, we weren't even sure if 'agent programming' would work, but the popularity came so fast."
Cherny's personal experience is the best example:
When it first launched, only 5% of his code was written with Claude Code;
By last May, with Opus 4 and Sonnet 4, that share had risen to 30%;
And now, with Opus 4.5, 100% of his code over the past two months has been written by Claude Code.
Inside Anthropic, this full-scale AI adoption has reached an extreme.
Nearly 100% of technical employees use Claude Code, and even 95% of the Claude Code team's own code is written by Claude Code itself.
Even Stanford AI professors are using it
It has to be said that the evolution speed of AI programming is astonishing.
Looking back at 2021 through 2024, most tools were just advanced autocomplete, humbly suggesting a few lines while developers typed.
But in early 2025, with the release of early agentic programming products from startups like Cursor and Windsurf, the rules of the game changed:
developers need only describe a feature in plain language and leave the grunt work to AI agents.
Claude Code was truly born at this time.
Boris Cherny admitted that the early versions had their stumbles and even got stuck in infinite loops. But Anthropic made a bold move: instead of developing products based on the current capabilities of AI, they built for the future that AI is about to reach.
This bet paid off. With the release of Anthropic's next-generation flagship, Claude Opus 4.5, AI programming has reached a real "turning point".
Kian Katanforoosh, an AI lecturer at Stanford University and the CEO of Workera, recently migrated the entire company to Claude Code.
He said bluntly that for senior engineers, Claude Code is more powerful than Cursor and Windsurf.
Katanforoosh added that the only model to show a step-change improvement in programming ability recently is Claude Opus 4.5.
"It doesn't feel like it's imitating humans to write code, but it really finds a smarter solution path."
It is rumored that Microsoft is also adopting Claude Code on a large scale internally.