Regarding Rubin, we've listed six key questions.
NVIDIA didn't launch any new graphics cards at CES 2026.
Instead, Jensen Huang spent nearly two hours elaborating on a brand-new AI supercomputing architecture called Vera Rubin, along with a set of figures striking enough to rewrite the industry's rules:
Running AI inference on Rubin increases throughput tenfold. Training a trillion-parameter model requires one-quarter as many GPUs as the previous generation, Blackwell, and the cost per token drops to one-tenth.
This might signal something.
It may indicate that for traditional consumer-grade GPUs, the marginal returns from incremental process and architecture iterations are weakening. In other words, they are no longer enough to carry an exciting product launch.
Here are some of our thoughts on this matter.
Regarding the Absence of New Graphics Cards and Rubin's Business Model
The key to understanding Rubin is to change the perspective.
It's not just a faster GPU. Think of it instead as a deeply vertically integrated AI computing system: NVIDIA designed six specialized chips, each with a different function but tightly coupled into a single package:
Vera CPU (responsible for AI data-flow scheduling), Rubin GPU (core computing unit), NVLink 6 (ultra-high-bandwidth internal interconnect), ConnectX-9 SuperNIC (AI-specific network), BlueField-4 DPU (offloading storage and security tasks), and the Spectrum-6 Ethernet switch chip.
The six chips work together with the goal of integrating an entire data center cabinet into a seamless "giant AI computer."
What Rubin solves is the problem of system scale-up, not the performance limit of a single chip. It turns the stacking of computing power from "hand-assembling a racing engine" into a "standardized automotive assembly line"; the resulting efficiency gains and cost reductions are the natural outcome of system-level optimization.
This model is indeed similar to Google's approach of building its own AI infrastructure around the TPU and its interconnect technology. Rubin targets customers with Google-like needs: hyperscale AI companies and cloud service providers that must process massive volumes of tokens and train and run trillion-parameter models.
Compared with NVIDIA's previous business model, this is a shift from "selling shovels" to "selling the whole workshop." The performance gains (such as 10x inference throughput) and cost reductions (1/10 the cost per token) are the potential unlocked by that specialization and system-level optimization.
However, its limitations lie here as well.
Rubin's power is fully unleashed only on the highly parallel AI workloads it was designed for. For graphics rendering, general scientific computing, or small-scale model inference, its complexity and cost may not pay off. It aims at a large but specific "mainstream" market.
Regarding the Impact on the Existing AI Hardware Ecosystem
Does Rubin's emergence start the countdown on the era in which "hoarding high-end GPUs" is itself a core competitive advantage?
If Rubin really achieves broad market adoption, some awkward situations will follow:
For the first wave of companies that profit by buying, selling, or leasing computing power (such as H100 clusters), the business model comes under enormous pressure. Once the new-generation system can serve inference at a far lower unit cost, the price-performance advantage of old clusters evaporates quickly, unless those companies can upgrade to the new architecture just as quickly.
For AI companies that invested heavily early on in building their own GPU clusters, the situation is more delicate. Those hardware assets will not become obsolete in the short term and can still support R&D and existing services.
But the problem lies in the future competition dimension.
When new entrants can cheaply obtain Rubin-level computing power and match your inference capability, the strategic value of the compute moat you built with enormous capital shrinks sharply. Competition shifts faster and more thoroughly to the superiority of model algorithms, the uniqueness and closed loop of data, and product-market fit.
NVIDIA's own role will evolve accordingly. It is indeed moving toward being "the Qualcomm of the AI era," supplying core, standardized computing modules. But the level of integration Rubin displays is far more complex than a phone SoC; it is closer to delivering a complete reference design and system solution.
In the future, if its supercomputing architecture (such as DGX SuperPOD) is delivered at scale as a cloud service, NVIDIA will also take on an "operator" role, supplying AI computing power directly to end users.
Regarding the Window Period of the Token Parity Era
How long the window of the "parity inference era" that Rubin promises stays open depends on two key variables: how fast Rubin's sales ramp up, and how fast the incumbent giants iterate their models' capabilities.
If Rubin launches at scale on schedule in the second half of 2026 and is quickly deployed by the major cloud providers (AWS, Azure, GCP), access points to this "parity computing power" will spread rapidly.
The window period may not be long. During this period, existing companies must complete a key transformation from "relying on hardware scale" to "relying on software and ecological advantages."
Specifically, they may need to: leverage their existing compute advantage to train models a generation ahead and build sufficiently high algorithmic barriers; integrate deeply with specific business scenarios to create a closed data loop and customer stickiness, so that compute cost is no longer decisive; and actively explore innovative applications and ecosystems on top of existing models, capturing mindshare and market share before the wave of parity computing power arrives.
When the cost of advanced computing power is the same for everyone, the advantage of companies that merely stack compute, without unique technology or a product moat, may evaporate quickly.
Regarding the AI Bubble and Next-Generation Contenders
It should be noted that NVIDIA's heavy investment in Rubin removes the biggest cost and scale barriers to fully realizing AI's commercial value, but it cannot by itself create that value.
Simply put, it solves the problem of "whether the cost is feasible," not "whether the demand exists."
The AI-bubble argument usually questions whether sky-high training costs can ever generate matching commercial value. By driving costs down, Rubin significantly lowers the threshold for testing that value.
More startup teams can try more radical and complex AI ideas at an affordable cost. So the next logical step is not the bubble bursting; the industry may be moving from a brute-force stage built on stacking capital to a healthier screening stage that depends more on innovation than on capital.
The entrepreneurs who first put Rubin-level computing power to effective use may not be the ones with the most capital today; they will be the teams with the deepest insight into AI-native applications and the greatest ability to exploit cheap inference. They are the contenders for the next generation of "killer apps."
From this perspective, in the long run: the sky-high cost of computing power used to limit the number of players in the market, business stories rested on "I have scarce computing power," and the verification of commercial value was deferred.
The new logic after Rubin is that the compute threshold drops sharply and the number of entrants surges. This may, of course, produce a flood of homogeneous applications, and competition will instantly turn brutal, because applications whose only claim is "I have an AI feature" quickly lose value once rivals can match them at similar cost.
The real value creators (teams with unique data, refined algorithms, and deep industry insight) will stand out, while those without real capabilities will be exposed faster. So Rubin's arrival may mark not the end of the bubble but the start of a more intense elimination round.
Regarding the Deep-Seated Reason for Not Launching New Graphics Cards
That Jensen Huang, a master salesman, did not pitch graphics cards on a global stage like CES is genuinely worth discussing. One might even reasonably ask: against the physical limits of semiconductors, is GPU innovation approaching its ceiling?
In the transistor-miniaturization race of traditional GPUs, generational performance leaps are indeed getting harder to achieve. Meanwhile, the growth curve and profit margins of the AI data-center market exert an absolute strategic pull.
With advanced packaging, HBM memory, and other capacity in tight supply across the board, NVIDIA's decision to give absolute priority, in R&D, production capacity, and market influence, to the AI-infrastructure battlefield that determines its future was somewhat inevitable.
On the other hand, over the past year NVIDIA's dominance has faced many challenges, notably disruptions from technology companies like Google.
Without a process dividend or a disruptive architectural breakthrough, rushing out marginally upgraded products could disrupt the market rhythm and hurt sales of the existing lineup (such as the RTX 40 series). NVIDIA can afford to wait for a better release window.
More Practical Issues
Setting all of the above aside, there are two very practical issues: whether the cost of migrating from existing architectures to Rubin matches the benefits, and the stability and robustness risks that any new architecture inevitably carries.
For practitioners, migrating from Blackwell or earlier architectures to Rubin is far more than a simple hardware purchase.
The most obvious example: a Rubin system that integrates six cutting-edge chips and uses a fully liquid-cooled design will inevitably carry an extremely high price per cabinet or tray, significantly above current-generation systems.
As for the calculus, customers are not simply paying for today's computing power; they are buying a ticket to the next-generation AI cost structure. The core metric to compare is not total cost of ownership (TCO) but "cost per intelligence": the all-in cost of processing each trillion tokens and of training each trillion-parameter model.
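To make that metric concrete, here is a minimal back-of-the-envelope sketch in Python. Every number in it (capex, amortization period, power draw, electricity price, throughput) is an illustrative assumption of ours, not an NVIDIA or market figure; the only input taken from the keynote narrative is the rough 10x throughput ratio.

```python
def cost_per_trillion_tokens(capex_usd, years, power_kw, usd_per_kwh,
                             tokens_per_second):
    """All-in cost of serving one trillion tokens on one system.

    Amortizes capex linearly over `years` and adds electricity;
    staff, networking, and real-estate costs are ignored for brevity.
    """
    hours = years * 365 * 24
    hourly_cost = capex_usd / hours + power_kw * usd_per_kwh
    tokens_per_hour = tokens_per_second * 3600
    return hourly_cost / tokens_per_hour * 1e12

# Hypothetical: an existing GPU cluster vs. a next-generation rack that
# costs 3x as much but serves tokens 10x as fast (the keynote's ratio).
old = cost_per_trillion_tokens(3e6, 4, 40, 0.10, 4e5)
new = cost_per_trillion_tokens(9e6, 4, 100, 0.10, 4e6)
print(f"old: ${old:,.0f}  new: ${new:,.0f}  per trillion tokens")
```

Even with a much higher sticker price, the faster system wins on this metric in the sketch; whether it wins in reality depends entirely on the real values plugged in.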
Is it worth it? For frontier-model R&D, where resources are extremely scarce and iteration is fast (such as AGI-chasing laboratories), and for hyperscale AI cloud providers, the answer is likely yes.
Even at a high unit price, if the hardware can cut the inference cost of the massive token volume it serves by an order of magnitude, or ship a stronger model months ahead of competitors, the investment can be recouped very quickly through market leadership and lower operating costs. This is a fight for survival and leadership.
Beyond that, there are hidden migration and adaptation costs. Rubin's NVFP4 tensor cores, its new memory hierarchy (such as the context-storage platform driven by BlueField-4), and its CPU-GPU collaboration model all demand deep optimization of existing deep-learning frameworks, model architectures, and scheduling software, and in places outright code rewrites. That consumes large amounts of engineering time and verification cost.
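As a taste of what that adaptation looks like, here is a toy NumPy sketch of block-scaled 4-bit float quantization. The E2M1 value grid and 16-element block size follow public descriptions of NVFP4, but the rounding and scale handling are simplified by us for clarity; this is a conceptual illustration, not NVIDIA's kernel or API.

```python
import numpy as np

# Representable magnitudes of an E2M1 4-bit float (sign handled separately).
E2M1_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def quantize_block(block):
    """Map one block to signed E2M1 values plus a single float scale."""
    scale = max(float(np.abs(block).max()) / E2M1_GRID[-1], 1e-12)
    scaled = block / scale
    # Round each magnitude to the nearest representable E2M1 value.
    idx = np.abs(np.abs(scaled)[:, None] - E2M1_GRID[None, :]).argmin(axis=1)
    return np.sign(scaled) * E2M1_GRID[idx], scale

def fake_fp4(weights, block=16):
    """Quantize-dequantize a 1-D tensor block by block (simulation only)."""
    out = np.empty_like(weights)
    for i in range(0, len(weights), block):
        q, s = quantize_block(weights[i:i + block])
        out[i:i + block] = q * s
    return out

w = np.random.randn(64).astype(np.float32)
print("max abs error:", np.abs(w - fake_fp4(w)).max())
```

Real hardware dequantizes on the fly; the point is that model and framework code has to become aware of block sizes, scale formats, and rounding behavior, which is exactly the kind of invasive change described above.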
Decision-makers also need to consider when the investment pays back. For example:
Given Rubin's lower token cost, how much will business volume (inference requests, model-training jobs) grow? How much power cost will the new architecture's energy-efficiency gains save? And compared with the path of keeping the old system while bearing higher marginal costs and gradually losing competitiveness, is the net present value (NPV) of investing in Rubin early actually positive?
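That last question reduces to a standard NPV comparison. A minimal sketch, in which every cash flow and the discount rate are hypothetical placeholders of ours rather than forecasts:

```python
def npv(rate, cashflows):
    """Net present value of yearly cash flows; cashflows[0] is year 0."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cashflows))

# Hypothetical: upgrade now (heavy capex, then growing cash flows)
# vs. keep the old cluster (no capex, cash flows eroding to rivals).
upgrade = [-9.0, 2.5, 3.5, 4.5, 5.0]   # $M per year
stay    = [ 0.0, 2.0, 1.5, 1.0, 0.5]

rate = 0.12  # assumed discount rate
delta = npv(rate, upgrade) - npv(rate, stay)
print(f"NPV(upgrade) - NPV(stay) = {delta:+.2f} $M")
```

With these placeholder numbers the early upgrade actually comes out negative, which illustrates the next point.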
For most enterprises, this balance point may not come immediately.
On the other hand, stability and robustness are the natural weak points of such extreme system-level innovation.
In any hardware effort, a sharp rise in complexity inevitably multiplies the points of failure. If one or two traditional GPU clusters fail, their tasks can simply be migrated. Rubin, however, is a precisely coupled super-organism (Vera CPU, Rubin GPU, NVLink 6, DPU, SuperNIC); an anomaly in any key component can drag down the whole system's coordinated efficiency.
For engineers, fault diagnosis will likely get harder as well. When performance problems or errors occur, the root cause could sit anywhere: in the hardware (any of the six chips), the firmware, the drivers, the interconnect protocols, or the system software. Such deep integration makes traditional "divide-and-conquer" debugging extremely difficult.
We also noticed that, addressing these risks, Jensen Huang mentioned several features in his keynote: full-link confidential computing and encryption, a completely redesigned power and cooling system, and "offloading" and "isolation" via the DPU.
However exquisite the design, a new system this complex must still be verified against large-scale, long-running, and diverse real-world workloads.
If past experience is any guide, early adopters will inevitably serve as "co-testers," working with NVIDIA to find and fix the problems no laboratory can foresee.
That process may take quite some time.
This article is from the WeChat official account “New Vision” (ID: xinmouls), author: Li Xiaodong, published by 36Kr with authorization.