
Can AI-designed chips rival Blackwell in just 9 months? OpenAI's "Spicy Chip" bypasses Nvidia's main battlefield, but Jensen Huang's GPU hegemony is teetering

Geekbang Technology InfoQ | 2026-06-25 18:28
Nine months, with AI involved in the design, to build full-stack control.

Today, OpenAI officially unveiled its first custom artificial intelligence chip developed in collaboration with Broadcom: Jalapeño.

Broadcom CEO Hock Tan said in an interview with Reuters that the chip developed by the team can match the performance of NVIDIA's Blackwell chip or the TPU designed by Google, a subsidiary of Alphabet.

OpenAI plans to deploy Jalapeño by the end of this year, the first step in its multi-generation chip development plan. OpenAI has already run samples of the chip in the lab, and on the GPT-5.3-Codex-Spark model both the chip's power consumption and its performance have reached their targets.

On the division of labor, OpenAI is responsible for designing the chip around its own models, kernels, serving systems, and product requirements; Broadcom is responsible for chip implementation, networking, and connectivity technologies; and Canadian electronics manufacturer Celestica provides board-, rack-, and system-level expertise to carry the design from chips to server systems and then to large-scale production and deployment. OpenAI will reportedly then hand the design to TSMC for manufacturing.

Some netizens said, "A few years ago, I wouldn't have thought that AI companies would design their own chips. To be honest, the pace of change in this industry is staggering, and it makes you wonder: as more and more AI labs start to develop their own custom hardware, what will happen to companies like NVIDIA and AMD over the next few years?"

Some other netizens sighed, "While trying to break free from NVIDIA's monopoly and develop their own chips, naming the chip Jalapeño already shows how fierce this competition has become."

However, many netizens don't understand why the chip is named Jalapeño. One said, "When it comes to naming, OpenAI might be the worst in history." Others made memes.

1. Nine Months, AI-Assisted Design, Full-Stack Control

Richard Ho, OpenAI's head of hardware, said that the Jalapeño processor is designed to work quickly and efficiently with the large models that power a wide range of AI applications. "We believe it will perform well on all future iterations of large models."

Jalapeño is a chip designed from scratch for large-model inference. It will serve ChatGPT, Codex, the API, and future agentic products. The goal is high throughput, low latency, and high energy efficiency in large-scale interactive AI products.

"Software-level profit margins cannot be sustained in the long run at gigawatt-scale inference computing. To push the cost floor per token lower, building a custom ASIC is an inevitable infrastructure shift," some netizens commented.

Richard Ho explained that the key to Jalapeño's architectural optimization is close collaboration with OpenAI's research team, including a shared understanding of the most critical kernels, memory movement, networking, and serving patterns in cutting-edge AI models.

OpenAI is still evaluating the chip's final performance, but early tests show that Jalapeño can run close to the theoretical hardware limit on important workloads. The architecture reportedly reduces data movement and balances compute, memory, and network resources, bringing actual utilization closer to theoretical peak performance. Rather than simply stacking up compute, the design emphasizes real-world efficiency in large-model inference. OpenAI also said that the chip's thermal performance is even better than expected.

This also explains why OpenAI calls it an "Intelligence Processor" rather than just an "AI accelerator".

Jalapeño took only nine months to go from initial design to tape-out. OpenAI believes this is one of the fastest ASIC development cycles among custom AI accelerator projects in the high-performance advanced semiconductor field.

The design cycle was relatively fast not only because of the close cooperation between OpenAI's engineering team and Broadcom, and Broadcom's deep experience, but also because OpenAI used its own models in parts of the design and optimization process. OpenAI said that its models are helping to improve the infrastructure they will need to run in the future.

This shows that AI is not only a user of chips but also starting to become part of the chip design process. OpenAI believes that if AI can help engineers design better chips faster, it may reduce the computing cost of the entire industry and promote the wider accessibility of advanced AI.

Previously, Hock Tan revealed that the Jalapeño accelerator can save about 50% of the cost compared with typical AI graphics processing units.

Jalapeño is not a one-off single-chip project but the first step in a multi-generation computing platform built jointly by OpenAI and Broadcom. Broadcom said that it expects the first batch of chips to enter commercial use at Microsoft and other partners by the end of this year, but OpenAI said that true mass production will not come until next year. OpenAI's goal is to reach 10 gigawatts of computing capacity on custom chips by 2029.

"This gives OpenAI full-stack control," Ho said.

OpenAI believes that the release of Jalapeño marks a further expansion of the company's full-stack platform capabilities: from products and models down to the underlying chips.

"The focus of the next artificial intelligence competition may lie in infrastructure, not just intelligence itself." Some netizens sighed.

Other netizens compared OpenAI's Jalapeño project with the deal between SpaceX and Cursor. The two seem like completely different stories, they said, but actually point to the same structural change: Jalapeño represents control of the underlying infrastructure that intelligence runs on, including chips, computing power, and networks, while Cursor represents control of the "workflow layer" where intelligence is actually used.

"As the capabilities of cutting-edge models continue to increase, the competitive advantage is gradually shifting away from the model itself. In the next decade, the companies that win the AI competition may no longer be just those with the smartest models but those that control the strongest 'technology stack' around the models," they summarized.

"The world is entering a computing-driven economy," said Greg Brockman, President and Co-founder of OpenAI. Jalapeño is part of OpenAI's long-term full-stack infrastructure strategy, which aims to make computing power more abundant so that AI is faster, more reliable, and more affordable for individuals and enterprises and can be used to solve more important problems.

In OpenAI's view, the advantage of full-stack capabilities is that different layers can be co-optimized around the same goal: making the model faster, more reliable, and cheaper. Better infrastructure improves computing efficiency, and higher computing efficiency supports better training and inference, which in turn enables stronger models and better products. As product usage grows, OpenAI can reinvest its revenue in next-generation infrastructure, forming a flywheel around computing power, models, products, and commercialization.

2. Chips Become a Battleground, OpenAI Avoids Direct Competition with NVIDIA for Now

OpenAI's first chip product actually avoids direct competition with NVIDIA, Google, etc.

The training and inference infrastructures are clearly diverging. At present, many inference workloads still run on infrastructure similar to that used for training. However, as AI adoption accelerates, inference call volume will increase significantly and gradually become the main source of demand for computing power. Compared with training, inference is more sensitive to cost, energy efficiency, and response speed, and is easier to optimize in hardware for specific usage scenarios. The inference infrastructure will therefore increasingly favor dedicated hardware.

This is where OpenAI is currently focused: it continues to rely on external chips such as NVIDIA's for training, and is developing inference chips for internal use first.

In contrast, NVIDIA's core idea is not "one set of chips for training and another for inference" but a GPU architecture general enough to handle training, inference, and a wider range of data-center AI workloads. For example, Hopper and Blackwell can be used for both training and inference.

However, NVIDIA does position certain products more explicitly toward inference in its marketing and product forms. For example, it now explicitly packages the Blackwell platform as a large-model inference platform, claims that the GB300 NVL72 can significantly reduce the cost per token in agentic inference scenarios, and emphasizes "AI inference at scale".

Similarly, Google's TPU is an ASIC customized for matrix multiplication, tensor computation, and Transformer deep-learning workloads. Its core goal is to make the central tensor computations in training and inference more efficient and to couple them tightly with Google's own software stack, data centers, and model systems, so as to outperform general-purpose GPUs on cost, power consumption, and interconnect.

Of course, Google also has some products for inference, but it essentially does "inference optimization" within the TPU family. For example, the TPU v5e handles both training and inference (serving), while the v6e-8 configuration is optimized for inference, allowing 8 chips to serve the same inference workload.

"Once inference becomes your biggest cost item, you stop renting chips and start making your own. Everyone still renting out computing power may be a bit nervous today," some netizens said. In addition, whether OpenAI eventually sells its future chips publicly may affect companies like Groq, which advertises "fast, low-cost inference that won't go wrong even in really important situations".

Reuters reported as early as 2023 that OpenAI was exploring in-house chips. OpenAI once considered developing everything in-house and raising funds for a costly plan to build a network of chip manufacturing plants, known as "foundries". However, because of the cost and time required to build this network, the company has now shelved this ambitious plan and is focusing instead on internal chip design work.

Behind this move is the dilemma facing AI labs like OpenAI: a shortage of computing power and the difficulty of obtaining enough computing resources to run the latest and most powerful AI applications. As a result, some leading companies have turned to in-house chips, hoping to cut costs and provide an alternative to the NVIDIA GPUs now widely used in AI.

Companies such as Meta, Amazon, and Google have also chosen to cooperate with enterprises such as Broadcom and Marvell. These companies can provide specific design services and intellectual property, and these capabilities are often difficult to fully replicate internally. In April this year, Reuters revealed that Anthropic was also considering building its own AI chips.

Future Inference: CPU + Multiple Custom AI Accelerators

There is no doubt that one of the most direct impacts of generative AI on the semiconductor industry is the rapid increase in the demand for CPUs, GPUs, and AI accelerators.

McKinsey predicts that by 2030, demand for logic wafers from non-generative-AI applications will be about 15 million wafers. Of these, about 7 million will be produced on process nodes >3 nanometers and about 8 million on process nodes ≤3 nanometers. On top of this, generative AI will add demand for 1.2 million to 3.6 million wafers produced on ≤3-nanometer process nodes.

According to current plans for logic wafer fabs, the world is expected to produce about 15 million wafers on ≤7-nanometer process nodes by 2030. This means that generative AI may create a potential supply gap of 1 million to about 4 million advanced logic wafers, concentrated especially in process nodes of ≤3 nanometers.

McKinsey estimates that closing this gap may require building 3 to 9 new logic wafer fabs by 2030. Because advanced logic fabs involve large investments, long construction cycles, and complex equipment and supply chains, this will become a key issue that the semiconductor industry must plan for in advance.
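
The supply-gap arithmetic behind these estimates can be sketched in a few lines of Python. This is a back-of-the-envelope check, assuming (as the comparison in the text implies) that the roughly 15 million wafers of planned ≤7-nanometer supply can be set directly against the roughly 15 million wafers of non-generative-AI demand. The variable names are illustrative, and the per-fab figure at the end is an inference from the numbers quoted here, not a figure taken from McKinsey.

```python
# Back-of-the-envelope check of the 2030 wafer figures quoted above.
# All quantities are in thousands of wafers so the arithmetic stays in integers.

BASE_DEMAND = 15_000           # non-generative-AI logic wafer demand
PLANNED_SUPPLY = 15_000        # planned output on <=7 nm nodes
GENAI_DEMAND = (1_200, 3_600)  # extra <=3 nm demand from generative AI (low, high)
NEW_FABS = (3, 9)              # new fabs estimated to close the gap (low, high)


def supply_gap(extra_demand: int) -> int:
    """Gap between total demand and planned supply, in thousands of wafers."""
    return BASE_DEMAND + extra_demand - PLANNED_SUPPLY


gaps = tuple(supply_gap(d) for d in GENAI_DEMAND)

# Assumption: the low fab count pairs with the low gap, the high with the high.
implied_per_fab = tuple(g // n for g, n in zip(gaps, NEW_FABS))

print(f"gap: {gaps[0] / 1000:.1f}M to {gaps[1] / 1000:.1f}M wafers")
print(f"implied output per new fab: {implied_per_fab[0]}k to {implied_per_fab[1]}k wafers")
```

Under these assumptions the gap comes out at 1.2 to 3.6 million wafers, consistent with the "1 million to about 4 million" range, and both ends of the fab estimate imply roughly 400 thousand wafers per new fab.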

On the training side, the future architecture is expected to continue the current high-performance cluster model, in which servers in the data center are connected through a high-bandwidth, low-latency network. McKinsey said in the report that today's mainstream high-performance generative AI servers usually combine two CPUs with eight GPUs. By 2030, most training workloads will still use this CPU + GPU architecture. At the same time, GPUs and AI accelerators may also evolve toward system-level packaging designs and coexist with the existing architecture for a long time.

On the inference side, the situation will be significantly different. By 2030, more AI servers for inference are expected to use a combination of a CPU and multiple custom AI accelerators. Most of these AI accelerators will be based on ASICs. Since ASICs can be optimized around specific AI tasks, they are expected to achieve lower costs, higher energy efficiency, and better performance in large-scale inference scenarios.

3. The "Memory Wall" Remains the Biggest Uncertainty

Notably, Broadcom CEO Hock Tan told Reuters that, because of the AI-driven surge in memory demand, the profit margin on Broadcom's custom chips is currently lower than on some of its other chip products, such as network switching chips.

Tan said that AI chips require large amounts of high-bandwidth memory, which puts pressure on the profit margin of Broadcom's custom AI chip products. He said that South Korea's SK Hynix and Samsung Electronics supply memory chips to Broadcom.

Generative AI mainly drives demand for two types of DRAM: high-bandwidth memory (HBM), attached to GPUs or AI accelerators, and DDR memory, attached to CPUs. HBM offers higher bandwidth and is an indispensable component in current AI training and high-performance inference. However, HBM requires more silicon area than DDR to store the same amount of data, so it also puts more pressure on manufacturing capacity.

SK Hynix is one of the biggest beneficiaries of the AI memory shortage, but its HBM production capacity is highly strained, and its core customers have probably locked in the quantity in advance. SK Hynix previously said that all its DRAM, HBM, and NAND flash products for 2026 have been sold out. Micron's latest financial report also shows that the overall supply of AI memory may remain tight until after 2027, which indicates that there is a shortage of HBM supply in the industry as a whole.

Major companies are currently expanding memory capacity. However, growing memory capacity is not simple, and it creates challenges for both hardware and software design. The core problem is the "memory wall": memory capacity and bandwidth are becoming bottlenecks for system-level computing performance. Even if the compute chip itself has higher peak performance, overall system performance will still be limited if data cannot be read, moved, and processed fast enough.
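
One common way to reason about the memory wall is the roofline model. It does not appear in the sources cited here, but it captures the same idea: attainable throughput is the lesser of peak compute and memory bandwidth multiplied by arithmetic intensity (operations per byte moved). A minimal sketch follows, with entirely hypothetical chip numbers chosen only for illustration:

```python
# Roofline-style estimate of attainable throughput.
# The chip parameters below are hypothetical and do not describe any real product.

def attainable_tflops(peak_tflops: float, bandwidth_tb_s: float,
                      flops_per_byte: float) -> float:
    """Attainable TFLOP/s = min(peak compute, bandwidth * arithmetic intensity)."""
    return min(peak_tflops, bandwidth_tb_s * flops_per_byte)


PEAK_TFLOPS = 1000.0    # hypothetical peak compute, TFLOP/s
BANDWIDTH_TB_S = 4.0    # hypothetical memory bandwidth, TB/s

# Below this intensity the workload is limited by memory, not compute.
ridge_point = PEAK_TFLOPS / BANDWIDTH_TB_S

for intensity in (2.0, 50.0, 250.0, 1000.0):
    tflops = attainable_tflops(PEAK_TFLOPS, BANDWIDTH_TB_S, intensity)
    bound = "memory-bound" if intensity < ridge_point else "compute-bound"
    print(f"{intensity:7.1f} FLOP/byte -> {tflops:7.1f} TFLOP/s ({bound})")
```

In this toy setting, a workload at 2 FLOP/byte reaches under 1% of peak, and raising peak compute alone would not change that; only more bandwidth or less data movement would.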

The industry is exploring multiple solutions. For example, static random-access memory (SRAM) is used to increase near-compute memory, but high costs still limit its wide adoption. Future algorithms may also reduce the memory required for each inference run, slowing the growth of total memory demand.

Another source of uncertainty is AI accelerator architecture. Compared with the CPU + GPU architecture, some AI accelerators may have lighter memory requirements. As inference workloads grow, AI accelerators may become more popular by 2030, which could make memory demand grow more slowly than in some of the more optimistic scenarios.

Reference Links:

https://www.reuters.com/world/asia-pacific/openai-unveils-custom-chip-it-designed-with-broadcom-boost-its-ai-infrastructure-2026-06-24/?utm_source=chatgpt.com

https://www.mckinsey.com/industries/semiconductors/our-insights/gener