
Valuation Up 147-Fold: Nvidia's Strongest Rival Secures RMB 6.8 Billion in Orders

Pencil News · 2026-07-03 16:16
The latest financing round raised 500 million U.S. dollars, pushing the post-money valuation to 5 billion U.S. dollars (approximately 34 billion yuan).

Recently, AI chip company Etched announced that it has raised a cumulative $800 million since its founding. Its latest round alone brought in $500 million, lifting its post-money valuation to $5 billion (approximately RMB 34 billion).

Etched makes AI inference chips and nothing else, aiming to be faster, more energy-efficient, and cheaper than NVIDIA GPUs.

Even before delivering its chips at scale, Etched has signed customer procurement contracts worth over $1 billion (approximately RMB 6.8 billion).

In 2023, Etched was valued at $34 million; its valuation has since risen about 147-fold.

01 A Bold Gamble by Three Harvard Dropouts

Etched's story begins with three young men who dropped out of Harvard in 2022.

Founders Gavin Uberti, Robert Wachen, and Chinese-American co-founder Chris Zhu noticed while still in school that almost all large AI models on the market run on NVIDIA GPUs. Powerful as they are, GPUs are often a case of "using a sledgehammer to crack a nut" and are not cost-effective for this work.

Many AI companies complain that more than half of their revenue goes to compute costs, so they are essentially working for NVIDIA.

The three founders took the opposite approach: they focused only on chips specifically designed to run large Transformer models, abandoning all redundant functions.

Etched founder Gavin Uberti Source: Etched official website

However, almost all Silicon Valley VCs refused to invest. They argued that if large models no longer use the Transformer architecture in the future, the company would become worthless, posing too high a risk.

After repeated rejections, they chose a "stealth mode": no external publicity, no media interviews, and three years focused on R&D.

They recruited senior engineers from NVIDIA, Google's TPU team, and Broadcom, and the team grew to over 400 people in just three years. They also set up production lines in Taiwan, China, and built their own laboratory in California, keeping the entire process of chip design, tape-out, and full-system integration under their own control.

Etched produced a working chip on TSMC's 4nm process on the first try, known in the industry as "A0 success on the first attempt," saving over a hundred million in R&D costs.

Even more interesting is the investor lineup, an "all-star team" of the AI field: Nobel laureate and deep learning pioneer Geoffrey Hinton; Fei-Fei Li, the "godmother of AI"; Andrej Karpathy, Tesla's former head of AI; and Peter Thiel, the godfather of Silicon Valley, all invested. The quantitative trading giant Jane Street put over $100 million into a single round, and TSMC's industrial fund also made a strategic investment. Top-tier hedge funds flocked in.

Capital is willing to bet heavily because it understands the company's business model: while others only sell chips, Etched packages the entire rack, interconnect hardware, and supporting software, delivering a complete "AI inference cluster" to customers. Customers do not need to configure and debug the equipment themselves; they can run large models directly after purchase, which addresses the enterprise pain point of hard-to-build compute infrastructure.

02 Behind the $1 Billion Order

Etched's financing history and valuation growth

 

Capital is willing to invest because it sees a huge gap in the AI inference market: demand is urgent for low-cost, high-efficiency hardware to replace GPUs.

This $500 million will go mainly toward expanding production, strengthening the supply chain, and growing the R&D team to support delivery of the large orders that follow.

Etched Sohu chip Source: Etched official website

 

Etched's dedicated Sohu chip carries large-capacity memory and is specifically adapted to mainstream large models such as Llama, Tongyi Qianwen, and DeepSeek.

According to the company's internal test data (not independently verified by a third party), a server with 8 of these chips can output 500,000 tokens per second when running a 70-billion-parameter model. A comparably configured 8-GPU NVIDIA H100 server outputs just over 20,000 tokens per second, a throughput gap of roughly 20 times.

Comparison of large-model inference performance (Llama 70B) Source: Compiled from public information
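
As a rough sanity check on the throughput figures quoted above, the following Python sketch works through the arithmetic. It is only an illustration: both numbers are the vendor's unverified internal figures as reported here, the function names are invented for this example, and because the H100 figure is given only as "over 20,000," the ratio computed is an upper bound rather than a measured result.

```python
# Figures as quoted in the article (vendor internal data, not independently verified).
SOHU_SERVER_RATE = 500_000  # output units per second, 8-chip Etched server
H100_SERVER_FLOOR = 20_000  # "over 20,000" per second, so this is a lower bound
CHIPS_PER_SERVER = 8


def throughput_ratio(new_rate: float, baseline_rate: float) -> float:
    """How many times faster the new system is than the baseline."""
    return new_rate / baseline_rate


def per_chip_rate(server_rate: float, chips: int = CHIPS_PER_SERVER) -> float:
    """Average throughput attributable to each chip in the server."""
    return server_rate / chips


def implied_baseline(new_rate: float, claimed_ratio: float) -> float:
    """Baseline throughput that would make the claimed ratio exact."""
    return new_rate / claimed_ratio


max_ratio = throughput_ratio(SOHU_SERVER_RATE, H100_SERVER_FLOOR)
sohu_per_chip = per_chip_rate(SOHU_SERVER_RATE)
baseline_for_20x = implied_baseline(SOHU_SERVER_RATE, 20)

print(max_ratio)         # 25.0
print(sohu_per_chip)     # 62500.0
print(baseline_for_20x)  # 25000.0
```

Under these assumptions the ratio tops out at 25, and a gap of exactly 20 times would correspond to an H100 server producing 25,000 per second, which is consistent with the "over 20,000" wording.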

Alongside the efficiency gain, electricity costs fall significantly. When running large models, ordinary GPUs typically sustain a hardware utilization rate of only about 30%, wasting a large amount of computing power, whereas Etched's dedicated chips can exceed 90%.

Etched's ability to quickly secure orders and financing stems from the explosive, across-the-board growth of the inference market.

The global AI inference market is expected to reach approximately $103.7 billion in 2025, grow to $117.8 billion in 2026, and exceed $312.6 billion by 2034, with a compound annual growth rate of nearly 13%. More optimistic data suggests that the inference market will expand to $491.5 billion by 2034.

Forecast of the global AI inference market size
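
The growth rate behind these forecasts can be checked with the standard compound annual growth rate formula, (end / start)^(1 / years) - 1. The sketch below applies it to the market sizes quoted above; the function name is illustrative, and pairing the more optimistic $491.5 billion figure with the 2026 base is an assumption made here, since that forecast may rest on a different starting point.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate between two values, as a fraction."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1 / years) - 1


# Market sizes in billions of U.S. dollars, as quoted in the article.
MARKET_2025 = 103.7
MARKET_2026 = 117.8
MARKET_2034_BASE = 312.6
MARKET_2034_OPTIMISTIC = 491.5  # assumed here to share the 2026 base

base_from_2026 = cagr(MARKET_2026, MARKET_2034_BASE, 2034 - 2026)
base_from_2025 = cagr(MARKET_2025, MARKET_2034_BASE, 2034 - 2025)
optimistic = cagr(MARKET_2026, MARKET_2034_OPTIMISTIC, 2034 - 2026)

print(f"2026-2034 base case: {base_from_2026:.1%}")
print(f"2025-2034 base case: {base_from_2025:.1%}")
print(f"2026-2034 optimistic case: {optimistic:.1%}")
```

Both base-case calculations land at roughly 13% per year, matching the rate stated in the forecast, while the optimistic figure would imply noticeably faster growth if it does share the same base.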

 

Inference is the biggest cost pain point for AI companies. OpenAI's inference costs in 2024 ran to billions of dollars, and each long conversation with Anthropic's Claude models incurs several dollars in backend inference cost. Etched claims that its dedicated chip can halve the compute cost of a single Q&A exchange.

Etched's financing also reveals another notable change: the inference industry chain is taking shape.

The first type of opportunity is in inference chips.

In addition to Etched, companies like Groq and Cerebras are vying for this market. They sell not just chips but entire servers, software, and cluster solutions. Etched revealed that customers are also ordering the entire "Frontier Inference Cluster" rather than just purchasing chips separately.

The second type of opportunity is in inference cloud services.

More and more enterprises are not buying servers themselves but purchasing inference capabilities based on the number of calls.

Platforms like Baseten, Fireworks AI, Together AI, and GroqCloud are essentially "AI inference clouds."

Enterprises can upload models and quickly obtain API services without managing GPUs themselves.

The third type of opportunity is in inference software.

The utilization rate of GPUs has always been low.

A large number of startups are working on technologies such as compilers, scheduling software, KV cache optimization, model compression, and quantized inference.

Without retraining the model, software optimization can improve inference efficiency and directly reduce customer costs.
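
To make the idea concrete, here is a minimal sketch of one such technique, post-training weight quantization, which stores weights as 8-bit integers plus a single scale factor instead of full-precision floats. It is a simplified illustration rather than any particular vendor's method: the function names and the sample weights are invented for this example, and production systems use more elaborate schemes.

```python
def quantize_int8(weights: list[float]) -> tuple[list[int], float]:
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized: list[int], scale: float) -> list[float]:
    """Recover approximate float weights from the integers and the scale."""
    return [q * scale for q in quantized]


# Made-up weights standing in for one small slice of a trained model.
weights = [0.82, -1.27, 0.05, 0.40, -0.66]

quantized, scale = quantize_int8(weights)
restored = dequantize(quantized, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))

print(quantized)  # [82, -127, 5, 40, -66]
print(max_error <= scale / 2 + 1e-12)  # True: error is at most half a step
```

No retraining is involved: the weights are simply re-encoded, which cuts their memory footprint to roughly a quarter of 32-bit floats at the cost of a small, bounded rounding error.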

The fourth type of opportunity is in data centers.

The biggest difference between the inference era and the training era is that inference is more decentralized.

Training is concentrated in a few hyperscale data centers.

Inference can happen in the cloud, inside enterprises, on local servers, and eventually even in cars, robots, and mobile phones.

Therefore, inference data centers, edge computing, liquid cooling, power supplies, and network equipment will all see new demand.

03 A Risky Bet

For Etched to meet market expectations, it must face three unavoidable real-world challenges.

First, the chips haven't been truly delivered to customers, so all performance claims remain on paper.

An April 2026 analysis report from Spheron stated plainly that the Sohu chip "is not publicly available for purchase or lease," so its cost figures cannot be calculated. The real performance gap may turn out to be less dramatic than 20 times.

Second, the company has bet on a single technical route and depends heavily on the Transformer architecture.

The Sohu chip can only run pure Transformer models. However, the industry trend is changing: DeepSeek V4 is a 67.1-billion-parameter MoE architecture, and Qwen3-235B-A22B is also an MoE model, representing the cutting-edge direction of open-source large models.

As an emerging category, diffusion language models compute in a completely different way from Transformers, and the Sohu chip is incompatible with them as well. If the industry mainstream shifts from dense Transformers to MoE or an entirely new architecture, the competitiveness of this dedicated chip will decline rapidly.

This is the Achilles' heel of all dedicated chip companies — bet right on the trend and soar, bet wrong and lose everything.

Third, both giants and peers are accelerating, and the first-mover window is narrowing.

NVIDIA has launched inference-focused GPUs with a 35-fold jump in performance, and estimated shipments of 4 to 5 million units from 2026 to 2027.

A more immediate threat comes from the supply chain: any delay in TSMC's 4nm production capacity, HBM3E memory supply, or full-system assembly will make it impossible to deliver orders on time. Etched's $1 billion in contracts is a vote of confidence, but only successful mass production and delivery can turn it into real revenue.

The chip industry has a cruel rule: a "successful tape-out" is only the first step of a long journey. Countless hurdles lie between the laboratory and the customer's server room, such as mass-production yield, thermal stability, and software compatibility.

This article is from the WeChat official account “Pencil News” (ID: pencilnews), author: Pencil News. Republished by 36Kr with permission.