
Elon Musk rushes to promote the "American programming version of DeepSeek." Comment section: it's not even as good as the free one...

Silicon-based Starlight 2026-06-01 15:54
A delicate self-rescue and a bold gamble

In China, DeepSeek and Xiaomi are slashing prices aggressively. Abroad, Anthropic and Google keep launching new products. Even Elon Musk can't sit still.

Early this morning, Musk made a high-profile repost on X, trying to drop a bombshell on the AI developer community.

It started when the well-known agent platform Kilo Code published a striking, counterintuitive hands-on test. Given only a vague, open-ended instruction, Grok Build 0.1, xAI's latest programming model, planned, wrote, and shipped a complete Webhook backend microservice in a very short time, with back-off and retry logic, secure signature verification, and database persistence.

Even more eye-catching is the final bill. The whole process ran smoothly, and the total cost was only $1.65. Musk personally liked and reposted the test, adding a pointed comment: "Good value for money."

At a time when GPT-5.5 remains expensive and Claude Opus 4.8 carries a heavy compute tax, it is hard not to see Grok Build 0.1 as Musk's attempt to replicate the path of Chinese large models in Silicon Valley: redefining the cost-performance ratio of AI coding with an extremely low price.

However, as a saying in the developer community goes, "Musk's words are deceiving." Has Musk really created an "American programming version of DeepSeek" rather than a so-called "American soybean bun"? Don't rush to cheer. Strip away the hands-on test hype, then factor in the power map of global AI competition and senior engineers' review of the source code, and what emerges is a delicate self-rescue and a bold gamble.

01 Self-rescue Plan

To understand the positioning of Grok Build 0.1, you can't just look at Musk's countless tweets a day. Instead, you need to look at the survival dilemma of the Grok series of models under xAI.

Some time ago, Google's Gemini 3.5 Flash drew an extremely poor response after its release, and many ridiculed it as the "American soybean bun." In my opinion, the title suits Grok better. After all, among today's global first-tier large models, xAI's position is actually very awkward.

According to the latest rankings from the authoritative evaluation agency Artificial Analysis, the Grok models still hold up on some metrics, but on the core Intelligence Index leaderboard they are hemmed in by a "China-US coalition."

Leaving aside the far-ahead global "Big Three" of OpenAI, Anthropic, and Google, Alibaba's Qwen3.7 Max, Moonshot AI's Kimi K2.6, and Xiaomi's recently price-cut MiMo-V2.5-Pro have all comprehensively outperformed Grok on multiple benchmarks.

In the two more specific areas of coding and agentic tasks, xAI's performance is even more disappointing. It has long been outside the top ten and is ignored by the developer community. Grok's only remaining stage is X, where it shines through multimodal capabilities and lenient content restrictions. It truly lives up to the name "American soybean bun."

Unable to compete as an all-rounder and watching its ecosystem erode, Musk, who not long ago lost his lawsuit against OpenAI, was on edge and settled on a very smart tactic: copy the homework of Anthropic, which is both a partner of xAI and OpenAI's biggest competitor, and take the route of a "lopsided student" that specializes in programming as a vertical.

Grok Build 0.1 is the first product of this thinking. Its pricing is extremely aggressive: $1 per 1M input tokens and $2 per 1M output tokens, less than one-tenth the price of GPT-5.5 and Opus 4.8.
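To make these rates concrete, here is a minimal TypeScript sketch of the arithmetic. The function names and the example token counts are hypothetical, and the sketch assumes flat per-token billing with no caching or discounts, which the article does not confirm.

```typescript
// Listed prices from the article, in USD per one million tokens.
const INPUT_USD_PER_MTOK = 1;
const OUTPUT_USD_PER_MTOK = 2;

/** Cost in USD of one run, assuming flat per-token billing (an assumption). */
function runCostUsd(inputTokens: number, outputTokens: number): number {
  const microDollars =
    inputTokens * INPUT_USD_PER_MTOK + outputTokens * OUTPUT_USD_PER_MTOK;
  return microDollars / 1_000_000;
}

/**
 * Token range a budget could buy: the minimum if every token were billed
 * at the output rate, the maximum if every token were billed at the input rate.
 */
function tokenBoundsForBudget(budgetUsd: number): { min: number; max: number } {
  return {
    min: Math.round((budgetUsd / OUTPUT_USD_PER_MTOK) * 1_000_000),
    max: Math.round((budgetUsd / INPUT_USD_PER_MTOK) * 1_000_000),
  };
}

// Hypothetical run: 1.2M input tokens and 200K output tokens.
const exampleCostUsd = runCostUsd(1_200_000, 200_000);

// Bounds for the $1.65 total reported for the Kilo Code test.
const reportedRunBounds = tokenBoundsForBudget(1.65);

console.log(exampleCostUsd, reportedRunBounds);
```

Under these assumptions, the $1.65 bill corresponds to somewhere between roughly 825K and 1.65M tokens, depending on the input/output mix. The real mix is not given in the article.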

Musk knows very well that developers around the world share one trait: they are extremely sensitive to price and performance. He is trying to win back his original niche by offering the "freedom to make mistakes": even if the generated code doesn't work the first time, spending a few cents to run it again doesn't hurt. This "cheap labor" model is the only lever Musk has for prying open OpenAI's moat, with programming as the vertical entry point.

02 Good Value for Money

Objectively speaking, Kilo Code's hands-on test really did give Musk and Grok a boost. It demonstrated not only code-generation ability but also impressive agentic workflow logic, strong enough to give some senior backend engineers a hint of career anxiety.

According to the technical report released by Kilo Code, Grok Build 0.1 has two main highlights:

First, the planning depth at the level of an architect.

The new model thinks almost exactly like a human architect. It refuses to act blindly and asks "why" first.

"Build a microservice with TypeScript, Bun, and SQLite." This is the kind of instruction a technically literate product manager would give. Yet the instruction alone would give countless programmers a headache: the task is very open-ended, with no strict architectural plan and no specific requirements.

This time, however, Grok behaved like an architect with many years of experience. Instead of outputting code directly, it first searched online, studied the industry standards from Stripe and GitHub in depth, and put several key architectural questions back to the tester.

Kilo Code called this the "planning stage," and its total cost is hard to believe: $0.17, which bought a report containing an ASCII architecture diagram, a Drizzle schema definition, and a clear risk assessment.

This habit of "thinking before acting" is an essential professional quality for human engineers, and it is also the key to how Grok avoids the "off-topic answer" problem that was so common in early AI programming.

Second, remarkably smooth self-correction.

During the coding stage, Grok can output code at a smooth speed of 120 tokens per second.

Moreover, while configuring the environment it ran into a Bun ABI mismatch and a Zod type error, problems that would clearly require manual intervention in a traditional vibe coding workflow. Without any prompting, Grok diagnosed the errors on its own, adjusted the import paths, modified the configuration file, and finished all 26 project files in one go.

This is also what Kilo Code praised most: zero tool-call failures throughout, at a cost of only $1.48. This smooth agentic experience really lives up to the name "Build."

03 Fatal Shortcomings

Just when people were about to cheer for the productivity that can be obtained for just a few dollars, the sober voices on social platforms and technology communities gave Musk a heavy blow.

Obviously, Musk is trying to redefine the cost - performance ratio of AI Coding.

The price of Grok Build 0.1 looks low only next to the expensive GPT-5.5 and Opus 4.8. Viewed globally, the limits of this low-price marketing become obvious. Right in the comment section of Kilo Code's technical report, commenters fired back:

"This is sheer nonsense. Even the free version of DeepSeek Flash can handle problems of this scale."

The technology community Linux.do also had a poor response. The model was evaluated as "not proactive in work and having poor comprehension ability."

This exposes an embarrassing reality: Musk's so-called "rock-bottom price" holds no generational advantage over Chinese large models, whose prices have already hit rock bottom.

My view has not changed: in today's AI competition, a model needs either leading performance or extreme cost-performance. Models stuck in the middle have little practical value.

An even more fatal shortcoming is the context window, which is only 256K.

At a time when long-context models keep arriving and a 1M window has become standard for complex tasks, 256K looks woefully inadequate, even a bit ridiculous. Grok may perform well when building a project from scratch, but once it enters a real project with hundreds of thousands of lines of code, it simply cannot hold enough historical context. The result is predictable: frequent hallucinations, poor instruction following, and little initiative.
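A rough back-of-the-envelope check shows why a codebase of that size overflows the window. In the TypeScript sketch below, the figure of 10 tokens per line of code is a loose assumption chosen for illustration (real ratios vary by language and tokenizer), "256K" is taken as 256,000 tokens, and the function names are hypothetical.

```typescript
// "256K" per the article; the exact token count is assumed to be 256,000.
const CONTEXT_WINDOW_TOKENS = 256_000;

// Loose assumption for illustration only; real code varies widely.
const ASSUMED_TOKENS_PER_LINE = 10;

/** Estimated tokens needed to hold a codebase of the given size. */
function estimateTokens(
  linesOfCode: number,
  tokensPerLine: number = ASSUMED_TOKENS_PER_LINE,
): number {
  return Math.ceil(linesOfCode * tokensPerLine);
}

/** Fraction of the codebase (0 to 1) that fits in the window at once. */
function fractionThatFits(linesOfCode: number): number {
  const needed = estimateTokens(linesOfCode);
  // A zero-line codebase yields Infinity here, which Math.min caps at 1.
  return Math.min(1, CONTEXT_WINDOW_TOKENS / needed);
}

// A hypothetical 300,000-line project versus a 10,000-line greenfield service.
console.log(fractionThatFits(300_000)); // well under one tenth of the code
console.log(fractionThatFits(10_000)); // fits entirely
```

Under these assumptions, a 300,000-line project needs about 3M tokens, so less than a tenth of it can sit in a 256K window at any one time, while a small service built from scratch fits comfortably.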

At the same time, Musk again released this model with a marketing strategy of refusing to participate in benchmarks and relying only on user showcases. Yet Grok Code Fast 1, the programming model released a year ago, was frequently criticized. Although trust in third-party evaluation agencies and benchmark results declines by the day, benchmarks are a "passing line" rather than an "excellent line." A release without third-party testing will inevitably be suspected of survivorship bias and over-packaging.

04 Source Code Exploration

Also in the comment section of Kilo Code, a comment called on everyone to stay vigilant:

"Those who say that anyone can write code with AI are wrong. If you need something useful, you need to understand far more than just prompts."

Dig into the source code that Grok Build 0.1 generated for a few dollars, and you find not only a leap in productivity but also a gamble with security vulnerabilities.

The project structure Grok produced is very standard, and it even thoughtfully enabled SQLite's WAL mode for concurrency and a non-destructive retry mechanism. Even so, a professional code review still turned up several fatal bugs:

1. In the most critical step of the Webhook, signature comparison, Grok defaulted to an ordinary string comparison instead of crypto.timingSafeEqual, which resists timing attacks. To an attacker, that is an open door.

2. In the query endpoint, Grok inadvertently exposed the key field (encryptedSecret) in the API response. Although the value is stored encrypted, returning it still squarely violated the security rules Grok itself wrote in its README.

3. Grok wrote 14 basic unit tests in total, but provided no effective tests for complex business logic such as the automatic pause mechanism or integration testing of the retry loop. It dwelt on the trivial and avoided the important.
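The first two findings correspond to small, well-known fixes. The TypeScript sketch below is neither the code Grok generated nor Kilo Code's patch. It is an illustration that assumes an HMAC-SHA256 signature sent as a hex string (the article does not say which scheme the service uses). The record fields id and url and all function names are hypothetical; only encryptedSecret and crypto.timingSafeEqual come from the review.

```typescript
import { Buffer } from "node:buffer";
import { createHmac, timingSafeEqual } from "node:crypto";

/** Hex-encoded HMAC-SHA256 of the raw request body (assumed scheme). */
function signBody(secret: string, rawBody: string): string {
  return createHmac("sha256", secret).update(rawBody).digest("hex");
}

/** Verifies a signature with a constant-time comparison. */
function verifySignature(secret: string, rawBody: string, signatureHex: string): boolean {
  const expected = createHmac("sha256", secret).update(rawBody).digest();
  const received = Buffer.from(signatureHex, "hex");
  // timingSafeEqual throws when lengths differ, so reject those up front.
  if (received.length !== expected.length) return false;
  return timingSafeEqual(received, expected);
}

// Hypothetical stored record; only encryptedSecret is named in the review.
interface EndpointRecord {
  id: string;
  url: string;
  encryptedSecret: string;
}

type PublicEndpoint = Omit<EndpointRecord, "encryptedSecret">;

/** Builds the API response from an explicit allow-list of fields. */
function toPublicEndpoint(record: EndpointRecord): PublicEndpoint {
  return { id: record.id, url: record.url };
}

const body = JSON.stringify({ event: "ping" });
const signature = signBody("demo-secret", body);
console.log(verifySignature("demo-secret", body, signature));
console.log(toPublicEndpoint({ id: "ep_1", url: "https://example.com/hook", encryptedSecret: "..." }));
```

Building the response from an explicit allow-list, rather than deleting fields from the stored record, means a newly added sensitive column is not exposed by default.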

This is a very valuable alarm bell for global AI developers and development enterprises, which confirms two things:

First, AI will not eliminate programmers but will force them to become stricter "technical reviewers." If developers really believe they can build a multi-million-scale architecture from a text description alone, the few dollars saved by using Grok will certainly turn into costs thousands of times higher for security patches and system rebuilds.

Second, zero-threshold programming does not mean everyone can be a programmer, does not mean they can build a runnable application, and still less means they can create commercial value. Imagine an outsider who knows nothing about software development and is simply enthusiastic about the term "AI coding." They probably could not understand any of the Grok vulnerabilities above, let alone fix them. Yet these are exactly the bugs that must be eliminated on the way to commercial value.

05 Conclusion

Overall, the release of Grok Build 0.1 and Kilo Code's hands-on test amount to extremely successful publicity for xAI.

It precisely hits the developers' seemingly unrealistic fantasy of "cheap, useful, understanding engineering architecture, and being able to debug autonomously" and proves that Musk does have the ability to compete in the vertical programming field. For foreign developers who need to quickly produce prototypes and verify logic, it is currently the most handy tool.

But if it wants to become the "American programming version of DeepSeek" or reshape the global programming model ranking list, there is still a long way to go.

In the second half of the global AI race, which has entered deep water, a simple price war cannot sustain a moat forever. Whether Grok can handle ultra-long contexts, refactor complex legacy code accurately, and hold the security bottom line while generating code will decide whether xAI can mount a counterattack against the "Big Three."

Musk has fired this shot, but the bullet still needs to fly for a while.

At least for now, even when a requirement is met for just a few dollars, users still have to sit back down at the computer and carefully check whether any line of code could be exploited by hackers.

This article is from the WeChat official account "Silicon-based Starlight", author: Siqi. Republished by 36Kr with permission.