Elon Musk Plays Two Aces as Grok Build Enters the AI Programming Race

New Intelligence Yuan (新智元), 2026-05-26 18:22
xAI launches Grok Build to make up for programming shortcomings and will release V9 with 1.5T parameters for a showdown in June

[Introduction] The AI programming battle is here! On May 14th, xAI launched Grok Build, a programming agent that runs in the terminal and can plan tasks and modify code on its own. Musk has previously admitted that xAI lagged behind in programming. This is its first step toward taking on Claude Code and OpenAI Codex head-on.

Musk has played two cards to shore up xAI's weak spot in programming.

On May 14th, xAI released an early beta of Grok Build. The company positioned it as a "programming agent and CLI (command-line interface) tool", available first to SuperGrok Heavy subscribers.

On May 25th, xAI formally announced Grok Build on its website and expanded access from SuperGrok Heavy to all SuperGrok and X Premium Plus users. With that, it went from a small, high-barrier beta to a tool that far more paying users can use.

https://x.ai/news/grok-build-cli

Musk has publicly admitted that xAI lagged behind in programming scenarios. According to Bloomberg, an xAI executive once asked the team to make Grok catch up with Claude across a range of tasks. Grok Build is the first product in this catch-up race.

But as soon as the product launched, some users raised a pointed criticism, namely that the interface is good but the underlying model is not strong enough:

As long as xAI comes up with a truly SOTA (state-of-the-art) model, Grok Build can compete head-on with Codex and Claude Code overnight.

Musk then replied on X, revealing xAI's next-generation foundation model:

"The Grok V9 1.5T we recently completed runs very well. This is the result before adding Cursor data for supplementary training."

He also wrote a separate post to untangle xAI's "a bit confusing" version numbers. The internally developed V9, with 1.5T parameters, is significantly better than V8 in every respect, including data organization, training methods, and model scale, and is optimized for the Blackwell architecture. The publicly available v4.2, trained on V8, has only 0.5T parameters and has major defects in the quality, comprehensiveness, and proportions of its training data.

Musk summed up the jump from V8 to V9 in two words: "Huge gap."

V9 Rebuilt Along Three Dimensions

The "huge gap" Musk mentioned comes mainly from rebuilding the model along three dimensions.

Parameter Scale

First is parameter scale, which has tripled from 0.5T to 1.5T.

A larger parameter count may improve the model's capacity, its ability to model complex tasks, and its stability across long chains of tool calls. However, context window size, repository-level understanding, and performance on long-horizon tasks are not determined by parameter count alone.

Whether V9 delivers a significant improvement on complex code repositories, cross-file modifications, and multi-step agent tasks still has to be verified in real tests once the public version launches.

Since leading models such as Claude, GPT, and Gemini generally do not disclose their parameter counts, V9's 1.5T figure is best treated as a generation-over-generation indicator within xAI and should not be used for direct comparison with Sonnet, Opus, or the GPT series.

Hopper is Not Enough, xAI Puts Blackwell into Use

According to Musk, V8, on which the public v4.2 is based, has about 0.5T parameters and was trained on Hopper chips, while the internal V9 has grown to 1.5T parameters and is optimized for the Blackwell architecture.

This means the V9 upgrade is not only an increase in model size but also a generational switch in the underlying hardware platform.

Compared with Hopper, Blackwell is designed for larger-scale model clusters, with higher interconnect bandwidth, stronger low-precision compute, and system-level scaling for training and serving trillion-parameter models.

Note that Hopper itself already supports FP8; what is new in Blackwell is FP4, fifth-generation NVLink, and larger-scale cluster interconnect.
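
To make the precision point concrete, here is a back-of-the-envelope sketch of how much memory the weights alone of a 1.5T-parameter model would occupy at different bit widths. It assumes every parameter is stored at one uniform precision and ignores activations, KV cache, optimizer state, and quantization scale factors. xAI has not disclosed which numeric formats V9 actually uses, so the function names and the format table are illustrative assumptions only.

```python
# Rough weight-memory arithmetic; not a description of xAI's actual setup.

V9_PARAMS = 1.5e12  # parameter count cited by Musk

# Assumed uniform storage formats (bits per parameter).
FORMATS = {"FP16": 16, "FP8": 8, "FP4": 4}


def weight_bytes(num_params: float, bits_per_param: int) -> float:
    """Bytes needed to store the weights alone at a uniform bit width."""
    return num_params * bits_per_param / 8


def footprint_tb(num_params: float = V9_PARAMS) -> dict:
    """Weight footprint in terabytes (1 TB = 1e12 bytes) for each format."""
    return {
        name: weight_bytes(num_params, bits) / 1e12
        for name, bits in FORMATS.items()
    }


if __name__ == "__main__":
    for name, tb in footprint_tb().items():
        print(f"{name}: {tb:.2f} TB")
```

Under these assumptions, halving the bit width halves the weight footprint, which is one reason native FP4 support matters for models at this scale.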

xAI's compute trump card is the Colossus supercluster in Memphis.

Musk has updated the expansion progress of Colossus several times in the past year.

That V9 is said to be optimized for Blackwell also suggests xAI is trying to convert cluster expansion and hardware upgrades into training and serving capability for its next-generation foundation model.

Real Developer Data

Then there is data quality.

Musk put it bluntly: V8's data has quality defects, is not comprehensive enough, and is mixed in the wrong proportions. In other words, V8 is not just a "smaller-scale model" but a model built on a poor data foundation.

A core step in the supplementary training of V9 is to introduce Cursor data.

Musk noted in a post on May 15th that V9's training had just been completed and that Cursor data had not yet been added; it would be introduced in the supplementary training stage.

On May 17th, he gave an update: the next step is supplementary training with Cursor data, followed by SFT (supervised fine-tuning) and RL (reinforcement learning). The whole process will take about 3 to 4 weeks.

The value of Cursor data is that it is process data. GitHub holds a vast amount of code, but that is final-state data.

What is generated on the way from a blank file to the final code (developer completions, rollbacks, error corrections, interactions with agents) is process data, and it is the truly scarce resource for training programming agents.

Once Cursor data is added in supplementary training, V9 will be the first Grok systematically trained on real developer behavior.

Musk's mention of Cursor is no accident. xAI and Cursor go back some way: when grok-code-fast-1 was released in 2025, Cursor was one of the partner platforms offering limited-time free access.

As for the specific source and authorization details of the Cursor data used in the supplementary training of V9, there is currently no public information.

What Exactly Is Grok Build?

In terms of function, Grok Build is a CLI tool that runs in the terminal and can be installed with just one command.

https://x.ai/cli

After installation, enter the project directory and type "grok" to start it. You can ask it to explain the structure of the entire code repository, or hand it a task directly, such as "Add rate-limiting to this API". It will locate files, modify code, run tests, and correct its own errors.

It can be used in three ways. The first is the TUI (terminal user interface), which is full-screen and supports mouse operation; the second is headless mode, which can be integrated into scripts and automated pipelines; the third is connecting to other applications through ACP (Agent Client Protocol).

What really deserves attention is its working method: plan first, then execute.

Facing a complex task, Grok Build first writes out a plan and pauses for your confirmation. You can annotate it item by item, rewrite whole sections, or have it ask you clarifying questions before it starts. Once the plan is approved, each change is presented as a clear diff, so you can see exactly what was changed. When a programming agent operates directly on real project files, this review checkpoint is not optional.
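
The plan-then-execute pattern described above can be sketched generically. The following is a minimal illustration, not Grok Build's implementation: `PlannedEdit`, `render_diff`, and `apply_plan` are hypothetical names, files are modeled as an in-memory dictionary, and the `approve` callback stands in for the human reviewer. Only Python's standard `difflib` is used.

```python
import difflib
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class PlannedEdit:
    """One step of a hypothetical plan: replace a file's contents."""
    path: str
    new_text: str


def render_diff(path: str, old: str, new: str) -> str:
    """Build a unified diff of the proposed change for human review."""
    lines = difflib.unified_diff(
        old.splitlines(keepends=True),
        new.splitlines(keepends=True),
        fromfile=f"a/{path}",
        tofile=f"b/{path}",
    )
    return "".join(lines)


def apply_plan(
    files: Dict[str, str],
    plan: List[PlannedEdit],
    approve: Callable[[str], bool],
) -> Dict[str, str]:
    """Return a new file map containing only the edits the reviewer approved."""
    result = dict(files)  # never mutate the caller's files
    for edit in plan:
        diff = render_diff(edit.path, files.get(edit.path, ""), edit.new_text)
        # Skip no-op edits; apply a change only after explicit approval.
        if diff and approve(diff):
            result[edit.path] = edit.new_text
    return result


if __name__ == "__main__":
    files = {"api.py": "def handler():\n    return 1\n"}
    plan = [PlannedEdit("api.py", "def handler():\n    return 2\n")]
    print(render_diff("api.py", files["api.py"], plan[0].new_text))
    print(apply_plan(files, plan, approve=lambda diff: True))
```

The point of the design is that nothing reaches the real files until a reviewer has seen the exact diff; a rejected step leaves the original contents untouched.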

It also supports parallel sub-agents, which split a task across multiple sub-agents that work separately; it supports MCP (Model Context Protocol) servers; it has an extension system of skills, plugins, and a marketplace; and its slash commands even include /imagine for generating images and /imagine-video for generating videos.

Taken together, Grok Build is comparable to agents such as Claude Code, Codex CLI, and Cursor rather than to traditional chatbots. What xAI has delivered this time is a complete entry point into the developer workflow.

The Base Is grok-build-0.1, and V9 Has Not Launched Yet

According to xAI's official documentation, Grok Build is driven by a specially trained model: grok-build-0.1.

It entered early API access around May 19th, and xAI positions it as a "fast-coding model trained for agent programming".

This is a model designed specifically for programming. It supports text and image input, has native tool calling, structured output, and reasoning capabilities, and has a context window of 256K (about 256,000 tokens). It is trained to loop repeatedly over a long chain: read the problem, write code, use the terminal, check for errors, and correct them.
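
The loop of writing code, checking it, and correcting errors can be illustrated with a toy sketch. This is not how grok-build-0.1 works internally: `agent_loop`, `toy_writer`, and `toy_checker` are hypothetical names, the "model" is a stub that fixes its bug once it receives feedback, and the "terminal" is replaced by an in-process check.

```python
from typing import Callable, Optional, Tuple


def agent_loop(
    task: str,
    write_code: Callable[[str, Optional[str]], str],
    run_checks: Callable[[str], Tuple[bool, str]],
    max_rounds: int = 5,
) -> Tuple[Optional[str], int]:
    """Write code, check it, and feed errors back until checks pass."""
    feedback: Optional[str] = None
    for round_no in range(1, max_rounds + 1):
        code = write_code(task, feedback)
        ok, output = run_checks(code)
        if ok:
            return code, round_no
        feedback = output  # the error report drives the next attempt
    return None, max_rounds  # gave up within the round budget


def toy_writer(task: str, feedback: Optional[str]) -> str:
    """Stand-in for a model: the first draft is buggy, the retry is fixed."""
    if feedback is None:
        return "def add(a, b):\n    return a - b\n"
    return "def add(a, b):\n    return a + b\n"


def toy_checker(code: str) -> Tuple[bool, str]:
    """Stand-in for running tests in a terminal."""
    namespace: dict = {}
    exec(code, namespace)
    result = namespace["add"](2, 3)
    if result == 5:
        return True, "ok"
    return False, f"add(2, 3) returned {result}, expected 5"


if __name__ == "__main__":
    code, rounds = agent_loop("implement add", toy_writer, toy_checker)
    print(f"passed after {rounds} round(s)")
```

A cap such as `max_rounds` is a common safeguard in loops like this, so an agent that cannot fix its own error stops instead of retrying indefinitely.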

grok-build-0.1 is listed separately on the xAI API. Developers can integrate it directly into their own agent loops or IDEs without having to use the Grok Build shell.

Its arrival also marks a reorganization of xAI's programming model line. The earlier grok-code-fast-1 was retired in mid-May. xAI officially recommends migrating code workloads to grok-build-0.1, saying that it is significantly better at agent programming and web development.

In moving from a "fast and cheap code model" to a full programming agent product, xAI is following the path from model API to developer entry point.

Currently, xAI's official model page still lists Grok 4.3 as the main general-purpose model, and only maps "programming" to Grok Build 0.1 in the usage table.

On the xAI official website's API page, the current main public model is still Grok 4.3. https://x.ai/api

The V8 and V9 that Musk mentioned are generation numbers for xAI's foundation models, which differ from the public product version numbers: he referred to the public model as v4.2 in his post, while the xAI website labels it Grok 4.3.

grok-build-0.1 is a dedicated programming model on a separate line. xAI has not said publicly whether it shares a base with V9 or uses V9's pre-training results.

The Real Battle is in June

Finishing V9's training does not mean it has launched. The full post-training process has to finish first, so the public version is still a few weeks away.

During those weeks, competitors will not sit idle. Claude Code maintains a very high release cadence, with bug fixes and new versions sometimes shipping just two days apart; OpenAI's Codex continues to ship incremental improvements; and Cursor, as an IDE platform, is deepening its agent capabilities, expanding its model options, and increasing platform stickiness.

Even after V9 launches, questions will remain. Tripling the parameters does not mean tripling the capabilities. Whether 1.5T can open up a lead over Claude and GPT on programming benchmarks remains to be seen in real-world use.

xAI has filled out its product line. V9's training is complete and the hardware is in place. In a few weeks, the public version will launch and be put to the test by everyone.

xAI has shown its two cards, but it is not the only one at the table.

Claude Code, Codex, and Grok Build are about to face off, and Cursor keeps stepping up its efforts as well.

Who comes out on top will be settled in June's coding showdown.

Reference: https://x.com/elonmusk/status/2055914584373141906

This article is from the WeChat official account "New Intelligence Yuan" (author: ASI Revelation; editor: Yuan Yu Moses) and is published by 36Kr with authorization.