
Put the "world of runnable code" into AI. Meta makes a major open-source release of the first code world model: enabling AI to think like a programmer.

CSDN · 2025-09-25 20:59
Will we enter a new era of software development?

Early this morning, the Meta FAIR team released the Code World Model (CWM): an open-weights LLM with 32 billion parameters and a maximum context of 131,000 tokens.

According to Meta's official introduction, CWM has a clear goal: to bring the concept of the "world model" into code generation and reasoning, enabling the model not only to write code but also to simulate the code execution process, reason about program states, and self-detect and fix bugs.

Notably, to support community research on "code world models", Meta also open-sourced CWM's weight checkpoints from the mid-training, SFT, and RL stages. Meta's Chief AI Officer Alexandr Wang accordingly posted a call on X: "We encourage the research community to conduct research on this open-weights model!"

Why bring the "world model" into the code domain?

At the beginning of the CWM research paper, the Meta team notes that conventional code pre-training treats code as static text: the model mainly learns to predict code line by line, left to right and top to bottom. In other words, it learns syntax, common patterns, and naming conventions, but does not understand the execution process.

"We believe this is not enough - to truly master coding, one must not only understand what the code looks like but also understand what it does when executed."

This skill is crucial for software engineers' daily work: at the local level, they understand how the execution of a line of code changes the state of local variables; at the global level, they can predict how changes in the codebase will affect the program's output.

Based on this, the core idea of CWM is to bring the "world model" into the code domain: the model learns from observe → act → observe execution trajectories, thereby improving the executability, verifiability, and self-repair ability of the code it generates.
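
To make this concrete, here is a minimal illustration of what a serialized observe → act → observe trajectory could look like; the field names and layout are our own assumptions for illustration, not Meta's actual training format.

```python
# Illustrative only: field names are hypothetical, not CWM's real schema.
# Each step records the local state (observation) and the next source
# line to execute (action); the following step shows the resulting state.
trace = [
    {"observation": {"x": 3},         "action": "y = x * 2"},
    {"observation": {"x": 3, "y": 6}, "action": "y += 1"},
    {"observation": {"x": 3, "y": 7}, "action": "return y"},
]
# A model trained on such data learns to predict the next observation
# from the current one plus the action, i.e. to simulate the interpreter.
```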

How do you go from "reading code" to "seeing the world"?

As mentioned at the beginning, CWM is an LLM with 32 billion parameters and support for an ultra-long context (up to 131,000 tokens). To make that feasible, it interleaves local and global attention (a rough sketch follows the stage list below) and applies long-sequence stabilization techniques. Training is divided into three major stages:

● Pre-training stage: a large-scale general and code corpus (about 8 trillion tokens, roughly 30% code) lays the model's foundation; the initial context length is 8,000 tokens.

● Mid-training stage: 5 trillion tokens of world-modeling data are introduced and the context length is extended to 131,000 tokens. This stage is the core of "internalizing" the world-model ability.

● Post-training stage (SFT + RL): first, SFT (100 billion tokens, 32,000-token context) strengthens instruction following and reasoning; then large-scale multi-task, multi-turn RL (172 billion tokens, 131,000-token context) follows, with training objectives spanning verifiable coding, algorithmic problems, and software-engineering interactions.
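
As flagged above, the long context is paired with interleaved local and global attention. Here is a rough sketch of what such a schedule can look like; the layer count, interleaving ratio, and window sizes below are illustrative assumptions, not CWM's published configuration.

```python
# Hypothetical alternating local/global attention schedule; all numbers
# here are placeholders, not CWM's actual architecture.
NUM_LAYERS = 64
LOCAL_WINDOW = 8_192      # sliding-window span for "local" layers
GLOBAL_SPAN = 131_072     # full-context span for "global" layers

def attention_span(layer_idx: int) -> int:
    """Give every fourth layer global attention; the rest stay local."""
    return GLOBAL_SPAN if layer_idx % 4 == 3 else LOCAL_WINDOW

schedule = [attention_span(i) for i in range(NUM_LAYERS)]
# Local layers keep long-sequence compute manageable; the periodic global
# layers let information flow across the full 131,000-token window.
```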

According to the Meta team, the world model ability of CWM is mainly driven by two types of data in the mid-training stage:

(1) Python execution traces: intermediate stack frames and local-variable states from function or test executions in the interpreter are serialized into an observation → action → observation format and fed to the model, teaching it to predict how executing the next step will change the local state.

This data reportedly covers function-level executions, competition problem solutions, and repository unit-test traces, in very large total volume. By learning from it, the model can simulate code execution paths without a real running environment.
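
The article does not detail Meta's tracing pipeline, but Python's standard tracing hook shows how such data can in principle be harvested. The sketch below records the local-variable state at each executed line of a target function; the record schema is again an illustrative assumption.

```python
import sys

def capture_trace(func, *args):
    """Record the local-variable state at each executed line of `func`.

    A minimal sketch using the standard `sys.settrace` hook; CWM's actual
    harvesting pipeline is not described in this article.
    """
    steps = []

    def tracer(frame, event, arg):
        if event == "line" and frame.f_code is func.__code__:
            steps.append({"observation": dict(frame.f_locals),
                          "line": frame.f_lineno})
        return tracer  # keep receiving line events for this frame

    sys.settrace(tracer)
    try:
        result = func(*args)
    finally:
        sys.settrace(None)  # always detach the hook
    return steps, result

def double_and_add(x):
    y = x * 2
    y += 1
    return y

steps, result = capture_trace(double_and_add, 3)
# `steps` now holds one observation per executed line: the raw material
# for observation -> action -> observation training examples.
```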

(2) Agent-environment interaction traces: an automated agent "forages" in executable repository images: it runs Bash commands, edits files, runs tests, and fixes bugs or implements missing functions. The amount of such data Meta collected is reportedly also very large: "About 3 million traces were collected from 102,000 images and 31,500 underlying repositories."

These dynamic trajectories put agent-environment interaction experience directly into mid-training, helping the model learn the working style of "using tools to repair software", which is particularly helpful for multi-turn software-engineering tasks.
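
The article does not specify how these interaction steps are recorded, but a single step plausibly amounts to a shell action plus its observed output. A minimal, assumption-laden sketch:

```python
import subprocess

def run_step(cmd: str, repo_dir: str) -> dict:
    """Execute one shell action in a repository checkout and log the step.

    The record shape is a guess for illustration; the real agent tooling
    and schema behind Meta's traces are not given in this article.
    """
    proc = subprocess.run(cmd, shell=True, cwd=repo_dir,
                          capture_output=True, text=True, timeout=300)
    return {
        "action": cmd,                           # e.g. "pytest -x tests/"
        "observation": proc.stdout + proc.stderr,
        "exit_code": proc.returncode,
    }

# A trajectory is then the ordered list of such steps, ending when the
# repository's tests pass or a step budget runs out.
```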

In addition, Meta shared two engineering details from CWM's post-training stage:

First, specific "reasoning tokens" are introduced in the SFT stage to help the model distinguish direct answers from reasoning processes, while the RL stage uses a more flexible reasoning tag to encourage the model to form its own reasoning paths. Second, a bootstrapping strategy feeds the high-quality trajectories generated by early RL models back into SFT, forming a virtuous cycle that steadily improves agentic ability and reduces RL training noise.
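
The bootstrapping loop itself is simple to state. A hedged sketch, where `generate` and `verify` stand in for the RL policy's rollout and the task's test harness (both are placeholders, not CWM internals):

```python
def bootstrap_round(generate, verify, tasks, sft_examples):
    """One bootstrapping pass: keep only rollouts that pass verification.

    `generate(task)` samples a full agentic rollout from the current RL
    model; `verify(task, rollout)` checks a verifiable signal such as the
    repository's tests passing. Both are illustrative placeholders.
    """
    for task in tasks:
        rollout = generate(task)
        if verify(task, rollout):  # only verified successes feed back
            sft_examples.append({"prompt": task, "completion": rollout})
    return sft_examples  # the next SFT run trains on the enriched set
```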

CWM performs strongly in benchmark tests

In the benchmark tests published in the Meta paper, CWM performs strongly in code repair and math problems:

● On SWE-bench Verified, CWM reaches 65.8% pass@1 with test-time scaling enabled (multiple candidates plus voting, sketched after this list), and 53.9% without it.

● It also has impressive results on benchmarks such as LiveCodeBench, Math-500, and AIME: 68.6% on LiveCodeBench; 96.6% on Math-500; and 76.0% on AIME 2024.
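
The simplest form of "multiple candidates and voting" is majority voting over sampled answers. For SWE-bench-style patches, the real selection presumably compares test outcomes rather than exact strings, so the sketch below is only the bare idea:

```python
from collections import Counter

def majority_vote(candidates: list[str]) -> str:
    """Return the most frequent answer among the sampled candidates."""
    return Counter(candidates).most_common(1)[0][0]

# e.g. sample k completions at temperature > 0, then vote:
# best = majority_vote([model.sample(prompt) for _ in range(k)])
```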

Take the SWE-bench Verified score as an example (this benchmark requires the model to fix real bugs in GitHub projects): CWM not only outperforms other open-source models of similar size but even competes with larger or closed-source LLMs, approaching the level of GPT-4.

However, Meta also admits that CWM is not flawless. Since it is not a general chat model, gaps remain in certain editing formats and multilingual scenarios; moreover, heavy agentic training can introduce "formatting noise", which has to be mitigated through filtering and bootstrapping.

Praise and doubts from the industry

Judging from the buzz on social platforms, today's CWM release has clearly attracted wide attention; after all, this is the first model Meta has launched since its high-profile reorganization of its AI business.

Besides Alexandr Wang, mentioned at the beginning, many Meta AI researchers promoted and shared the release. Gabriel Synnaeve, a senior core contributor to CWM, briefly walked through its research ideas, and Yann LeCun retweeted Synnaeve's post with a one-sentence summary:

"Code World Model (CWM): Generate code by imagining the effects of executing instructions and planning instructions to produce the desired effects."

Meanwhile, the industry has generally greeted CWM's release with curiosity and welcome. Observers particularly appreciate that Meta open-sourced not only the final model but also the checkpoints from mid-training through the SFT and RL stages, which is extremely useful for academic and engineering reproduction and commendable at a time when many companies have adopted closed strategies.

However, along with the enthusiasm, there are also many practical doubts and concerns.

Many developers say they hope to see CWM independently benchmarked head-to-head against existing code-generation systems and tested in real development environments. And since CWM's 32 billion parameters demand serious compute, a lightweight variant may be more practical for everyday developers. As CTOL.digital's engineering team put it: "CWM is a great research result, well-written, and has a bright future, but we still need to test it in practice."

So, what does the release of CWM mean? If AI can truly understand code execution and this becomes the norm, will we enter a new era of software development?

Reference links:

https://ai.meta.com/research/publications/cwm-an-open-weights-llm-for-research-on-code-generation-with-world-models/

https://www.ctol.digital/news/meta-drops-ai-that-gets-how-code-works-shaking-silicon-valley/

This article is from the WeChat official account "CSDN", compiled by Zheng Liyuan, and published by 36Kr with authorization.