
OpenAI has open-sourced another model: at only 0.4B parameters, it is a dramatically slimmed-down design.

ZDXX (智东西) | 2025-12-15 16:10
99.9% of the weights are zeroed out, making the large model's internal "thinking" transparent.

On December 15th, ZDXX reported that OpenAI had open-sourced a new model, Circuit-Sparsity, the previous day. The model has only 0.4B parameters, and 99.9% of its weights are zero.

Open-sourcing of Circuit-Sparsity (Source: Hugging Face)

This technology aims to solve the interpretability problem of the model. Simply put, it answers two questions: "Why does the model make this decision?" and "How does it arrive at this result?"

In today's era of rapid AI development, large language models (LLMs) have shown remarkable capabilities, yet their internal workings remain a mysterious "black box".

We don't know why it gives a certain answer, nor do we understand how it extracts knowledge from massive amounts of data. This lack of interpretability has become a major obstacle for AI in high-risk fields such as healthcare, finance, and law.

In response, the OpenAI research team trained a sparse-weight Transformer model, forcing 99.9% of the weights in the model's weight matrices to be zero and retaining only 0.1% non-zero weights.

In this study, the research team induced compact, readable "circuits" inside the model. Each circuit retains only the key nodes needed to preserve the model's performance, and the neurons' activations become semantically clear.

Some overseas commenters claim that this technique marks the end of current MoE (Mixture of Experts) models. One wrote, "We've always isolated weights into 'experts' to roughly approximate sparsity, just to satisfy the requirements of dense matrix kernels."

Overseas commentary (Source: X)

Others described the research as "slimming the model down to its skeleton" and likened it to opening the black box: instead of trying to untangle a dense model after the fact, it constructs a sparse model directly, which is what makes the work interesting.

Overseas commentary (Source: X)

However, some commenters disagree, saying they don't see why MoE models would come to an end because of this. They note that this technique is aimed at XAI (explainable AI) and that its training cost is 100 to 1,000 times higher; returning to the "research era" doesn't mean things have to become more complicated.

Overseas commentary (Source: X)

Currently, the model is limited by a computational-efficiency bottleneck: it runs 100 to 1,000 times slower than dense models. Directly applying this technique to cutting-edge large models with hundreds of billions of parameters is not yet feasible.

Open-source address:

GitHub:

https://github.com/openai/circuit_sparsity

Hugging Face:

https://huggingface.co/openai/circuit-sparsity

01. Training a Sparse Transformer: OpenAI Clarifies the Model's Internal Computation

To understand the breakthrough of this research, we first need to understand why traditional large models are difficult to interpret.

In standard dense models, there is a phenomenon called "superposition" in neural networks. Simply put, to store massive amounts of information, the model is forced to let a single neuron or weight matrix encode multiple completely different concepts simultaneously.

This feature entanglement has serious consequences: the model's decisions become untraceable and its logic muddled. When the model outputs a result, we cannot tell which specific "concept" is doing the work.
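To make "superposition" concrete, here is a minimal toy sketch (our own illustration, not code from OpenAI): two unrelated features are squeezed through a single hidden neuron, so reading one feature back inevitably picks up interference from the other.

```python
import numpy as np

# Toy superposition: two features forced through a one-neuron bottleneck.
W_in = np.array([[1.0], [0.7]])   # feature -> single neuron
W_out = np.array([[1.0, 0.7]])    # single neuron -> reconstructed features

features = np.array([1.0, 0.0])   # only feature 0 is active
hidden = features @ W_in          # the one neuron now encodes "a bit of both"
recon = hidden @ W_out

print(hidden)  # [1.]
print(recon)   # [1.  0.7] -> feature 1 looks active even though it was not
```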

To address these problems, previous research usually started from the dense, entangled network and tried to take it apart. The OpenAI team instead adopted a "counter-intuitive" strategy: train a sparse-weight Transformer from scratch, forcing 99.9% of the weights in the model's weight matrices to be zero and keeping only 0.1% of the weights non-zero.

By limiting each neuron to a minimal number of possible connections, this simple change almost fundamentally clarifies the model's internal computation.

Each neuron is only connected to a few neurons in the next layer (Source: OpenAI Technical Blog)

The specific technical means include:

1. Dynamic Pruning and Sparsity Constraints: During training, the system dynamically performs "pruning". After each optimization step, only the weights with the largest absolute values are retained (Top-K sparsification); a rough sketch of this step appears after this list.

2. Activation Sparsification: At key positions such as the residual stream and the attention key/value matrix, the research team introduced the AbsTopK activation function, forcing only the top 25% of activation values to be retained.

3. Architecture Fine-Tuning: To support sparsification, the research team replaced the traditional LayerNorm with RMSNorm so that normalization does not destroy sparsity. At the same time, a "bigram table" was introduced to handle simple pattern matching, freeing up the model's main capacity for complex logical reasoning.
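The article does not include OpenAI's training code, but the first two techniques can be sketched roughly as follows (a minimal PyTorch-style illustration under our own assumptions; the function names, shapes, and exact keep fractions are ours, and the real recipe in OpenAI's repository will differ in detail):

```python
import torch

def topk_magnitude_prune(weight: torch.Tensor, keep_fraction: float = 0.001) -> torch.Tensor:
    """Keep only the largest-magnitude weights (here 0.1%) and zero out the rest."""
    k = max(1, int(weight.numel() * keep_fraction))
    # value of the k-th largest absolute weight, used as the keep threshold
    threshold = weight.abs().flatten().kthvalue(weight.numel() - k + 1).values
    return weight * (weight.abs() >= threshold)

def abs_topk_activation(x: torch.Tensor, keep_fraction: float = 0.25) -> torch.Tensor:
    """AbsTopK-style activation: per position, keep only the fraction of
    activations with the largest absolute values (here 25%) and zero the rest."""
    k = max(1, int(x.shape[-1] * keep_fraction))
    idx = x.abs().topk(k, dim=-1).indices
    return torch.zeros_like(x).scatter(-1, idx, x.gather(-1, idx))

# Rough usage inside a training loop (after each optimizer step):
# with torch.no_grad():
#     for p in model.parameters():
#         if p.dim() == 2:              # apply to weight matrices only
#             p.copy_(topk_magnitude_prune(p))
```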

02. Compact and Readable "Circuits" Form Inside the Model, with a 16-fold Reduction in Scale

The greatest achievement of this technology is the formation of compact and readable "circuits" (Circuits) inside the model.

In traditional dense models, thousands of nodes may need to work together to complete a task, and the logic is scattered and difficult to capture. In the sparse model, the research team observed a minimalistic computational path:

1. Minimalistic Logical Units: For example, when handling the "string closing" task, the model only used 12 nodes to build a perfect circuit, clearly showing how it detects whether single or double quotes are closed.

2. Readable Features: The activation of neurons becomes semantically clear. Researchers found that some neurons are specifically responsible for detecting "single quotes", while others act like "counters" to accurately track the nesting depth of lists.

3. A 16-fold Reduction in Scale: Comparative experiments show that, at the same task loss, the sparse model's circuits are about 16 times smaller than the dense model's. This means the difficulty of interpreting the model's "thinking" drops by an entire order of magnitude.

The circuit scale of the sparse model is 16 times smaller than that of the dense model (Source: OpenAI Technical Paper)

To verify that these circuits are real, the team ran a "mean ablation" experiment. The results show that removing non-circuit nodes has little impact on the task, while removing key nodes inside a circuit causes the model's performance to collapse immediately. This confirms that the circuits are indeed the "necessary paths" the model uses to perform its tasks.
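The mean-ablation idea can be illustrated with a small, self-contained sketch (a toy stand-in model of our own, not OpenAI's experiment): replace a single node's activation with its mean over a dataset and measure how much the task loss rises.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Tiny stand-in model and task, purely for illustration (not OpenAI's setup).
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))
data = torch.randn(256, 8)
target = data.sum(dim=1, keepdim=True)        # toy task: sum the inputs
loss_fn = nn.MSELoss()

def loss_with_neuron_mean_ablated(neuron: int) -> float:
    """Clamp one hidden neuron to its mean activation and return the task loss."""
    with torch.no_grad():
        hidden = torch.relu(model[0](data))              # hidden-layer activations
        hidden[:, neuron] = hidden[:, neuron].mean()     # mean ablation of one node
        return loss_fn(model[2](hidden), target).item()

with torch.no_grad():
    base_loss = loss_fn(model(data), target).item()

# In a trained sparse model, ablating a circuit node makes the loss jump,
# while ablating a node outside the circuit leaves it nearly unchanged.
effects = [loss_with_neuron_mean_ablated(i) - base_loss for i in range(16)]
print(sorted(effects, reverse=True)[:3])
```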

The "mean ablation" experiment (Source: OpenAI Technical Paper)

03. The Sparse Model Is Highly Interpretable but up to a Thousand Times Slower; OpenAI Proposes the "Bridge Network"

To measure how disentangled the sparse model's computation is, the research team designed a simple algorithmic task. For each model, they pruned it down to the smallest circuit that could still perform the task and then checked how simple that circuit was.

The research team found that by training larger and sparser models, they could obtain models that are more capable yet rely on simpler circuits.

Comparison chart of the model's interpretability and capabilities (Source: OpenAI Technical Blog)

The chart comparing interpretability and capability shows that, for a sparse model of fixed size, increasing sparsity, i.e., setting more weights to zero, slightly degrades performance but significantly improves interpretability.

Although the sparse model has outstanding advantages in interpretability, its application is currently limited by the bottleneck of computational efficiency: sparse matrix operations cannot be accelerated by Tensor Cores, and the operation speed is 100 to 1000 times slower than that of dense models. This means that it is not feasible at present to directly apply this technology to cutting - edge large models with hundreds of billions of parameters.

For this reason, the research team proposed the "Bridge Network" (Bridges) solution:

1. Encoding-Decoding Mapping: Insert an encoder-decoder pair between the sparse model and the pre-trained dense model.

2. Cross - Model Intervention: The encoder maps the activation of the dense model to the sparse space, and the decoder performs the reverse conversion.

The "Bridge Network" (Bridges) solution can modify a certain feature on the "transparent" sparse model and then map this perturbation back to the "black - box" dense model through the bridge, thereby achieving interpretable behavior editing of existing large models.

04. Conclusion: OpenAI Proposes a New Sparse Path, Leading Large Models from "Black Box" to "Interpretable"

This research by the OpenAI research team marks an important breakthrough in the field of AI interpretability and also confirms that understanding AI is not an unattainable goal.

The research team said in the accompanying blog post that this work is an early exploration toward a more ambitious goal. Next, they plan to extend the technique to larger models and explain the behavior of a wider range of models.

To address the low training efficiency of sparse models, the team proposed two follow-up research directions: one is to extract sparse circuits from existing dense models instead of training a sparse model from scratch; the other is to develop more efficient training techniques for interpretable models, making it easier to apply these methods in production.

"Our goal is to gradually expand the scope of models that can be reliably interpreted and create relevant tools to make future AI systems easier to analyze, debug, and evaluate." the research team wrote in the paper blog.

This article is from the WeChat public account "ZDXX" (ID: zhidxcom), author: Wang Han, editor: Xin Yuan. Republished by 36Kr with permission.