Teaching AI to "choose mates and produce offspring" to reproduce natural evolution; work by Shanghai Jiao Tong University alumni has been nominated for Best Paper.
Inspired by natural evolution, Sakana AI has proposed a brand-new model merging and evolution method called M2N2. By introducing a "mate-selection" mechanism from the natural world, AI models can "compete, choose mates, and reproduce" much like living organisms. At a time when global computing power is in short supply and the practical scale of model training is constrained, Sakana AI has drawn on nature to explore a new path for model merging.
If AI models are allowed to evolve like living organisms, will they compete, cooperate, combine with each other, and give rise to increasingly powerful descendants?
Does the theory of evolution, "survival of the fittest through natural selection", also apply to AI models?
Recently, Sakana AI drew inspiration from the process of natural evolution and proposed a method that utilizes the natural selection mechanism of "competition and attraction" to enhance the effectiveness of AI model merging.
Sakana AI believes that the development of AI models is similar to the process of natural evolution:
Collective intelligence emerges from the group.
For example, nature did not create a single, gigantic organism but instead gave birth to a diverse ecosystem. In the natural ecosystem, each individual adapts to the environment and reproduces offspring through competition, cooperation, and combination.
This is exactly what Sakana envisions the AI world to be like:
What will happen when humans stop trying to build a single, large-scale AI and instead evolve an entire AI ecosystem, where various specialized AI models compete, cooperate, and merge?
They didn't stop at imagination. Instead, they have been exploring model merging, using evolution to crack the "optimal formula" for merging existing models.
Now, they have made this "optimal formula" public!
Currently, the relevant research has been published at the GECCO 2025 conference and has been nominated for the Best Paper Award!
Paper URL: https://arxiv.org/abs/2508.16204
GitHub: https://github.com/SakanaAI/natural_niches
In previous model merging approaches, the way models were segmented (e.g., by fixed layers or blocks) had to be defined manually.
Can this process run automatically, just like the evolution in the natural world?
Sakana AI proposed M2N2 (Model Merging of Natural Niches), which overcomes these difficulties.
This method is based on three key ideas from natural evolution:
- Evolvable merging boundaries: M2N2 allows models to combine more freely, breaking predefined static boundaries and greatly expanding the exploration space for model combination. It is like nature exchanging variable-length DNA fragments instead of entire chromosomes.
- Diversity through competition: M2N2 mimics the "law of the jungle", making models compete for limited resources (i.e., data points in the training set). This forces models to specialize and find their own ecological niches, creating a population of diverse, high-performing experts and providing better seed models for "reproducing" high-quality models.
- Mate-selection mechanism: M2N2 introduces an "attraction" heuristic that intelligently pairs and merges models based on their complementary strengths, i.e., selecting partners that perform well where the other is weak. This significantly improves the efficiency of evolutionary search and greatly reduces the computational cost of model merging.
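The three ideas above can be combined into a single evolutionary loop. The sketch below is an illustrative toy (sign-matching task, unit sample capacities, greedy parent selection), an assumption about how the pieces fit together rather than Sakana AI's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: a "model" is a real vector; it solves sample j when the sign of
# its j-th parameter matches a hidden target.  Illustrative only.
N_SAMPLES, POP, GENS = 32, 8, 300
target = rng.choice([-1.0, 1.0], size=N_SAMPLES)

def scores(pop):
    # S[i, j] = 1.0 if model i solves sample j, else 0.0
    return (np.sign(pop) == target).astype(float)

pop = rng.normal(size=(POP, N_SAMPLES))
initial_best = scores(pop).sum(axis=1).max()

for _ in range(GENS):
    S = scores(pop)
    # Competition: each sample has unit capacity, split by score share
    f = (S / np.maximum(S.sum(axis=0), 1e-9)).sum(axis=1)
    a = int(np.argmax(f))                         # first parent: fittest
    # Attraction: pick the partner strongest where parent a is weak
    attr = np.maximum(S - S[a], 0.0).sum(axis=1)
    attr[a] = -1.0
    b = int(np.argmax(attr))
    # Reproduction: merge at an evolvable split point with a random coefficient
    split, w = rng.integers(1, N_SAMPLES), rng.random()
    child = np.concatenate([
        w * pop[a, :split] + (1 - w) * pop[b, :split],
        (1 - w) * pop[a, split:] + w * pop[b, split:],
    ])
    worst = int(np.argmin(f))                     # low performers go extinct
    if scores(child[None, :])[0].sum() >= S[worst].sum():
        pop[worst] = child

final_best = scores(pop).sum(axis=1).max()
```

Because a child only ever replaces the current weakest model when it is at least as good, the best raw score in the population never decreases over the run.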
The results of this attempt are also encouraging: The M2N2 model merging technology has been successfully applied in model evolution and outperforms other evolutionary algorithms. For example:
- The MNIST classifier evolved from a random network performs as well as the CMA-ES algorithm but with higher computational efficiency.
- It extends to large pre-trained models. On math and online-shopping tasks in particular, the merged models it generates significantly outperform other methods.
- During the model merging process, it also avoids the problem of "catastrophic forgetting" in model fine - tuning.
This made netizen Aragon Dev exclaim:
"In 2025, agents found partners before I did."
M2N2: A Brand-New Model Evolution Method
M2N2 significantly enhances the effectiveness of model merging by introducing a brand-new evolution method that combines competition, attraction, and split-point model merging.
It is the first to apply model merging in training from scratch and outperforms all current evolutionary algorithms in terms of performance and computational efficiency.
After researchers extended M2N2 to LLMs and diffusion-based image generation models, it showed many advantages. For example, it can:
- Stably merge models and avoid catastrophic forgetting.
- Be compatible with models trained for different objectives.
- Reduce memory usage by avoiding gradient calculation.
- Retain model capabilities without the need for the original training data.
In model merging, the goal is to find, among K initial models, the optimal parameters θ* of the merged model that maximize the optimization objective, usually the sum or average of task scores.
In M2N2, researchers modified the merging function ℎ to make the merging boundaries evolvable. At the same time, they adjusted the optimization objective to promote diverse solutions.
M2N2 eliminates the fixed model merging boundaries.
To break free from the constraints of fixed merging boundaries, researchers gradually expanded the search space by exploring a wider range of boundaries and coefficients. This approach of gradually introducing complexity not only broadens the possibilities but also keeps the computation controllable.
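As a rough sketch of what evolvable boundaries mean in practice (an illustration, not the paper's code), the split indices and per-segment mixing coefficients below are free variables that a search procedure could mutate, rather than fixed layer boundaries:

```python
import numpy as np

def merge_with_splits(theta_a, theta_b, splits, coeffs):
    """Merge two flat parameter vectors segment by segment.

    `splits` are interior boundary indices (evolvable, not tied to layers);
    segment k is mixed as coeffs[k] * theta_a + (1 - coeffs[k]) * theta_b.
    Hypothetical helper for illustration.
    """
    bounds = [0, *sorted(splits), len(theta_a)]
    child = np.empty_like(theta_a)
    for k, (lo, hi) in enumerate(zip(bounds[:-1], bounds[1:])):
        w = coeffs[k]
        child[lo:hi] = w * theta_a[lo:hi] + (1 - w) * theta_b[lo:hi]
    return child

theta_a = np.zeros(10)
theta_b = np.ones(10)
child = merge_with_splits(theta_a, theta_b, splits=[4], coeffs=[1.0, 0.25])
# segment 0 (indices 0-3) copies theta_a entirely → 0.0
# segment 1 (indices 4-9) mixes 25% of theta_a with 75% of theta_b → 0.75
```

Starting the search with few splits and adding more over time is one way to "gradually introduce complexity" while keeping the search space manageable.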
Competition for limited resources naturally promotes diversity.
Researchers encouraged diversity by modifying the optimization objective. By restricting the resource supply, M2N2 stimulates competition and naturally favors individuals that occupy new ecological niches.
Their specific approach is as follows:
Limit the total fitness that the population can extract from a given sample x_j to a capacity c_j.
The fitness that a candidate solution obtains from x_j is proportional to its share of the total population score on that sample.
The modified objective is:

f(θ_i) = Σ_j c_j · s(θ_i, x_j) / Σ_k s(θ_k, x_j)

where s(θ, x_j) is the score of candidate θ on sample x_j, and the sum in the denominator runs over the whole population.
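A minimal numerical sketch of this capacity-limited fitness, assuming binary per-sample scores and unit capacities (illustrative names, not the paper's code):

```python
import numpy as np

def shared_fitness(S, c):
    """Fitness with limited per-sample capacity.

    S[i, j]: score of model i on sample x_j; c[j]: capacity of x_j.
    Each sample pays out at most c[j], split among models in proportion
    to their share of the total population score on that sample.
    """
    totals = np.maximum(S.sum(axis=0), 1e-12)   # total population score per sample
    return (c * S / totals).sum(axis=1)

S = np.array([[1.0, 1.0, 0.0],
              [1.0, 0.0, 1.0]])
c = np.ones(3)
f = shared_fitness(S, c)
# sample 0 is contested (each model gets 0.5); samples 1 and 2 pay out fully
# → f = [1.5, 1.5]
```

Note how a model that is the only one solving a sample collects that sample's full capacity, which is exactly what rewards occupying a new niche.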
In biology, this kind of combination (reproduction) is costly, so animals invest substantial resources in mate selection.
M2N2 likewise takes the complementarity between parent models into account: candidates whose strengths cover each other's weaknesses are preferentially paired, which improves search efficiency while keeping the computation controllable.
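One simple way to score such complementarity (an assumption for illustration; the paper's exact attraction formula may differ) is to sum, over samples, how much better a candidate partner does where the first parent is weak:

```python
import numpy as np

def attraction(s_a, s_b, c):
    """Toy complementarity heuristic (illustrative, not the paper's formula):
    capacity-weighted amount by which partner b outscores parent a,
    counted only on samples where b is stronger."""
    return float(np.sum(c * np.maximum(s_b - s_a, 0.0)))

c  = np.ones(4)                        # unit capacity per sample
a  = np.array([1.0, 1.0, 0.0, 0.0])   # strong on the first two samples
b1 = np.array([1.0, 1.0, 0.0, 0.0])   # clone of a: brings nothing new
b2 = np.array([0.0, 0.0, 1.0, 1.0])   # covers a's weaknesses exactly
# attraction(a, b1, c) = 0.0, attraction(a, b2, c) = 2.0 → prefer b2
```

A clone of the parent scores zero attraction, so the heuristic steers merging toward pairs whose combined coverage is largest.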
Experiment 1: Evolving an MNIST Classifier
This experiment optimized a two-layer feed-forward neural network with 19,210 parameters in total.
At the beginning, researchers randomly initialized the models.
For pre-trained models, researchers built two specialized models: one trained on digits 0-4, and the other trained on digits 5-9.
The results show that when starting from scratch, M2N2 has a significant advantage in test accuracy compared with other model merging methods (left of Figure 2).
For models trained from scratch, the split points and attraction scores have little impact. However, as shown on the right of Figure 2, when starting from pre - trained models, split points become crucial, and attraction can significantly improve performance throughout the training process.
In terms of diversity, the left of Figure 3 shows the proportion of training samples correctly labeled by at least one model in the library, i.e., the training coverage.
The right of Figure 3 shows the evolution of population performance diversity during training:
If all models are correct or incorrect for the same sample, the entropy is 0 (no diversity); if the models are evenly split in prediction, the entropy reaches the maximum of 1.
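Both diagnostics are easy to compute from a binary score matrix; the following is a minimal sketch under the assumption that S[i, j] = 1 means model i labels sample j correctly:

```python
import numpy as np

def coverage(S):
    """Fraction of samples solved by at least one model (S is 0/1)."""
    return S.max(axis=0).mean()

def mean_entropy(S):
    """Average per-sample binary entropy of the population's correctness."""
    p = S.mean(axis=0)                       # fraction of models correct per sample
    p = np.clip(p, 1e-12, 1 - 1e-12)         # avoid log(0)
    h = -(p * np.log2(p) + (1 - p) * np.log2(1 - p))
    return h.mean()

S = np.array([[1, 1, 0],
              [1, 0, 1]], dtype=float)
# sample 0: all models correct → entropy 0; samples 1, 2: 50/50 split → entropy 1
# coverage(S) = 1.0, mean_entropy(S) ≈ 2/3
```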
From Figure 3, we can see that the model library of M2N2 quickly covers most of the training samples and maintains high coverage throughout the training process.
Figure 3 also shows the average entropy of all samples: The entropy of M2N2 rises rapidly in the early stage and then gradually decreases as low - performance models become extinct.
In contrast, MAP-Elites continuously increases diversity by retaining low-performance models but fails to achieve high coverage.
Overall, M2N2 maintains a model library with complementary advantages, which not only promotes effective merging but also systematically eliminates weak models as training progresses.
As shown in Figure 4, a smaller library starts off better but converges faster to a worse solution.
This indicates that the library size should be scaled according to the planned number of forward passes.
It's worth noting that increasing the library size in the above figure does not increase the computational cost (the number of forward passes remains the same) but increases the memory usage. For very large models, the model library can be stored on disk instead of being resident in memory.
Experiment 2: Merging an LLM Math Expert and an Agent
In the experiment, researchers merged the math expert WizardMath-7B-V1.0 with the agent-environment expert AgentEvol-7B, aiming to perform well on both the math benchmark GSM8k and the web-shopping benchmark WebShop.
The experimental results, shown in Table 1, indicate that M2N2 scores the highest. Both the attraction and split-point techniques are crucial, with split points being somewhat more important.
When merging math and agent skills, CMA - ES scores lower, possibly due to poor parameter partitioning, which emphasizes the necessity of incorporating merging boundaries in the optimization process.
As shown in Figure 5, the findings from MNIST can also be extended to LLM merging.
As shown in the left figure, the natural niche method maintains high training coverage; in the early stage when models explore different niches, the entropy rises (right figure); as low - performance models are removed and advantages are aggregated, the entropy gradually decreases.
In contrast, MAP-Elites focuses on maximizing entropy but sacrifices training efficiency and coverage because it retains low-performance models; the GA quickly reduces coverage and entropy and "greedily" converges to its best solution, ultimately causing the entire library to "collapse" into a single solution with entropy close to zero.
Experiment 3: Merging Diffusion - Based Image Generation Models
In this experiment, researchers evaluated the performance of M2N2 in merging diverse text-to-image models.