Biology is undergoing a major change: Zuckerberg's new open-source model has completely dethroned Google's AlphaFold.

New Intelligence Yuan, 2026-05-29 20:17
[Introduction] AlphaFold has dominated protein AI for years, but it has just been beaten head-on. Zuckerberg's Biohub has released ESMFold2 together with 1.1 billion predicted protein structures, roughly 800 million more than the AlphaFold database, and the model is completely open source.

The throne of AlphaFold is in jeopardy!

According to a report in Nature, Biohub, the institute backed by Mark Zuckerberg, has dropped a bombshell: 1.1 billion protein structure predictions released in one go, roughly 800 million more than the AlphaFold database holds.

The AI model behind it, ESMFold2, is claimed by its developers to outperform AlphaFold3 across the board.

More importantly, it is completely open source, with no restrictions on commercial use.

https://www.nature.com/articles/d41586-026-01686-3

The dominance that Google DeepMind has painstakingly built in protein AI over the years is being shaken by an open-source disruptor.

The protein AI landscape may need to be redrawn.

1.1 billion protein structures are all on the table

On May 27th, Biohub, a biomedical institution founded by Mark Zuckerberg and his wife, officially launched a protein structure database called ESM Atlas.

It contains 1.1 billion predicted protein structures, along with information on a further 6.8 billion protein sequences.

The AlphaFold database has accumulated just over 200 million structure predictions; ESM Atlas holds roughly 800 million more.

The AI model that generates these predictions is called ESMFold2, developed under the leadership of Alex Rives, the scientific head of Biohub.

Rives said:

This atlas shows the full picture of protein biology, especially the most unknown parts.

Why is protein structure prediction important?

Proteins are the working machinery of life. Knowing their shapes helps us understand their functions, which in turn helps in designing new drugs and fighting disease.

AlphaFold's developers shared the 2024 Nobel Prize in Chemistry for this work, a landmark case of AI changing science.

Now a new model has emerged with a dataset roughly five times the size.

What makes ESMFold2 strong as an AI model?

ESMFold2 takes a different technical route from AlphaFold.

It is built on a "protein language model" released in 2024. The core idea borrows from NLP: protein sequences are treated as a "language" to be understood. Trained on billions of protein sequences, the model learns to predict three-dimensional structures directly from sequence.

AI practitioners should find this familiar, since it follows the same logic as large language models learning human languages.
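
To make the "sequence as language" analogy concrete, here is a minimal Python sketch of how a protein sequence can be turned into tokens and partially hidden for training. It assumes a masked-token objective and a 20-letter amino-acid vocabulary; the article does not say how ESMFold2 is actually trained, and the names tokenize, mask_tokens, and mask_rate are hypothetical, not part of any Biohub release.

```python
import random

# The 20 standard amino acids, one letter each. The vocabulary layout is
# illustrative only and is not taken from ESMFold2.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
MASK_TOKEN = "<mask>"
VOCAB = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
VOCAB[MASK_TOKEN] = len(VOCAB)


def tokenize(sequence):
    """Map each residue letter to an integer ID, much as a text tokenizer maps words."""
    return [VOCAB[residue] for residue in sequence.upper()]


def mask_tokens(token_ids, mask_rate=0.15, seed=0):
    """Hide a random subset of residues and record the originals as training targets.

    Assumption: a masked-language-model objective, used here only for illustration.
    """
    rng = random.Random(seed)
    masked = list(token_ids)
    targets = {}
    for position, token in enumerate(token_ids):
        if rng.random() < mask_rate:
            targets[position] = token
            masked[position] = VOCAB[MASK_TOKEN]
    return masked, targets


if __name__ == "__main__":
    ids = tokenize("MKTAYIAKQRQISFVKSHFSRQ")  # an arbitrary example sequence
    masked, targets = mask_tokens(ids)
    print("token ids:", ids)
    print("masked   :", masked)
    print("targets  :", targets)
```

A model trained this way has to infer each hidden residue from its neighbors, which is how it could pick up the statistical "grammar" of proteins. Going from that representation to 3D coordinates requires a separate structure-prediction component that this sketch does not cover.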

The coverage of training data is a key variable.

ESMFold2 incorporates a large amount of microbial protein data from environments such as soil and the ocean, an area that is largely missing from the AlphaFold database.

The wider the coverage, the more complete the "protein world" the model has seen.

The Biohub team claims that ESMFold2 performs better than AlphaFold3 at predicting the structures of protein-protein complexes.

But the most convincing evidence is not the benchmark scores; it is the real-world verification.

The team designed new proteins using ESMFold2, synthesized and tested them in the laboratory, and a high proportion of the designs worked as expected.

From "prediction" to "design" and then to "verification", once this chain is established, the value extends from papers to the real world.

Fully open source: the biggest trump card

ESMFold2's sharpest competitive weapon is that it is completely open source, with no restrictions on commercial use.

The strategic significance of this choice becomes clearer when viewed in the context of the entire AI industry.

Although the AlphaFold database is open, AlphaFold3 itself carried restrictions on commercial use when it was first released.

The protein-protein interaction prediction model launched this year by Isomorphic Labs, Google DeepMind's sister company, is completely closed source.

Ovchinnikov, a computational biologist at MIT, spoke directly to the value of open source: "I expect many people will be very excited to try ESMFold2."

The leverage effect of open-source AI has been fully verified in the large language model track. Meta's Llama series is the best example.

A strong enough open-source model can leverage the global community to iterate, apply, and discover uses that the original developers didn't even think of.

The situation in the protein AI field is more special. There are a large number of laboratories and research institutions around the world that urgently need a free and unrestricted structure prediction tool. No matter how strong a closed-source model is, the user group it can reach is limited.

Biohub's decision to go fully open source is in line with Meta's approach in the large language model field.

The strategy of Zuckerberg-affiliated entities in AI is becoming clearer: use open source as infrastructure and the ecosystem as a moat.

Do the industry experts buy it?

The academic community has a positive response, but also has clear reservations.

Gemma Atkinson from Lund University in Sweden said that ESM Atlas "should become an extraordinary resource for biology."

Christine Orengo from University College London recognized its value but emphasized that the prediction results need independent verification.

A more pointed question comes from Martin Steinegger at Seoul National University.

He is concerned about how ESMFold2 performs when facing "new structures" that are very different from known proteins.

His team previously found that the first version of ESMFold did not perform well on such cases, and it remains an open question whether ESMFold2 does better.

Ovchinnikov from MIT gave the most sober judgment. He believes that ESM Atlas is more suitable to be positioned as a supplement to the AlphaFold database.

He also pointed out that Isomorphic Labs' closed-source model, as well as some open-source models that Biohub did not compare against directly, have achieved similar results.

The leading margin of ESMFold2 may not be as large as the paper implies.

This caution reflects how white-hot the competition in protein AI has become.

Open-source, closed-source, academic, and commercial models are all iterating at an extremely fast pace.

Today's "strongest" may be surpassed in half a year. This rhythm is very similar to the arms race in the large language model track.

When AI starts to read the source code of life

In the past, it might take months to years of laboratory work to analyze the three-dimensional structure of a protein.

AlphaFold proved for the first time that AI can do it in a few minutes.

Now, ESMFold2 has pushed the prediction scale to 1.1 billion, covering a large number of proteins that have never been analyzed before.

Looking ahead along this path, once AI can accurately predict every protein structure, design new functional proteins, and have those designs confirmed by experiment, the arrival of AGI in the life sciences may be closer than most people think.

If ASI truly arrives, biology will no longer be a subject that needs to be "studied" for it, but a system that can be "engineered".

Design life at the molecular level, customize proteins on demand, and rewrite the rules of evolution.

This sounds like science fiction, but tools like ESMFold2 are gradually turning "science fiction" into an "engineering problem".

Today, 1.1 billion protein structures are laid out on the table, and any scientist with an internet connection around the world can access them for free.

This means that AI's ability to understand life has taken another step forward.

Reference: https://www.nature.com/articles/d41586-026-01686-3 

This article is from the WeChat official account "New Intelligence Yuan", author: Revelation of ASI; editor: Marco. It is published by 36Kr with authorization.