Artificial intelligence decodes life: Microsoft's "super accelerator" for protein research appears in Science.
Microsoft's AI for Science team has launched BioEmu, a tool that speeds up protein simulation by a factor of 100,000! From structure to function, from folding to mutation, this open-source tool is reshaping the future of drug development.
BioEmu, the protein "simulation tool" from the Microsoft team, was published in Science today!
BioEmu can simulate the ensemble of structures a protein adopts at equilibrium, providing crucial support for an in-depth understanding of protein function.
Paper link: https://www.science.org/doi/10.1126/science.adv9817
Our bodies are made up of tissues and cells. At the nanoscale, proteins are the tiny machines that drive life activities.
The Human Genome Project sequenced human DNA. DNA contains segments called genes, which are transcribed and translated into chains of amino acids, i.e., proteins.
Guided by their amino acid sequence, proteins fold into three-dimensional structures.
Determining protein structures experimentally is time-consuming, but the AlphaFold breakthrough made accurate structure prediction possible.
Scalable methods now exist for determining protein sequences and structures, but understanding how proteins actually work remains a challenge.
What is the function of a protein? How is it related to the structure?
Take actin, a key protein in the formation of muscle fibers.
Like most proteins, actin does not have a single fixed structure. When actin binds ATP, it tends to adopt a closed state.
Closed actin readily binds to other actin molecules to form filaments, which are the basis of muscle.
The biological functions of proteins depend on their ability to change conformations. Different conformations affect the binding of proteins to other proteins.
These conformations and the transitions between them can be studied through experiments or molecular dynamics simulations, but these methods are time-consuming and expensive.
Simulating the movement of a small protein for just one microsecond on a modern GPU takes a full two days, and hardly any significant movement can be observed.
Only by simulating for a longer time (e.g., milliseconds) can important functional changes such as folding, unfolding, or binding be observed, but this requires years of computing time and is difficult to apply on a large scale.
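To put those figures in perspective, here is a back-of-the-envelope extrapolation in Python. It takes the two-days-per-microsecond figure quoted above and assumes, as a simplification not stated in the article, that cost scales linearly with simulated time on a single GPU.

```python
# Rough extrapolation of molecular dynamics cost from the article's figure:
# about 2 GPU-days per simulated microsecond for a small protein.
# Linear scaling with simulated time is an assumption made for this sketch.
DAYS_PER_MICROSECOND = 2.0


def gpu_days(simulated_microseconds: float) -> float:
    """GPU-days needed for a given simulated time, assuming linear scaling."""
    return simulated_microseconds * DAYS_PER_MICROSECOND


one_millisecond_days = gpu_days(1000.0)  # 1 millisecond = 1000 microseconds
one_millisecond_years = one_millisecond_days / 365.25

print(f"{one_millisecond_days:.0f} GPU-days, about {one_millisecond_years:.1f} GPU-years")
```

Under these assumptions, a single millisecond of simulated time works out to roughly 2,000 GPU-days, or about five and a half years on one GPU.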
The Microsoft AI for Science research team has launched BioEmu.
To use it, simply input a protein sequence, and BioEmu generates a large number of protein structure samples and predicts various properties of the protein.
It can show the movement of a receptor protein between two known structures, and predict large-scale structural changes, local unfolding, and the formation of binding sites for drug molecules.
BioEmu can also emulate the results of millisecond-scale molecular dynamics simulations. Where traditional simulations may take years of GPU time, BioEmu needs less than one hour of GPU time, a 100,000-fold speedup!
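As a quick sanity check on the speedup claim, the sketch below converts emulator GPU-hours into the equivalent years of conventional simulation. The 100,000-fold factor comes from the article; treating it as a constant multiplier on GPU time is an assumption of this sketch.

```python
# How many GPU-years of conventional simulation correspond to a given
# number of emulator GPU-hours, if the quoted speedup is applied uniformly.
HOURS_PER_YEAR = 24 * 365.25
SPEEDUP = 100_000  # factor quoted in the article


def equivalent_md_years(emulator_gpu_hours: float, speedup: float = SPEEDUP) -> float:
    """Conventional-simulation GPU-years matching the given emulator GPU-hours."""
    return emulator_gpu_hours * speedup / HOURS_PER_YEAR


years_for_one_hour = equivalent_md_years(1.0)
print(f"1 emulator GPU-hour ~ {years_for_one_hour:.1f} GPU-years of conventional simulation")
```

One GPU-hour at a 100,000-fold speedup corresponds to a little over eleven GPU-years, which is consistent with the "years versus less than one hour" comparison.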
Netizens commented, "The breakthrough from Microsoft Research is exciting! Modeling the protein equilibrium ensemble at such a scale is of great significance for drug discovery and disease understanding. BioEmu condenses years of structural simulations into a few hours, which is a huge leap forward."
"I love science and the greatest inventors of all time, who are changing my life exponentially."
Simulating Protein Dynamic Structures
The functions of proteins are closely related to their dynamically changing structures.
Proteins switch flexibly between different shapes as needed, and these changes underlie their functions.
BioEmu is a simulator that allows us to better understand how proteins work by predicting their structures in different states.
BioEmu 1.1 has undergone three-stage training that is longer and more intensive, drawing on a vast amount of data:
- Large-scale protein structure data;
- Over 200 milliseconds of molecular dynamics (MD) simulation data, i.e., computer-simulated trajectories of protein motion;
- Over 500,000 protein stability measurements.
As a result, BioEmu 1.1 predicts protein behavior more accurately and captures the structural changes related to function.
Its success rates for large-scale structural movements, local unfolding, and the formation of cryptic pockets have improved significantly.
Ultra-fast Simulation with Extremely Low Error
BioEmu 1.1 can simulate the equilibrium distribution of molecular dynamics at the millisecond level at an extremely fast speed.
While traditional methods may take years of GPU time, BioEmu 1.1 can complete the task in just a few hours, greatly improving research efficiency.
BioEmu 1.1 performs excellently in predicting protein stability and mutation effects.
Its simulated structural ensembles agree closely with experimentally measured stability data:
- The prediction error is less than 1 kcal/mol;
- Across a large test set, the correlation with experimentally measured stability exceeds 0.6;
- Even when the sequence similarity between the training data and the test data is about 50%, the prediction remains accurate.
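For readers unfamiliar with these two metrics, the following sketch shows how a mean absolute error and a correlation coefficient can be computed for a set of stability predictions. The free-energy values are invented purely for illustration and are not BioEmu results. The article does not say which error measure or correlation coefficient is used, so mean absolute error and Pearson correlation are assumptions here.

```python
import math

# Hypothetical folding free energies in kcal/mol, made up for illustration.
experimental = [-3.0, -1.5, 0.5, -2.2, -4.1]
predicted = [-2.6, -1.9, 0.1, -1.5, -3.8]


def mean_absolute_error(pred, ref):
    """Average absolute difference between predictions and reference values."""
    return sum(abs(p - r) for p, r in zip(pred, ref)) / len(pred)


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


mae = mean_absolute_error(predicted, experimental)
r = pearson_r(predicted, experimental)
print(f"MAE = {mae:.2f} kcal/mol, Pearson r = {r:.2f}")
```

A model meeting the thresholds quoted above would show an error below 1 kcal/mol and a correlation above 0.6 on data of this kind.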
By analyzing the structural samples, we can understand the impact of mutations on protein stability.
In addition, BioEmu 1.1 can accurately predict the stability changes of single and double mutations.
Even for complex combinations of mutations, training on fine-grained data lets it capture subtle differences and make reliable predictions.
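One standard way to turn an ensemble of structural samples into a stability estimate is the two-state relation ΔG = -RT ln(p_folded / p_unfolded). The sketch below applies it to a wild type and a mutant. The sample counts are hypothetical, and the article does not describe how BioEmu classifies samples as folded or unfolded, so that step is assumed to have been done already.

```python
import math

GAS_CONSTANT = 0.0019872041  # kcal/(mol*K)
TEMPERATURE = 298.15         # K, room temperature (an assumption of this sketch)


def folding_free_energy(n_folded: int, n_unfolded: int) -> float:
    """Two-state folding free energy in kcal/mol from sample counts.

    Negative values mean the folded state is favored.
    """
    if n_folded <= 0 or n_unfolded <= 0:
        raise ValueError("both states need at least one sample")
    return -GAS_CONSTANT * TEMPERATURE * math.log(n_folded / n_unfolded)


# Hypothetical counts out of 1000 samples each (not real BioEmu output).
dg_wild_type = folding_free_energy(950, 50)
dg_mutant = folding_free_energy(800, 200)
ddg = dg_mutant - dg_wild_type  # positive means the mutation destabilizes

print(f"wild type {dg_wild_type:.2f}, mutant {dg_mutant:.2f}, ddG {ddg:.2f} kcal/mol")
```

In this made-up example the mutant's folded fraction drops from 95% to 80%, which corresponds to a destabilization of a little under 1 kcal/mol.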
BioEmu's training is based on a molecular dynamics simulation dataset of over 100 milliseconds, covering thousands of protein systems and tens of thousands of mutants.
This dataset combines sequence diversity with long simulation times. This large body of high-quality data provides a solid foundation for BioEmu's performance.
BioEmu opens the door to large-scale research on protein function, facilitating drug discovery and protein design.
BioEmu is open-source (under the MIT license) and can be used on Azure AI Foundry and ColabFold.
Developers can obtain the code from GitHub and the model weights from Hugging Face.
References:
https://x.com/MSFTResearch/status/1943373860012744737
https://www.science.org/doi/10.1126/science.adv9817
This article is from the WeChat official account "New Intelligence Yuan". Author: Ying Zhi. Republished by 36Kr with permission.