Apple flips the table, throws out AlphaFold's core modules, and ushers in the era of "generative AI" in protein folding.
Protein folding has long been a central problem in computational biology and has a profound impact on fields such as drug development.
If protein folding is viewed through the lens of generative models in image processing, the amino acid sequence corresponds to the "prompt", and the model's output is the three-dimensional coordinates of the atoms.
Inspired by this idea, the researchers developed a general-purpose yet powerful architecture called SimpleFold, built from standard Transformer blocks with adaptive layers.
Link to the study: https://arxiv.org/abs/2509.18480
How does SimpleFold differ from classical protein folding models like AlphaFold2?
AlphaFold2 and RoseTTAFold2 rely on complex, highly specialized architectural components such as triangle updates, pair representations, and multiple sequence alignments (MSA).
These designs often hard-code our existing understanding of the structure-building process into the model, rather than letting the model learn the generation process itself from the data.
In contrast, SimpleFold offers a completely new approach:
It uses no triangle updates, no pair representations, and no MSA; built entirely on a general-purpose Transformer and flow matching, it maps a protein sequence directly to a complete three-dimensional all-atom structure (see Figure 1).
SimpleFold: the first protein folding model built purely from Transformer blocks
Flow matching treats generation as a journey through time and integrates the trajectory with ordinary differential equations (ODEs). It is like developing a photograph: noise is transformed, step by step, into a clear structure.
SimpleFold also reproduces this process in protein folding:
The input is the amino acid sequence as the "prompt", and the output is a three-dimensional "photograph" of all atoms, analogous to text-to-image or text-to-3D tasks in image processing.
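As a rough illustration, flow-matching sampling can be sketched as Euler integration of a learned velocity field. The function below is a minimal sketch; `model` and its signature are hypothetical stand-ins, not SimpleFold's actual API:

```python
import torch

def sample_structure(model, seq_cond, n_atoms, n_steps=200):
    """Integrate dx/dt = v(x, t | sequence) from noise (t=0) to structure (t=1)."""
    x = torch.randn(n_atoms, 3)          # start from pure Gaussian noise
    dt = 1.0 / n_steps
    for i in range(n_steps):
        t = torch.tensor(i * dt)
        v = model(x, t, seq_cond)        # predicted velocity at this point in time
        x = x + v * dt                   # one Euler step along the ODE trajectory
    return x                             # final 3-D coordinates of all atoms
```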
Since AlphaFold2, components such as triangle updates and the interplay between single and pair representations have been standard in protein folding models, yet there is still no settled view on whether these designs are truly necessary.
SimpleFold takes a bold design approach and builds its architecture solely from general-purpose Transformer blocks (see the comparison in Figure 5).
The architecture of SimpleFold consists of three parts: a lightweight atom encoder, a heavy backbone, and a lightweight atom decoder (see Figure 2).
This fine-coarse-fine approach, which first looks at the fine atomic level, then captures the whole, and finally fills in the details, strikes a good balance between speed and accuracy.
Unlike previous methods, SimpleFold uses no pair representations and does not depend on attention initialized from MSA or protein language models (PLMs).
Compared with work built on equivariant architectures, SimpleFold is based entirely on a non-equivariant Transformer.
To account for rotational symmetry in protein structures, the researchers apply SO(3) data augmentation during training: they randomly rotate the target structure and let the model learn the symmetry itself.
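A minimal sketch of such augmentation, assuming SciPy's uniform rotation sampler; the centering step is an illustrative choice, not necessarily the paper's:

```python
import torch
from scipy.spatial.transform import Rotation

def random_so3_augment(coords):
    """coords: (n_atoms, 3) target structure; returns a uniformly rotated copy."""
    R = torch.as_tensor(Rotation.random().as_matrix(), dtype=coords.dtype)
    centered = coords - coords.mean(dim=0)   # rotate about the center of mass
    return centered @ R.T                    # apply the random rotation
```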
Experimental Evaluation
To investigate the scalability of the SimpleFold framework in protein folding, the researchers trained a series of SimpleFold models of different sizes (100M, 360M, 700M, 1.1B, 1.6B, and 3B parameters).
Scaling the model is not just a matter of adding parameters: as model size grows, the researchers also widen the channels of the atom encoder, the decoder, and the backbone network (see Table 5).
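For illustration only, such joint scaling of widths and depth might be organized as below; these numbers are placeholders, not the actual values from Table 5:

```python
# Placeholder numbers, not the paper's actual Table 5 values.
SIMPLEFOLD_CONFIGS = {
    "100M": dict(d_atom=128, d_trunk=640,  trunk_layers=12),
    "700M": dict(d_atom=192, d_trunk=1152, trunk_layers=24),
    "3B":   dict(d_atom=256, d_trunk=2048, trunk_layers=36),
}
```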
During training, the researchers adopted AlphaFold2's strategy: they replicate each protein Bc times on each GPU, sample a different time step t for each copy, and accumulate gradients over Bp proteins (see Table 6 for the detailed settings).
Experiments show that this strategy yields more stable gradients and better model performance than filling a batch with randomly sampled proteins.
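A hedged sketch of this batching scheme, where `model` is the velocity network and `flow_matching_loss` a standard linear-path flow-matching loss; both names are illustrative stand-ins:

```python
import torch

def flow_matching_loss(model, x1, t, seq_cond):
    """Linear-path flow matching: x_t = (1-t)*x0 + t*x1, target velocity x1 - x0."""
    x0 = torch.randn_like(x1)                      # noise endpoint of the path
    xt = (1 - t)[:, None, None] * x0 + t[:, None, None] * x1
    v_pred = model(xt, t, seq_cond)
    return ((v_pred - (x1 - x0)) ** 2).mean()

def training_step(model, proteins, optimizer, Bc=4):
    """Replicate each protein Bc times with distinct time steps; accumulate over Bp proteins."""
    optimizer.zero_grad()
    for protein in proteins:                       # the Bp proteins of this step
        x1 = protein["coords"].unsqueeze(0).repeat(Bc, 1, 1)  # Bc copies of one protein
        t = torch.rand(Bc)                         # a different time step per copy
        loss = flow_matching_loss(model, x1, t, protein["seq"]) / len(proteins)
        loss.backward()                            # accumulate gradients
    optimizer.step()
```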
The researchers evaluated the performance of SimpleFold using two widely used benchmarks for protein structure prediction, CAMEO22 and CASP14.
These two benchmark tests place high demands on generalization ability, robustness, and atomic accuracy.
Table 1 summarizes the evaluation results on CASP14 and CAMEO22.
The researchers divided the models into two categories by how they extract protein sequence information: methods based on MSA search (e.g., RoseTTAFold, RoseTTAFold2, and AlphaFold2) and methods based on protein language models (PLMs) (e.g., ESMFold and OmegaFold).
They also marked each baseline according to whether its training objective is generative (e.g., diffusion, flow matching, or autoregression) or a direct structure regression.
Interestingly, AlphaFlow and ESMFlow, which were derived from AlphaFold2 and ESMFold by fine-tuning with flow matching, perform worse overall than their original regression models.
The researchers believe this is because protein folding benchmarks like CAMEO22 and CASP14 usually offer only a single "true" target structure, which favors regression models that make deterministic point predictions.
Despite its simple architecture, SimpleFold shows excellent performance.
On both benchmarks, SimpleFold consistently outperforms ESMFlow, which is likewise based on flow matching and builds on ESM embeddings.
On CAMEO22, the performance of SimpleFold is comparable to that of the currently best models (e.g., ESMFold, RoseTTAFold2, and AlphaFold2).
More importantly, even without triangle attention and MSA, SimpleFold reaches over 95% of the performance of RF2/AF2 on most metrics.
On the more challenging CASP14, SimpleFold even outperforms ESMFold.
SimpleFold also degrades less between the two benchmarks, indicating that it generalizes robustly even without MSA and can handle harder structure prediction tasks.
The researchers also reported the performance of SimpleFold models of different sizes.
Even the smallest model, SimpleFold-100M, reaches over 90% of ESMFold's performance on CAMEO22, further showing that protein folding models can be built from general-purpose building blocks.
As model size increases, SimpleFold's performance improves steadily on all metrics, showing that a general-purpose, scalable architecture carries significant advantages in the folding task.
The gain from scaling is especially pronounced on the more challenging CASP14.
Figure 3(a) shows an example structure colored by pLDDT, where red and orange indicate low prediction confidence and blue indicates high confidence.
It can be seen that SimpleFold predicts most secondary structures with high confidence, while it shows some uncertainty in flexible loop regions.
Figure 3(b) and (c) show a comparative analysis between pLDDT and the actual LDDT-Cα.
SimpleFold's ability to generate structural ensembles
The advantage of a generative training objective is that SimpleFold models the structure distribution directly instead of outputting only a single "final version".
It can therefore produce both a single deterministic structure and an ensemble of different conformations for the same amino acid sequence.
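Because sampling starts from fresh noise each time, repeating the sampler sketched earlier yields a conformational ensemble for one sequence; again purely illustrative:

```python
def sample_ensemble(model, seq_cond, n_atoms, n_samples=10):
    """Draw independent samples; each run starts from different Gaussian noise."""
    return [sample_structure(model, seq_cond, n_atoms) for _ in range(n_samples)]
```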
To validate this ability of SimpleFold, the researchers conducted tests on the ATLAS dataset.
This dataset is used to evaluate structural-ensemble generation against molecular dynamics (MD) and contains all-atom MD simulation structures of 1,390 proteins.
Table 2 shows the comparison results of SimpleFold with several baseline models on ATLAS (see Table 9 for SimpleFold models of different sizes).
The metrics comprehensively assess the quality of the generated ensembles, including flexibility prediction, distributional accuracy, and ensemble observables.
As shown in Table 2, SimpleFold consistently outperforms ESMFlow-MD, which is likewise built on ESM representations, across several evaluation metrics.
Moreover, SimpleFold outperforms AlphaFlow-MD on important observables such as exposed residues and the mutual information matrix, which helps in discovering the "hidden pockets" common in drug development.
The researchers also evaluated SimpleFold's ability to model proteins that naturally adopt multiple conformational states.
As shown in Table 3, on the apo/holo dataset SimpleFold achieves...