
Selected for ICML 2025, Meta, the University of Cambridge, and MIT propose a full-atom diffusion Transformer framework, achieving the unified generation of periodic and non-periodic atomic systems for the first time.

HyperAI Superneural, 2025-07-14 17:47
The time required to generate 10,000 samples has been reduced from 2.5 hours to less than 20 minutes.

A joint research team from Meta FAIR, the University of Cambridge, and the Massachusetts Institute of Technology has proposed the All-atom Diffusion Transformer (ADiT), which breaks the modeling barrier between periodic and non-periodic systems. Through two innovations, a unified all-atom latent representation and Transformer-based latent diffusion, it generates both molecules and crystals with a single model.

In cutting-edge scientific research and industrial applications, generative modeling of the three-dimensional structure of atomic systems is showing disruptive potential, promising to reshape the inverse design of new molecules and materials. From precise structure prediction to flexible conditional generation, state-of-the-art diffusion and flow-matching models have emerged in key tasks such as biomolecular analysis, new materials research and development, and structure-based drug design, becoming core tools for researchers pushing past technological bottlenecks.

However, behind this booming field, a key problem has long constrained progress: existing models lack cross-system generality. Although all atomic systems follow the same physical principles that determine their three-dimensional structures and interactions, small molecules, biomolecules, crystals, and their composites have long been modeled separately, in a state of "divide and conquer." Most diffusion models depend heavily on the inherent characteristics of a specific system and must perform multi-modal generation on complex product manifolds that interleave categorical data (such as atomic types) with continuous data (such as three-dimensional coordinates), making models for different systems difficult to reconcile.

Consider some concrete scenarios. De novo generation of small molecules is typically split into two independent diffusion processes, one over atomic types (categorical) and one over three-dimensional coordinates (continuous); although the denoising model must learn how the two co-evolve, distorted intermediate states often reduce sampling efficiency. Biomolecular modeling additionally introduces a rotation manifold and treats atomic groups as rigid bodies. The diffusion process for crystals and materials must accommodate periodicity and run on a joint manifold of atomic types, fractional coordinates, and lattice parameters. These differences have made unified cross-system modeling a long-standing challenge in the field.

In this context, the joint research team from Meta's Fundamental AI Research (FAIR), the University of Cambridge, and the Massachusetts Institute of Technology proposed a breakthrough solution: the All-atom Diffusion Transformer (ADiT).

As a unified Transformer-based latent diffusion framework, ADiT's core advantage lies in breaking the modeling barrier between periodic and non-periodic systems. Through its two innovations, a unified all-atom latent representation and Transformer latent diffusion, a single model can generate both molecules and crystals. The design introduces almost no inductive bias, so the training and inference efficiency of the autoencoder and the diffusion model far exceeds that of traditional equivariant diffusion models: under the same hardware conditions, the time to generate 10,000 samples drops from 2.5 hours to less than 20 minutes. More notably, when model parameters are scaled up to 500 million, performance shows predictable improvement. This lays a key foundation for building a general generative-chemistry foundation model and marks a milestone toward general, large-scale modeling of atomic systems.

The relevant research results were selected for ICML 2025 under the title "All-atom Diffusion Transformers: Unified generative modelling of molecules and materials."

Research highlights:

* ADiT is the first to unify generative modeling across periodic materials and non-periodic molecular systems.

* ADiT relies on a unified all-atom latent representation and uses a Transformer for latent diffusion, simplifying the generation process with almost no inductive bias.

* ADiT has excellent scalability and efficiency; its training and inference speeds far exceed those of equivariant diffusion models.

Paper address:

https://go.hyper.ai/27d7U

Datasets: covering multiple fields, from periodic to non-periodic systems

In this study, the research team first selected several representative datasets for experiments:

* The MP20 dataset contains 45,231 metastable crystal structures from the Materials Project, with a maximum of 20 atoms in the unit cell, covering 89 different elements, which can well represent periodic material systems.

* The QM9 dataset consists of 130,000 stable organic small molecules, each with at most 9 heavy atoms (C, N, O, F) plus hydrogens, a typical representative of non-periodic molecular systems.

* The GEOM-DRUGS dataset contains 430,000 large organic molecules with a maximum of 180 atoms.

* The QMOF dataset contains 14,000 metal-organic framework structures.

Among them, MP20 and QM9 correspond to the two different types of atomic systems, providing the basis for jointly training the model on periodic and non-periodic data. The research team used the same data splits as previous studies to ensure fair comparison with other models; GEOM-DRUGS and QMOF further broaden the testing scope, enabling a more comprehensive evaluation of the model's generalization ability.

ADiT: Building a unified atomic system generative model with two core ideas

As a latent diffusion model, the core design of ADiT revolves around two key ideas to achieve unified generative modeling of periodic and non-periodic atomic systems.

The first key idea is the unified all-atom latent representation. The research team treated both periodic and non-periodic atomic systems as collections of atoms in three-dimensional space, and developed a unified representation that captures each atom's categorical attributes (such as atomic type) and continuous attributes (such as three-dimensional coordinates). By training a variational autoencoder (VAE) for all-atom reconstruction, molecules and crystals are embedded into a shared latent space, building a basic framework for processing different types of atomic systems uniformly.
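To make this concrete, below is a minimal sketch (not the paper's implementation) of what a unified all-atom encoding might look like: every system becomes one token per atom, carrying a categorical type and continuous coordinates, with a lattice and a periodicity flag so molecules and crystals share one tensor layout. The function name `encode_system`, the padding scheme, and the identity-lattice placeholder are all illustrative assumptions.

```python
import numpy as np

def encode_system(atomic_numbers, coords, lattice=None, max_atoms=32):
    """Pack one atomic system (molecule or crystal) into fixed-size
    all-atom arrays: one token per atom, padded with a mask.

    atomic_numbers : (N,) ints, categorical attribute per atom
    coords         : (N, 3) floats, Cartesian (molecule) or fractional (crystal)
    lattice        : (3, 3) floats for periodic systems, None for molecules
    """
    n = len(atomic_numbers)
    types = np.zeros(max_atoms, dtype=np.int64)       # 0 acts as the padding type
    xyz = np.zeros((max_atoms, 3), dtype=np.float32)
    mask = np.zeros(max_atoms, dtype=bool)
    types[:n] = atomic_numbers
    xyz[:n] = coords
    mask[:n] = True
    # Periodic systems carry real lattice parameters; molecules get an
    # identity placeholder plus a flag, so both share one layout.
    periodic = lattice is not None
    cell = np.asarray(lattice, dtype=np.float32) if periodic else np.eye(3, dtype=np.float32)
    return {"types": types, "coords": xyz, "mask": mask,
            "cell": cell, "periodic": periodic}
```

With such a shared layout, a single VAE can reconstruct water and rock salt alike, which is the precondition for a shared latent space.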

The second key idea is to use a Transformer for latent diffusion. In the latent space constructed by the VAE encoder, the research team introduced a diffusion Transformer (DiT) for generative modeling. At inference time, new latent variables are sampled with the help of classifier-free guidance, and the VAE decoder reconstructs them into valid molecules or crystals, completing the transformation from latent space to actual atomic systems.
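Classifier-free guidance combines a conditional and an unconditional prediction from the same denoiser at each sampling step. A minimal sketch of one guided step, under the common assumption that the denoiser predicts noise and that passing `cond=None` yields the unconditional branch (the interface here is illustrative, not the paper's API):

```python
import numpy as np

def cfg_denoise(model, z_t, t, cond, guidance_scale=2.0):
    """One classifier-free-guidance step in latent space.

    model(z, t, cond) -> predicted noise; cond=None means unconditional.
    The guided estimate extrapolates from the unconditional prediction
    toward the conditional one by `guidance_scale`.
    """
    eps_uncond = model(z_t, t, None)
    eps_cond = model(z_t, t, cond)
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)
```

A scale of 1.0 recovers plain conditional sampling; larger scales trade diversity for fidelity to the condition (e.g., "molecule" vs. "crystal").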

Based on these two core ideas, the experimental method of ADiT is carried out in two stages in an orderly manner.

In the first stage, the researchers built an autoencoder for reconstruction. Through the VAE, they jointly reconstructed the all-atom representations of molecules and materials, learning a shared latent space. This is the prerequisite for unified modeling of different atomic systems and lays the foundation for the subsequent generation process.

In the second stage, the researchers built the latent diffusion generative model. They used the DiT to generate new samples from the latent space with classifier-free guidance, and the VAE decoder then turned these samples into valid molecules or crystals. The significant advantage of this latent diffusion design is that it transfers the complexity of handling categorical and continuous attributes to the autoencoder, making generation in the latent space simpler and more scalable and improving the model's efficiency and adaptability across different atomic systems.

ADiT conducts generative modeling of chemical systems in two stages

ADiT leads in performance in crystal and molecule generation

To fully highlight ADiT's performance advantages, the research team selected several types of baseline models for comparison. In crystal generation, the baselines include equivariant diffusion and flow-matching models built on multi-modal product manifolds, such as CDVAE, DiffCSP, and FlowMM, as well as the non-equivariant diffusion model UniMat and the two-stage framework FlowLLM. In molecule generation, ADiT was compared with models such as the equivariant diffusion model EDM, GeoLDM, and Symphony. Systematic comparison with these advanced baselines clearly demonstrated ADiT's performance advantages.

The experimental results show that ADiT reached state-of-the-art levels in both crystal and molecule generation. In crystal generation, the crystals produced by ADiT performed excellently on key metrics such as validity, stability, uniqueness, and novelty. In molecule generation, ADiT ranked among the top models on the validity and uniqueness of 10,000 sampled molecules.

ADiT's joint training mechanism also brought significant performance gains: the version trained on both QM9 and MP20 comprehensively outperformed versions trained on a single dataset in both material and molecule generation tasks.

Scaling up the model has a predictable effect on ADiT's performance. As shown in the figure below, as the parameter count of the DiT denoiser grew from 32 million (ADiT-S, blue) to 130 million (ADiT-B, orange) and then to 450 million (ADiT-L, green), even on a medium-scale dataset of about 130,000 samples, the diffusion training loss continued to decrease and the validity rate steadily increased, showing a clear scaling effect. This strong correlation between model scale and performance suggests that expanding model parameters and data volume can drive further breakthroughs for ADiT.

The impact of the increase in the number of ADiT denoising parameters on training loss and generation validity

In terms of efficiency, ADiT showed a significant speed advantage over equivariant diffusion models. As shown in the figure below, when generating 10,000 samples on an NVIDIA V100 GPU, ADiT, built on a standard Transformer, scales far better with the number of integration steps than FlowMM and GeoLDM, which use computationally intensive equivariant networks. Even though ADiT-B has roughly 100 times more parameters than the equivariant baselines, its inference is still faster, highlighting the practical advantage of the Transformer architecture at scale.

The time relationship diagram of ADiTs and equivariant diffusion models for generating 10,000 samples

In addition, ADiT's scalability to larger systems was also verified. On the GEOM-DRUGS dataset of 430,000 molecules with up to 180 atoms, ADiT matched the most advanced equivariant diffusion and flow-matching models on validity and PoseBusters metrics. Notably, ADiT uses a standard Transformer architecture, introduces almost no molecular inductive bias, and does not need to explicitly predict atomic bonds, yet achieves performance on par with equivariant models, further reflecting the generality and broad applicability of its design.

Joint efforts of industry and research to promote breakthrough innovation in the generation of three-dimensional structures of atomic systems

In fact, in the cutting-edge field of generative modeling of three-dimensional atomic structures, both academia and industry have been exploring persistently, and many results have attracted wide attention.

In academia, a research team from the University of California, Berkeley, Microsoft Research, and Genentech jointly launched a multi-modal protein generation method called PLAID. This method cleverly exploits the structural information in pre-trained weights and uses a DiT for denoising. In analyses of structural quality and diversity for proteins of different lengths, it outperformed other baseline methods.

Industry has also actively explored this field, driven by innovation. GeoFlow V2, billed as the world's first all-round protein foundation model and released by the Chinese generative-AI protein design company BioGeometry, built a unified atomic diffusion model architecture that tackles protein structure prediction and design in one stroke. In antibody and antigen-antibody complex structure prediction, GeoFlow V2 leads comparable products in both accuracy and speed. ByteDance's Seedance 1.0 adopted a technical approach combining a variational autoencoder with a diffusion Transformer, achieving fast and efficient AI video generation; its speed advantage opens up new possibilities for real-time creation and interactive applications, indicating broad commercial prospects.

These academic research breakthroughs and industrial innovation practices jointly promote the development of the field of generative modeling of the three-dimensional structure of atomic systems. With the continuous progress of technology, this field will surely play a greater role in many aspects such as new material research and development and drug design, providing strong support for solving global scientific problems and industrial challenges.

Reference articles:

1. https://mp.weixin.qq.com/s/oF3-y7z8u1XpEtjd4q1u4w

2. https://mp.weixin.qq.com/s/tK0-1Qna6p7TnMrWENwZ7A

This article is from the WeChat official account "HyperAI Superneural", author: Tian Xiaoyao. Republished by 36Kr with authorization.