
On the cover of Nature: the newest member of Google's Alpha series reads the ultimate blueprint of life in an instant

[Account deactivated] 2026-01-29 08:24
AlphaGenome does more than "read" the genome.

A stretch of non-coding DNA up to one million base pairs long, once as inscrutable as an ancient text, can now have its regulatory functions and the effects of its mutations predicted. A rare cancer-associated mutation whose mechanism was once a mystery can now be traced, through sequence analysis alone, to the way it activates a disease-causing gene.

This is all thanks to the unified DNA sequence model AlphaGenome, which graced the latest cover of the authoritative scientific journal Nature. It is a new addition to Google DeepMind's Alpha series of AIs.

AlphaGenome is a deep learning model that predicts the functions of long DNA sequences within a single, unified framework. It overcomes several limitations of earlier methods: it can process DNA sequences of up to one million base pairs at once and predict thousands of molecular signals related to gene regulation at very high resolution.

What's even more exciting is that AlphaGenome can not only 'understand' the genome but also evaluate the impacts of genetic mutations on various biological processes in just one second.

Paper link: https://www.nature.com/articles/s41586-025-10014-0

To advance scientific research, AlphaGenome is now open to the global scientific community: Researchers can conduct non-commercial research through the AlphaGenome API and access the model code and weights on GitHub.

The work has drawn considerable attention from the academic community. Robert Goldstone, head of genomics at the Francis Crick Institute, commented that AlphaGenome is an important milestone for artificial intelligence in genomics. Its high-resolution prediction of non-coding DNA moves the technology from theoretical interest to practical application, enabling scientists to programmatically study and simulate the genetic roots of complex diseases.

Fergal Martin, head of the Eukaryotic Annotation Team at the EMBL European Bioinformatics Institute (EMBL-EBI), added that the model can accelerate the interpretation of human genomic variation and is expected to extend in the future to interpreting the DNA of more species, including plants, animals, and microorganisms.

The result marks a crucial step toward deciphering the 'regulatory code' of the genome. The model is a powerful tool for analyzing the pathogenic mechanisms of non-coding mutations, and it is also expected to accelerate rare disease diagnosis, drug development, and synthetic biology, opening up new possibilities for life science research.

AlphaGenome: Opening Up New Possibilities for Life Science Research

AlphaGenome is a new deep learning model. Unlike earlier tools built for specific tasks, it is a general-purpose 'genome decoder' that targets one of the central problems in biology: understanding the functions of non-coding DNA sequences and the effects of mutations in them.

It can process DNA sequences of up to one million base pairs (1 Mb) at once. Trained on human and mouse genomic data, it simultaneously predicts thousands of molecular signals, including gene expression, splicing, chromatin accessibility, and three-dimensional structure, giving a panoramic view of genome function.

Figure | Model overview. AlphaGenome takes a 1 Mb DNA sequence and a species identity (human or mouse) as input and predicts 5930 human or 1128 mouse genomic tracks, covering a range of cell types and 11 output types, each at its own resolution (far right).

AlphaGenome breaks through the technological bottlenecks that have long restricted the development of this field and demonstrates outstanding capabilities in rigorous performance tests.

1. Solving Two Core Trade-Offs

Deep learning models that predict function from sequence have long faced two fundamental trade-offs. AlphaGenome addresses both.

The first trade-off is between context length and resolution. In the past, a model that captured the long-range effects of distal regulatory elements (for example, over 200 kb) usually had to sacrifice output resolution, predicting in 128 bp or 32 bp bins. Conversely, a model that predicted at single-base resolution could only process very short sequences. AlphaGenome combines the two through its architecture: convolutional layers detect local patterns, Transformers handle long-range dependencies, and training is distributed across TPUs for efficiency. The result is single-base-resolution prediction within a 1 Mb sequence context.
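To make the resolution side of this trade-off concrete, the short Python sketch below counts how many output positions a single track needs at each bin size. It is illustrative arithmetic only. It assumes that "1 Mb" means 2**20 = 1,048,576 bp (a power of two chosen here so that the bins divide evenly; the article itself only says one million base pairs), and the function name num_bins is hypothetical.

```python
# Illustrative arithmetic only; not taken from the AlphaGenome code base.
# Assumption: "1 Mb" is treated as 2**20 bp so every bin size divides evenly.
SEQ_LEN = 2**20  # 1,048,576 bp


def num_bins(seq_len: int, resolution_bp: int) -> int:
    """Return how many output positions one track has at a given bin size."""
    if resolution_bp <= 0 or seq_len % resolution_bp != 0:
        raise ValueError("resolution must be positive and divide the sequence length")
    return seq_len // resolution_bp


if __name__ == "__main__":
    for res in (128, 32, 1):
        print(f"{res:>3} bp bins -> {num_bins(SEQ_LEN, res):,} positions per track")
```

Moving from 128 bp bins to single-base output multiplies the number of predicted values per track by 128, which helps explain why earlier models tended to give up either context length or resolution.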

The second trade-off is between specialization and breadth. Several current state-of-the-art (SOTA) models are highly specialized in a single modality, such as SpliceAI for splice site prediction, and so offer only a narrow view. AlphaGenome is the first to integrate multi-modal prediction, long sequence context, and base-pair resolution in a single framework. It also models splice junctions directly, filling a gap in how existing tools analyze the details of splicing mutations and providing broader biological insight.

2. Comprehensive Prediction Capabilities

AlphaGenome takes a 1 Mb DNA sequence as input and predicts diverse genomic tracks across multiple cell types. Its most notable advance is in splicing: it introduces a new method for predicting splice junctions and combines it with prediction of splice site usage. This dual prediction gives a more detailed and complete view of the RNA splicing process.

Figure | Comparison of the prediction outputs of deep learning models. Except for Borzoi (32 bp), all models predict at a 1 bp resolution. Borzoi implicitly predicts splice sites through RNA-seq coverage, while other models generate explicit prediction results.

AlphaGenome's breadth shows not only in the diversity of modalities it predicts but also in how efficiently it handles mutation effects.

In addition to predicting molecular properties, AlphaGenome can evaluate the impact of a genetic mutation on all of them in one second. It does so by comparing its predictions for the mutant sequence with those for the reference sequence and scoring the difference in every modality. This efficiency makes large-scale, genome-wide screening of pathogenic mutations feasible.
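The reference-versus-mutant comparison described above can be sketched in a few lines of Python. This is a minimal illustration of the general idea, not AlphaGenome's implementation or API: the predictor toy_predict is a hypothetical stand-in (a single random convolution with a ReLU), and the helper names one_hot, apply_snv, and score_variant, along with the choice to sum the difference over positions, are assumptions made for this example.

```python
import numpy as np

BASES = "ACGT"


def one_hot(seq: str) -> np.ndarray:
    """Encode a DNA string as an (L, 4) one-hot array; raises on non-ACGT bases."""
    idx = np.array([BASES.index(b) for b in seq])
    return np.eye(4, dtype=np.float32)[idx]


def toy_predict(x: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for a sequence-to-track model.

    x: (L, 4) one-hot sequence; weights: (K, 4, T) with odd K.
    Returns an (L, T) array: one value per base per track.
    """
    length, k = x.shape[0], weights.shape[0]
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))
    # windows[l, j] holds the base at position l + j - pad (zero beyond the ends).
    windows = np.stack([xp[j:j + length] for j in range(k)], axis=1)
    return np.maximum(np.einsum("lkc,kct->lt", windows, weights), 0.0)


def apply_snv(seq: str, pos: int, alt: str) -> str:
    """Return seq with the base at 0-based position pos replaced by alt."""
    return seq[:pos] + alt + seq[pos + 1:]


def score_variant(seq: str, pos: int, alt: str, weights: np.ndarray) -> np.ndarray:
    """Score a variant as (mutant prediction - reference prediction), summed per track."""
    ref_pred = toy_predict(one_hot(seq), weights)
    alt_pred = toy_predict(one_hot(apply_snv(seq, pos, alt)), weights)
    return (alt_pred - ref_pred).sum(axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    weights = rng.normal(size=(5, 4, 3)).astype(np.float32)  # 3 toy tracks
    reference = "ACGTACGTACGTACGTACGT"
    print(score_variant(reference, pos=10, alt="T", weights=weights))
```

Because each variant needs only two forward passes, one for the reference and one for the mutant, the cost of scoring is roughly twice the cost of a single prediction. That is the property that makes genome-wide screening practical once the underlying model is fast.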

3. Excellent Performance: SOTA on Most Benchmarks

The model was rigorously evaluated on a comprehensive set of benchmarks covering two abilities: predicting genomic tracks on unseen DNA sequences, and predicting the effects of mutations. On single-sequence prediction, AlphaGenome reached the current SOTA level in 22 out of 24 genomic track prediction tasks; on the regulatory effects of mutations, it achieved SOTA performance in 25 out of 26 evaluations.

Figure | The bar chart shows the relative improvement of AlphaGenome in selected DNA sequence and mutation effect tasks and compares the results with the current best methods in each category.

Notably, many of the models it surpassed are highly specialized, single-task models. That AlphaGenome outperforms them across multiple areas demonstrates the potential of its general architecture. Scientists no longer need to maintain a toolkit of separate models and can handle a wide range of prediction tasks with AlphaGenome alone.

Application Prospects of AlphaGenome

AlphaGenome opens up broad prospects for future life science research and medical applications. Its powerful capabilities will play a key role in the following three major fields:

1. Disease Understanding and Diagnosis

In the field of rare disease diagnosis, AlphaGenome can provide strong functional evidence for existing mutation annotation processes. Especially for those non-coding variants of uncertain significance (VUS), the model can help researchers more accurately judge their pathogenicity, thus opening up new diagnostic avenues for patients with rare diseases (such as Mendelian genetic diseases).

For complex diseases such as cancer, AlphaGenome can reveal how non-coding region mutations lead to diseases through complex regulatory networks. For example, in the study of T-cell acute lymphoblastic leukemia (T-ALL), AlphaGenome successfully simulated the process in which a specific mutation introduced a MYB binding motif and then activated the oncogene TAL1. This ability helps scientists discover new therapeutic targets and gain a deeper understanding of the biological roots of diseases.

In this regard, Marc Mansour, a clinical professor of pediatric hematology-oncology at University College London, commented that AlphaGenome has brought a step change in identifying non-coding driver mutations, shortening the relevant research cycle from months to weeks. It is crucial not only for cancer research but will also play a key role in other frontier fields such as disease-trait association analysis, synthetic biology, and functional genomics.

2. Synthetic Biology and Gene Therapy

The prediction ability of AlphaGenome can be transformed into design ability, guiding scientists to create synthetic DNA with specific functions. For example, researchers can design 'tissue-specific enhancers' that are only activated in nerve cells and remain silent in muscle cells, thus achieving precise gene regulation.

In the field of gene therapy, the real-time prediction function of the model can be directly used to optimize therapeutic tools. When designing therapeutic antisense oligonucleotides (ASO), AlphaGenome can predict their modification effects, helping researchers screen out the safest and most effective candidate drugs and accelerating the development process of new therapies.

3. Accelerator for Basic Research

AlphaGenome can quickly generate scientific hypotheses and predict the functions of large numbers of gene sequences, helping researchers prioritize the experiments most likely to succeed before committing to expensive laboratory work, and so significantly improving research efficiency.

AlphaGenome can also predict and verify the functional properties of entirely new DNA sequences created by generative models, strengthening generative models trained on DNA sequences and helping to push the boundaries of biological discovery.

Limitations and Future Outlook

Although AlphaGenome has made significant progress in deciphering the genomic regulatory code, as a cutting-edge scientific research tool, it still faces some challenges.

For example, accurately capturing the impacts of extremely distal regulatory elements more than 100,000 base pairs away from the target gene remains an unsolved problem. These long-distance interactions are crucial for the correct expression of genes, but they are extremely difficult to model.

Although the model can predict the tracks of hundreds of cell types, there is still room for improvement in precisely reproducing the subtle regulatory patterns unique to cells and tissues under different physiological, pathological, or developmental states and accurately predicting the effects of mutations in these specific environments.

In addition, AlphaGenome has limitations in species and modality coverage. Currently, the training data and evaluation focus mainly on humans and mice, and its ability to generalize to more species remains to be extended. Its predictions also focus mainly on modalities related to protein-coding genes, and coverage of non-coding RNAs (such as microRNAs) and other areas could be enriched.

The model has not yet been benchmarked for personal genome prediction, which is a common weak point for models in this field. AlphaGenome predicts only the molecular effects of mutations, whereas complex traits and diseases often involve broader biological processes such as development and environment. These lie beyond the direct 'sequence-function' association that the model captures, so its direct application to complex trait analysis is limited.

To address these limitations, the Google DeepMind team will promote the development of multiple research directions in future work.

For example, they will increase the diversity of input genomes by assaying more species or by large-scale perturbation of non-coding regulatory elements. This will help build the next generation of mutation effect prediction models and improve their ability to generalize.

Improving the accuracy and practicality of mutation prediction is another key direction, for instance through task-specific calibration, fine-tuning on perturbation data sets, or integrating single-cell data. The research team will also integrate a wider range of data modalities into the model, such as DNA methylation and RNA structural features, to provide a more comprehensive biological perspective.

In addition, combining AlphaGenome's predictions with other measures of mutation effect (such as conservation-based scores) and with existing gene function and biological pathway data will help advance in-depth analysis of common and rare mutations and allow cross-validation across disciplines.

Future foundation models will also explore the use of DNA language models, expand multi-species capabilities, and develop robust methods for correcting assay bias. In addition, quantifying the model's uncertainty will help in interpreting its predictions, laying a more solid foundation for understanding the complex cellular processes encoded in DNA sequences.

This article is from the WeChat public account "Academic Headlines" (ID: SciTouTiao), author: Wang Yueran, published by 36Kr with authorization.