
Drawing on data from more than 14,000 real patients, the University of Washington, Microsoft, and collaborators propose GigaTIME to map a panoramic atlas of the tumor immune microenvironment.

HyperAI · 2026-02-12 19:31
Multimodal AI generation for virtual population modeling of the tumor microenvironment

A research team from Microsoft Research, the University of Washington, and Providence Genomics has proposed the multimodal artificial intelligence framework GigaTIME. Relying on advanced multimodal learning, the framework generates virtual mIF maps from conventional H&E slides. The team applied it to a cohort of over 14,000 cancer patients from Providence Health Group in the United States, covering 24 cancer types and 306 subtypes, ultimately generating nearly 300,000 virtual mIF images and achieving systematic modeling of the tumor immune microenvironment in a large and diverse population.

In the evolutionary landscape of cancer, the tumor immune microenvironment not only dominates the growth, invasion, and metastasis of cancer cells but also profoundly affects treatment response and patients' final prognosis. This is not a "solo performance" of cancer cells but a highly dynamic ecosystem: various types of cells such as immune cells, fibroblasts, and endothelial cells interact here, all embedded in an extracellular matrix whose structure and function have been reshaped, forming a precise and complex pathological network.

The key to deciphering this network lies in understanding the functional states and interactions of cells, and the activation levels of specific proteins are the important "molecular codes" among them. Traditionally, immunohistochemistry (IHC), with its ability to intuitively show protein localization, has been the classic tool for decoding these codes. For example, PD-L1 staining is widely used to assess immune-checkpoint status and predict the efficacy of immunotherapy. However, IHC can only capture one protein at a time, making it difficult to reconstruct the real ecosystem in which multiple proteins coexist. This has become the main bottleneck for an in-depth understanding of the dialogue between tumor and immune cells.

To break through this limitation, multiplex immunofluorescence (mIF) technology emerged. It can simultaneously present the spatial distribution of multiple proteins on a single tissue slice while fully retaining the contextual information of the tissue structure. However, the technology is costly and the workflow is cumbersome: staining, imaging, and analysis are all extremely time-consuming, making large-scale data accumulation difficult and clinical translation slow.

In sharp contrast, H&E-stained slides are widely available and low-cost in clinical practice. Although they cannot directly show protein activity, they fully retain the overall tissue structure and detailed cell morphology. The features hidden in them may indirectly reflect the functional states of cells, but these subtle and complex patterns often exceed the recognition limit of the human eye.

In recent years, breakthroughs in artificial intelligence have brought new opportunities. Through pre-training on large numbers of pathological images, AI has demonstrated powerful visual analysis and feature-mining capabilities. This leads to a key question: can AI "decode" from easily accessible H&E images the protein activation information that would otherwise require expensive mIF technology?

Based on this idea, the research team from Microsoft Research, the University of Washington, and Providence Genomics proposed GigaTIME, a multimodal AI framework that generates virtual mIF maps from conventional H&E slides, and applied it at scale to build a systematic, population-level model of the tumor immune microenvironment.

The relevant research results, titled "Multimodal AI generates virtual population for tumor microenvironment modeling", have been published in Cell.

Research Highlights:

* GigaTIME uses multimodal AI to convert conventional H&E pathology slides into spatial proteomic data, generating a virtual population that captures cell states.

* It supports large-scale clinical discoveries and patient stratification and reveals new spatial and combinatorial protein activation patterns.

Paper Link:

https://www.cell.com/cell/fulltext/S0092-8674(25)01312-1

Dataset: Building a Complete Closed Loop from Training to Application

To train the model, a fundamental contradiction must first be resolved: clinically ubiquitous, low-cost H&E staining cannot directly show protein activity, while mIF, which can reveal the spatial relationships of multiple proteins, is too expensive and complex to run at scale. To build an AI model connecting the two imaging modalities, the research team used the COMET platform to collect 441 mIF images from 21 H&E-stained slides. As shown in the figure below, these images cover 21 key biomarkers, including nuclear markers such as DAPI and PHH3, surface proteins such as CD4 and CD11c, and cytoplasmic proteins such as CD68, providing an important basis for analyzing the composition and functional states of immune cells and the activity of tumor cells in the tumor microenvironment.

Data collection and channel distribution in the training data

After obtaining paired images, the greater challenge lies in extracting high-quality training data from them. As shown in the figure below, the team designed a rigorous processing pipeline: first, the VALIS tool was used to align H&E images with mIF images at the pixel level; then the StarDist algorithm was used to identify and segment each cell in the images; finally, the image regions with the highest registration quality were selected based on the Dice coefficient.
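The Dice-based selection in the final step can be sketched as follows. This is a minimal NumPy illustration; the masks, the 0.8 threshold, and the helper names are hypothetical rather than taken from the paper:

```python
import numpy as np

def dice_coefficient(mask_a, mask_b):
    """Dice coefficient between two binary masks: 2|A∩B| / (|A| + |B|)."""
    a, b = mask_a.astype(bool), mask_b.astype(bool)
    denom = a.sum() + b.sum()
    if denom == 0:
        return 1.0  # two empty masks agree trivially
    return 2.0 * np.logical_and(a, b).sum() / denom

def select_well_registered(region_pairs, threshold=0.8):
    """Keep indices of regions whose aligned H&E/mIF nuclear masks overlap well.

    `region_pairs` holds (he_mask, mif_mask) tuples; the threshold value is a
    hypothetical choice, not taken from the paper."""
    return [i for i, (he, mif) in enumerate(region_pairs)
            if dice_coefficient(he, mif) >= threshold]

# Toy check: a perfectly aligned pair is kept, a disjoint pair is dropped.
good = (np.ones((4, 4)), np.ones((4, 4)))
bad = (np.eye(4), 1 - np.eye(4))
print(select_well_registered([good, bad]))  # -> [0]
```

The same Dice measure reappears later as an evaluation metric, which makes it a natural quality filter here: it scores spatial overlap rather than raw pixel agreement.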

After multiple levels of strict quality control, the team selected 10 million high-quality cells from initial data containing 40 million cells and divided them into a training set, a validation set, and an independent test set. In addition, breast cancer and brain cancer samples from tissue microarrays were introduced as an external validation set. These samples differ markedly from the training data in tissue structure and morphology: they appear as small cylindrical tissue cores separated by blank areas rather than the large continuous tissue sections of the training data, effectively testing the model's generalization to new sample types and unseen cancer types.

Pre - processing workflow of the training data

In terms of model application, the research constructed two large-scale, complementary virtual population cohorts. The first comes from the clinical network of Providence Health Group in the United States, comprising H&E slides of 14,256 cancer patients from its 51 hospitals and more than 1,000 clinics, covering 24 major cancer types and 306 subtypes, and integrating rich clinical information such as genomic markers, pathological stages, and survival follow-up. The unique value of this dataset lies in its real-world character: the patient population is highly diverse, and disease stages span the entire spectrum from early to late, truly reflecting the complexity of clinical practice.

The second cohort is drawn from The Cancer Genome Atlas (TCGA), a public database, and includes 10,200 H&E slides mainly from early-stage, untreated surgical samples. The two cohorts contrast sharply in patient sources, disease stages, and clinical backgrounds. This deliberate difference provides an excellent condition for verifying the reliability and generality of the model: if the model can draw consistent and robust biological conclusions from such different datasets, that strongly supports its broad clinical application potential.

Data distribution of Providence Health by cancer type

GigaTIME: Building an Intelligent Bridge between Morphology and Function

The GigaTIME model directly addresses the key bottleneck in tumor immune microenvironment research: high-cost, low-throughput mIF is difficult to deploy widely, while clinically routine H&E-stained images cannot directly reflect protein functional activity. The model learns to generate virtual mIF images from H&E images, providing a feasible path to low-cost, systematic analysis of the tumor immune microenvironment at population scale.

The model adopts a carefully designed patch-based encoder-decoder framework whose core is a nested U-shaped network. The advantage of this architecture is that it can simultaneously capture an image's local fine-grained features and its global tissue structure. Specifically, the encoder gradually extracts multi-level feature representations from input 256×256-pixel H&E image patches through convolution and downsampling; the decoder reconstructs these abstract features into spatially resolved virtual mIF images through upsampling and feature fusion. This design lets the model attend both to the fine morphology of individual cells and to the organization of cell populations.
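The patch-based processing this framework relies on starts with tiling the whole-slide image. A simple NumPy sketch (the 256-pixel patch size matches the text, but the edge handling and function names are assumptions, not the paper's implementation):

```python
import numpy as np

def tile_slide(slide, patch=256):
    """Split an H&E image array (H, W, 3) into non-overlapping patch×patch tiles.

    Partial edge tiles are simply dropped here; the paper's exact tiling and
    padding scheme may differ."""
    h, w = slide.shape[:2]
    tiles, coords = [], []
    for y in range(0, h - patch + 1, patch):
        for x in range(0, w - patch + 1, patch):
            tiles.append(slide[y:y + patch, x:x + patch])
            coords.append((y, x))  # remember each tile's origin for stitching
    return np.stack(tiles), coords

slide = np.zeros((512, 700, 3), dtype=np.uint8)  # toy whole-slide image
tiles, coords = tile_slide(slide)
print(tiles.shape)  # -> (4, 256, 256, 3): 2 rows × 2 full columns
```

Keeping the tile origins alongside the tiles is what later allows per-patch predictions to be placed back at the right position in the slide.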

GigaTIME receives H&E whole-slide images and outputs mIF information covering 21 protein channels

In terms of output, the model's design reflects careful consideration of the biological problem. For the 21 preset protein channels, GigaTIME performs a binary classification for each pixel of the input image, judging whether the corresponding protein is activated at that position, and generates a pixel-level protein activity map. These local predictions can be seamlessly stitched together to reconstruct the virtual mIF image of the whole tissue slide, further supporting various quantitative indicators, such as the activation density and spatial distribution of specific proteins in the tumor area, providing a solid data foundation for subsequent high-throughput analysis and clinical association studies.
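Stitching per-patch binary maps back into a slide-level map and deriving a density statistic might look like the following sketch; the "fraction of activated pixels" used here is a hypothetical proxy, not necessarily the paper's exact quantitative indicator:

```python
import numpy as np

def stitch_and_density(pred_patches, coords, slide_shape, patch=256):
    """Reassemble per-patch binary protein maps into one slide-level map and
    report the fraction of activated pixels as a simple density proxy."""
    canvas = np.zeros(slide_shape, dtype=np.uint8)
    for p, (y, x) in zip(pred_patches, coords):
        canvas[y:y + patch, x:x + patch] = p  # place each patch at its origin
    return canvas, float(canvas.mean())

# Two adjacent 256×256 patches: one fully activated, one empty.
patches = [np.ones((256, 256), np.uint8), np.zeros((256, 256), np.uint8)]
canvas, density = stitch_and_density(patches, [(0, 0), (0, 256)], (256, 512))
print(density)  # -> 0.5
```

In practice one such map would be produced per protein channel, and the density computed within a tumor mask rather than over the full slide.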

Virtual mIF scores of a large-scale population obtained through GigaTIME conversion

To ensure effective learning, the training strategy was systematically optimized. The loss function combines the Dice loss and the binary cross-entropy loss: the former ensures overall spatial consistency between the predicted and real activated areas, while the latter improves per-pixel classification accuracy. Together they secure both accurate restoration of the global spatial pattern and reliability at the level of detail. The model was trained for 250 epochs on 8 NVIDIA A100 GPUs with a batch size of 16 and a learning rate of 0.0001; all key hyperparameters were tuned systematically on the validation set.
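A minimal sketch of this combined objective, written in NumPy for clarity (the equal 0.5/0.5 weighting is a hypothetical choice; the article does not restate the paper's weights):

```python
import numpy as np

def dice_loss(pred, target, eps=1e-6):
    """Soft Dice loss: penalizes mismatch in overall activated-area overlap."""
    inter = (pred * target).sum()
    return 1.0 - (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)

def bce_loss(pred, target, eps=1e-7):
    """Pixel-wise binary cross-entropy: penalizes per-pixel misclassification."""
    p = np.clip(pred, eps, 1 - eps)
    return float(-(target * np.log(p) + (1 - target) * np.log(1 - p)).mean())

def combined_loss(pred, target, w_dice=0.5, w_bce=0.5):
    # Equal weighting is a hypothetical choice, not taken from the paper.
    return w_dice * dice_loss(pred, target) + w_bce * bce_loss(pred, target)

# Sanity check: a perfect prediction scores far lower than an inverted one.
target = np.array([[1.0, 0.0], [1.0, 1.0]])
assert combined_loss(target, target) < combined_loss(1 - target, target)
```

The two terms are complementary by construction: Dice is a region-overlap measure that is insensitive to class imbalance, while cross-entropy supplies dense per-pixel gradients.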

It should be particularly emphasized that the success of the model greatly depends on high - quality training data. The research team selected 10 million high - quality cells from a large amount of initial data through strict image registration, cell segmentation, and quality control processes, ensuring that the model learns a robust, reliable, and biologically meaningful cross - modal mapping relationship rather than superficial statistical rules or noise patterns.

Large-scale Discoveries Based on Nearly 300,000 Virtual Images: GigaTIME Reveals 1,234 Clinical Associations

To comprehensively evaluate the performance and value of GigaTIME, the research team designed a systematic evaluation scheme from two dimensions: technical verification and clinical discovery.

In technical verification, the research evaluated the image conversion ability of the model at three levels: pixel, cell, and slide. At the pixel level, GigaTIME significantly outperformed the baseline model CycleGAN in 15 out of 21 protein channels. For example, in the DAPI channel, the Dice coefficient of GigaTIME reached 0.72, far exceeding the 0.12 of the simple statistical baseline.

At the cell level, the correlation of GigaTIME in the DAPI channel reached 0.59, while that of CycleGAN was only 0.03, close to the random level.

At the slide level, the correlation coefficient of GigaTIME in the DAPI channel was as high as 0.98, with an average of 0.56 for all channels, while that of CycleGAN was close to 0. These results prove that supervised training based on high - quality paired data is crucial for accurate cross - modal conversion.
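Slide-level agreement of the kind reported above can be computed as a Pearson correlation between predicted and measured per-slide activation fractions for a given channel; the values below are illustrative toy numbers, not the paper's data:

```python
import numpy as np

def slide_level_correlation(pred_fracs, true_fracs):
    """Pearson correlation between predicted and measured per-slide
    activation fractions for one protein channel."""
    return float(np.corrcoef(pred_fracs, true_fracs)[0, 1])

# Four hypothetical slides: predicted vs. measured activation fractions.
pred = [0.10, 0.25, 0.40, 0.55]
true = [0.12, 0.22, 0.43, 0.50]
print(round(slide_level_correlation(pred, true), 2))  # -> 0.98
```

Repeating this per channel and averaging would yield a summary like the all-channel mean of 0.56 quoted above.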

Performance of GigaTIME and CycleGAN in image translation

In terms of clinical discovery, the research used the nearly 300,000 virtual mIF images of 14,256 patients to systematically analyze associations between virtual protein expression and 20 clinical biomarkers. After rigorous statistical testing and multiple-testing correction, a total of 1,234 significant associations were identified, distributed across three levels: pan-cancer, cancer type, and cancer subtype.
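Screening many candidate associations requires correcting for multiple testing; the Benjamini-Hochberg procedure is one common choice and is sketched below (the paper's exact correction method and thresholds may differ):

```python
import numpy as np

def benjamini_hochberg(pvals, alpha=0.05):
    """Return a boolean mask of discoveries under BH false-discovery-rate
    control: accept the largest k such that p_(k) <= alpha * k / m."""
    p = np.asarray(pvals, dtype=float)
    m = len(p)
    order = np.argsort(p)
    thresholds = alpha * np.arange(1, m + 1) / m
    passed = p[order] <= thresholds
    k = np.max(np.nonzero(passed)[0]) + 1 if passed.any() else 0
    reject = np.zeros(m, dtype=bool)
    reject[order[:k]] = True  # the k smallest p-values are discoveries
    return reject

# Five hypothetical association tests; only the two smallest p-values survive.
pvals = [0.001, 0.008, 0.04, 0.2, 0.9]
print(benjamini_hochberg(pvals))  # -> [ True  True False False False]
```

Unlike a Bonferroni cut-off, BH controls the expected fraction of false discoveries, which scales better when thousands of associations are screened at once.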

GigaTIME enables biomedical discoveries at the pan - cancer, cancer type, and cancer subtype levels

Among the 175 associations in the pan-cancer analysis, high tumor mutation burden and high microsatellite instability were significantly correlated with enhanced activation of multiple immune-infiltration markers (CD138, CD20, CD68, CD4), consistent with antigen-driven immune activation. New clues also emerged: KMT2D mutations were strongly positively correlated with immune markers, suggesting they may promote immune infiltration, while KRAS mutations were negatively correlated with immune markers, reflecting an immune-exclusion phenotype. In specific cancer types and subtypes, the model revealed a large number of specific associations. For example, the strong correlation between T-bet and TP53 mutations in brain cancer was not detected at the pan-cancer level, possibly reflecting the unique immune microenvironment of the central nervous system. The analysis of lung cancer subtypes showed that the association between PRKDC mutations and immune response markers