After 10 years, the 4D map of the human chromosome is released: uncovering the truth about DNA folding and exploring the causes of unknown genetic diseases
Last month, the 4D map of the human chromosome was officially released, marking another step forward in humanity's understanding of the mysteries of life.
If the human whole-genome sequence is like a comprehensive address book that lists the name and location of each gene, the Chromosome 4D Map Project (4D Nucleome, hereinafter 4DN) is like a social media feed bursting with information: it records how genes interact with each other in cells and how they collaborate at critical moments.
After a decade, this grand project, in which humans study themselves, has finally reached a major milestone.
The project brought together many international universities and research institutions, such as Princeton University, the University of Pennsylvania, and Carnegie Mellon University; teams from Fudan University and Zhejiang University in China also took part. The result is a joint paper co-signed by 90 authors, published in the top-tier journal Nature on December 18.
List of paper authors | Image source: Springer Nature
In this study, the researchers systematically presented the three-dimensional structures of chromosomes in two representative types of human cells: human embryonic stem cells, which have strong differentiation potential, and fibroblasts, which are terminally differentiated and fully formed.
Meanwhile, the researchers also introduced a fourth dimension, time, describing how the three-dimensional structure in cells changes over time.
Human embryonic stem cells (in the center) and the mouse embryonic fibroblasts (in the periphery) that nourish them. The two cell types differ greatly in morphology and structure | Wikimedia Commons
This study lays an important foundation for future research and greatly deepens our understanding of the three-dimensional structure of chromosomes.
For example, it offers a new perspective on diagnosing genetic diseases: some congenital developmental abnormalities in newborns whose mechanisms are unknown, such as cleft palate, cleft lip, polydactyly, or syndactyly, may be caused not by changes in protein sequences but by errors in chromosome folding. More diseases with currently unknown causes may be reinterpreted within the 4DN framework, bringing our understanding of our own genes to a new level.
The 4DN project adds two dimensions to the gene sequence
At the beginning of this century, the human whole-genome map, completed through cooperation among research institutions from many countries, was released. The project sequenced the complete human genome, letting us know clearly for the first time the order of bases on each chromosome and the approximate distribution of genes.
However, this map has a major limitation: it records only the flattened, two-dimensional sequence of genes. In a real physiological environment, chromosome structure is much more complex. DNA folds repeatedly in the nucleus, forming intricate three-dimensional structures. These structures also change with the cell state; they underpin the precise regulation of physiological activities in the nucleus and cannot be inferred from the base sequence alone.
Conceptual diagram of the 3D chromosome map, looking like crispy noodles | Springer Nature
To further explain the changes of DNA in space and time, the 4DN project was born. Launched in 2014, it has produced more than 800 papers to date, accumulating a large amount of data, methods, and tools. Now, these achievements are converging into a crucial public resource: a more comprehensive 4D map of human chromosomes.
The DNA schematic we see in junior high school textbooks, which looks like a twisted ladder, is effectively two-dimensional: it records only the types and order of the four bases (A, T, C, G) on the chromosome.
Common DNA schematic diagram | healthline
In human cells, the genome is not laid out like a simple string of beads. It folds, stacks, and packs further, forming three-dimensional chromatin and chromosome structures tangled like a ball of wool.
Layered chromatin. The real situation is much more complex than what is shown in textbooks | Wikimedia Commons
Among these three-dimensional structures, researchers pay particular attention to one key unit: chromatin loops. Chromatin loops are closely tied to genome function. Understanding them means understanding long-distance regulatory mechanisms in the genome, linking three-dimensional structure to real physiological functions in cells and bringing the two-dimensional genome to life.
Gene expression is not a single-point switch. Whether a gene is activated, when, and at what intensity is often determined by several distal regulatory elements working together. Just as starting a project in a workplace usually requires signatures from several managers, a gene needs to collect the "signatures" of its regulatory elements.
The target gene and its regulatory elements may be hundreds of thousands or even millions of bases apart in the two-dimensional gene sequence. For them to work together, DNA must fold, bringing originally distant segments close enough in space and thus forming chromatin loops.
To identify all chromatin loops comprehensively and accurately, researchers captured loops in the nucleus with multiple techniques, identified and integrated them from multiple perspectives, and drew an accurate three-dimensional map of chromosomes.
Cohesin makes chromosomes form loops | Wikimedia Commons
So far, we have only upgraded the genome from two dimensions to three.
However, this three-dimensional structure is not fixed. As the cell cycle progresses, differentiation occurs, or external stimuli arrive, chromosomes change their folding patterns and contact modes. Scientists want to capture how this structure changes over time or with cell state, and that is the fourth dimension of the 4D map: time.
With the map in hand, we can find information that cannot be seen from the DNA sequence alone: for example, which genes are folded close to each other in three-dimensional space, which regions of the nucleus they tend to occupy, and how the interactions between these genes are rearranged when the cell state changes, such as during different stages of DNA replication.
How is the 4D map constructed?
Researchers selected two representative cell types that mark the starting and ending points of cell development in the human body: undifferentiated human embryonic stem cells (H1-hESC) and immortalized fibroblasts (HFFc6) that have reached the end of development.
The technology for measuring chromosome contacts was invented by a group of researchers in the project more than 20 years ago. Its original purpose was to measure the contact frequency between a small number of DNA sequences in yeast. For use in the Chromosome 4D Map Project, researchers optimized the method extensively so it could produce larger-scale, higher-resolution data.
Researchers captured chromatin loops in four steps.
Step 1: Fixation
Researchers first treated cells with chemical reagents, similar to the preservation step when processing fresh animal specimens, to fix the spatial state of chromatin at that time and prevent it from dispersing during subsequent operations.
Step 2: Fragmentation
Then, they used "scissors" to cut chromosomes into many small fragments. Generally, the finer the cutting, the more accurate the subsequent positioning.
Step 3: In-situ ligation
Once the DNA was cut, ligase was added to join the fragments in situ. Fragments that were closer to each other in the nucleus were more likely to be joined together.
Step 4: Sequencing and reading
Finally, the ligated products were sequenced. Researchers looked for a key piece of evidence: the first half of a read came from position A in the genome, while the second half came from a distant position B, indicating that A and B were very close to each other in the nucleus. Aggregating a large amount of such evidence reveals the chromatin loops.
The core process of chromatin loop sequencing: fixation, fragmentation, ligation, and sequencing. This figure shows the Hi-C method, but the core process is the same regardless of the method | Paper
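To make the "aggregating evidence" idea in Step 4 concrete, here is a minimal sketch in Python. It is not the project's actual pipeline: the read pairs, the `BIN_SIZE` resolution, and the `count_contacts` helper are all hypothetical, chosen only to show how many "A joined to B" reads can be tallied into contact counts between genome regions.

```python
from collections import Counter

# Hypothetical toy data: each ligated read pair is reduced to two genomic
# coordinates (in base pairs) on the same chromosome. Real data would come
# from aligning the two halves of each sequenced read to the genome.
read_pairs = [
    (120_000, 910_000),
    (125_000, 905_000),
    (130_000, 140_000),
    (128_000, 915_000),
    (500_000, 520_000),
]

BIN_SIZE = 100_000  # assumed resolution; finer cutting allows smaller bins


def count_contacts(pairs, bin_size):
    """Count how often each pair of genomic bins was ligated together."""
    counts = Counter()
    for pos_a, pos_b in pairs:
        # Sort the two bins so that (A, B) and (B, A) are counted together.
        bin_a, bin_b = sorted((pos_a // bin_size, pos_b // bin_size))
        counts[(bin_a, bin_b)] += 1
    return counts


contacts = count_contacts(read_pairs, BIN_SIZE)
for (bin_a, bin_b), n in sorted(contacts.items()):
    print(f"bins {bin_a} and {bin_b}: {n} ligation events")
```

In this toy example, the regions around 100 kb and 900 kb are joined three times even though they sit far apart in the sequence, which is the kind of signal that hints at a loop.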
There is more than one sequencing method for the 3D genome, and each is good at capturing different types of loops. In the paper, researchers used 7 sequencing methods to capture chromatin loops as comprehensively as possible.
After obtaining the raw data, researchers need to clean and map it, much like washing vegetables in a kitchen. They filtered out duplicate and low-quality fragments and mapped the remaining fragments back to the genome.
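The cleaning step can be pictured with a small Python sketch. Everything here is an assumption made for illustration: the `Pair` record, the `MIN_MAPQ` threshold, and the `clean_pairs` helper are hypothetical and do not reproduce the tools the paper used. The sketch simply drops read pairs whose alignment quality is low and keeps one copy of each exact duplicate.

```python
from typing import NamedTuple


class Pair(NamedTuple):
    """One read pair already mapped back to the genome (hypothetical record)."""
    chrom_a: str
    pos_a: int
    chrom_b: str
    pos_b: int
    mapq: int  # lower of the two alignment quality scores


MIN_MAPQ = 30  # assumed quality threshold, for illustration only


def clean_pairs(pairs, min_mapq=MIN_MAPQ):
    """Drop low-quality pairs and keep only the first copy of each duplicate."""
    seen = set()
    kept = []
    for pair in pairs:
        if pair.mapq < min_mapq:
            continue  # unreliable alignment
        key = (pair.chrom_a, pair.pos_a, pair.chrom_b, pair.pos_b)
        if key in seen:
            continue  # identical coordinates: likely an amplification duplicate
        seen.add(key)
        kept.append(pair)
    return kept


raw = [
    Pair("chr1", 120_000, "chr1", 910_000, 60),
    Pair("chr1", 120_000, "chr1", 910_000, 60),  # exact duplicate
    Pair("chr1", 300_000, "chr1", 350_000, 5),   # low mapping quality
    Pair("chr2", 700_000, "chr2", 990_000, 42),
]
cleaned = clean_pairs(raw)
print(f"kept {len(cleaned)} of {len(raw)} pairs")
```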
Next comes the most critical step: finding truly reliable signals in a huge amount of pairing information. Researchers keep only contact positions that are statistically significant and stable across replicates. Only those that meet both conditions are recognized as reliable chromatin loops.
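The two conditions, significance and stability across replicates, can be sketched as a simple filter. This is only an illustration under stated assumptions: the loop labels, p-values, `ALPHA` cutoff, and `reproducible_loops` helper are hypothetical, and real analyses rely on dedicated loop-calling software with corrections for multiple testing.

```python
ALPHA = 0.01  # assumed significance cutoff, for illustration only

# Hypothetical candidate loops: (anchor A, anchor B) -> p-value per replicate.
replicate_1 = {
    ("chr1:1.2Mb", "chr1:1.9Mb"): 0.0004,
    ("chr1:3.0Mb", "chr1:3.4Mb"): 0.0030,
    ("chr2:5.1Mb", "chr2:6.0Mb"): 0.2000,
}
replicate_2 = {
    ("chr1:1.2Mb", "chr1:1.9Mb"): 0.0010,
    ("chr1:3.0Mb", "chr1:3.4Mb"): 0.0900,
    ("chr2:5.1Mb", "chr2:6.0Mb"): 0.3000,
}


def reproducible_loops(replicates, alpha=ALPHA):
    """Keep loops that are present and significant in every replicate."""
    shared = set.intersection(*(set(rep) for rep in replicates))
    return sorted(
        loop for loop in shared
        if all(rep[loop] < alpha for rep in replicates)
    )


reliable = reproducible_loops([replicate_1, replicate_2])
print(reliable)
```

Here the second candidate passes in one replicate but not the other, so it is discarded; only the first candidate survives both checks.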
After these rounds of meticulous screening, the research team built an astonishingly large loop catalog: approximately 140,000 chromatin loops were identified and cataloged in each of the two cell types.
With such large and accurate data, researchers can not only depict more completely the chromosomal environment in which a gene sits but also infer which distal regulatory elements or other genes it may interact with, and further locate and understand key genetic processes within the three-dimensional structure of chromatin.
The 4DN model can directly help us correlate cell structure with corresponding functions | Paper
Next, it's time to build the fourth dimension.
The "time" here is not like taking a continuous video of a cell from birth to death as we intuitively think. In this project, 4DN mainly introduces time into the map in two ways.
First, DNA replication in the cell cycle serves as a natural time axis, since replication itself follows a clear order. Researchers matched three-dimensional structural features against replication timing data one by one, observing how different folding states relate to the replication process and to gene activation. This helps place the static three-dimensional structure in the context of the advancing cell cycle.
Second, single-cell differences are treated as part of the dynamics. By comparing the differences between cell types with the range of fluctuation within one cell type, researchers can determine which structural features are relatively stable and always retained, much like housekeeping genes, and which are more flexible and change with the cell's functional state or cell cycle stage.
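The first approach, relating structure to replication timing, can be illustrated with a simple correlation. The sketch below is not the paper's analysis: the per-region `structure_score` and `replication_timing` values are made-up numbers, and Pearson correlation is just one assumed way of asking whether the two measurements move together along the genome.

```python
import math

# Hypothetical values for five consecutive genome regions.
# structure_score: an assumed 3D-structure feature for each region.
# replication_timing: assumed timing signal (higher = replicates earlier).
structure_score = [0.8, 0.5, 0.1, -0.3, -0.7]
replication_timing = [0.9, 0.6, 0.2, -0.2, -0.8]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


r = pearson(structure_score, replication_timing)
print(f"correlation between structure and replication timing: {r:.2f}")
```

A value close to 1 in such a comparison would suggest that regions sharing a structural feature also tend to replicate at similar times; the toy numbers here were chosen to show that case.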
Differences between cells: The POU3F1 gene is silenced (red) in embryonic stem cells and activated (gray) in fibroblasts | Paper
Now that we have the 4DN map, what's next?
Abnormal chromatin structure is closely related to congenital birth defects and cancer development. Understanding the dynamic 4D organization of DNA will help researchers figure out which genes are turned on or off due to changes in nuclear structure and how abnormal nuclear organization disrupts normal development and cell function, thereby causing human diseases.
For example, near the EPHA4 gene, there are specific structures to limit the scope of action of enhancers and precisely regulate gene expression. Once certain structural variations or mutations disrupt this mechanism, the enhancer that originally only drives EPHA4 may act on neighboring developmental genes by mistake, leading to varying degrees of limb developmental deformities, such as polydactyly or syndactyly.
Deformities caused by abnormal EPHA4 structure | Literature
The goal of the 4DN project is therefore not only to build a map but also to enable the biomedical research community to use it to identify new targets for treating human diseases caused by abnormal nuclear organization. In the future, many diseases with currently unknown causes may be reinterpreted within the 4DN framework, revealing the real pathogenic links at the structural level and providing clues for more accurate diagnosis and treatment.
In the process of mapping the 4DN, many very practical "by-products" have also emerged. To integrate a large amount of data from different teams