Why has the "Y database", built from male blood samples and known as a powerful tool for solving crimes, sparked controversy?
In September 2025, a notice issued by the Public Security Bureau of Xilinhot City, Inner Mongolia, trended on social media: starting September 5, blood samples would be collected from male residents in the jurisdiction in order to build a Y database (full name: "Y database pedigree craftsman system"). According to the notice, collection follows the principle of voluntariness, and residents are encouraged to cooperate.
As soon as the news came out, it drew wide attention and discussion online. In fact, Xilinhot is not the first city to build a Y database. Over the past two decades, many counties and cities in Henan, Fujian, Yunnan and other places have collected blood samples for this purpose, and the work may be extended to more places in the future.
Xilinhot collects male blood samples for Y database construction | Weibo @China Newsweek
In news reports, the Y database, this "powerful tool for solving crimes", is routinely linked to the solving of cold cases and major homicides. It can produce results when conventional means yield nothing: even if the suspect's own information is not in the database, the police can still use it to locate the target.
So what exactly is the Y database? What secrets are hidden in a drop of blood? And where does the controversy lie?
The Y chromosome is passed down from father to son almost unchanged
Of the 23 pairs of human chromosomes, one pair determines sex and is called the "sex chromosomes". An individual with XX sex chromosomes is female; an individual with XY is male.
Half of a child's chromosomes come from the father and the other half from the mother, but parents do not pass on their chromosomes intact. The genetic information on most chromosomes is first shuffled and rearranged, a process called "genetic recombination"; sometimes genetic information is also "miswritten", "omitted" or "overwritten", resulting in "gene mutation".
Therefore, each new life is unique. Its genes are highly similar to those of its parents, but not exactly the same. This is the source of human diversity.
However, the Y chromosome is special. It is passed only from father to son, and about 95% of its content does not undergo recombination as other chromosomes do. Apart from occasional mutations, it is handed down from father to son to grandson, generation after generation, almost unchanged.
As a result, male members of the same paternal family often carry almost identical Y chromosome information. This stable paternal inheritance led researchers to see that paternal kinship could be confirmed by comparing Y chromosome information.
Male members of the same paternal family often have almost the same Y chromosome information | Tuchong Creative
However, human genetic information is vast. If a person's entire genome is compared to a book, it would run to as many as 3 billion words, and about 99% of the content is almost exactly the same from person to person. The Y chromosome alone is a huge work of more than 58 million words. Comparing it word by word from beginning to end would be time-consuming and laborious.
So why not change the approach: instead of comparing the entire Y chromosome, focus on certain special fragments. On the male Y chromosome, some fragments are long chains made of the same short sequence repeated many times and joined end to end, such as "...TGGA TGGA TGGA...".
Such fragments are called Y chromosome short tandem repeats, abbreviated as Y-STR.
Research has found that the number of repeats of the short sequence in each Y-STR varies to some extent between individuals. When the repeat counts of several Y-STRs are compared together, the differences become more pronounced. Such a combination of repeat counts is called a "Y-STR haplotype".
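To make the idea concrete, here is a minimal Python sketch of how repeat counts at several loci combine into a haplotype. The locus names, motifs and sequences are invented for illustration, and the regex-based counting is a simplification: it is not how laboratory kits actually read Y-STRs.

```python
import re

def longest_repeat_run(sequence: str, motif: str) -> int:
    """Return the largest number of back-to-back copies of `motif` in `sequence`."""
    runs = re.findall(f"(?:{re.escape(motif)})+", sequence)
    return max((len(run) // len(motif) for run in runs), default=0)

# Hypothetical loci: every name, motif and sequence here is made up.
sample_reads = {
    "locus_A": ("TGGA", "CCATTGGATGGATGGATGGACTT"),
    "locus_B": ("GATA", "AAGATAGATAGATATTC"),
    "locus_C": ("TCTA", "GGTCTATCTATCTATCTATCTAGG"),
}

def haplotype(reads: dict) -> dict:
    """Combine the repeat count at each locus into one haplotype."""
    return {
        locus: longest_repeat_run(sequence, motif)
        for locus, (motif, sequence) in reads.items()
    }

print(haplotype(sample_reads))
```

Each locus on its own distinguishes only a few groups of men, but the combination of counts across loci separates far more paternal lines.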
Because of the way the Y chromosome is inherited, the Y-STR haplotype is generally passed down intact from generation to generation. In other words, the Y-STR haplotype is almost the same among male members of one paternal family, but differs clearly between families. Forensic scientists can therefore judge whether a sample belongs to a given paternal family by comparing Y-STR haplotypes. This is Y-STR testing.
Forensic scholars can judge whether a sample belongs to a certain paternal family by comparing the Y-STR haplotype | dnacenter.in
After German scholars first reported Y-STRs in 1992, the markers were quickly adopted in forensic science. Researchers have since identified thousands of Y-STR loci.
Commonly used commercial kits today usually contain 17 to 21 standard Y-STR markers, enough to distinguish different paternal families. To go further and tell close relatives from distant ones, another kind of marker is needed: rapidly mutating Y-STRs, which mutate more often than standard Y-STRs. Close relatives still differ little at these markers, but as generations pass and mutations accumulate, the differences between distant relatives become more and more obvious.
Therefore, once two samples are confirmed to belong to the same paternal family, forensic examiners can estimate how closely they are related from the differences in their rapidly mutating Y-STRs. Some high-precision kits also include a small number of rapidly mutating Y-STR markers to improve discrimination within a family.
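The accumulation of mutations can be sketched with a simple probability model. The mutation rates and panel sizes below are illustrative assumptions, not figures from any real kit, and the model assumes every locus mutates independently at the same rate and ignores mutations that happen to cancel each other out.

```python
def prob_any_difference(mu: float, n_loci: int, meioses: int) -> float:
    """Chance that at least one of `n_loci` differs between two men separated by
    `meioses` father-to-son transmissions, with mutation rate `mu` per locus
    per transmission."""
    return 1.0 - (1.0 - mu) ** (n_loci * meioses)

# Illustrative assumptions only.
STANDARD_PANEL = {"mu": 0.002, "n_loci": 17}
RAPID_PANEL = {"mu": 0.01, "n_loci": 13}

RELATIONSHIPS = [
    ("father and son", 1),
    ("brothers", 2),
    ("paternal first cousins", 4),
    ("10 transmissions apart", 10),
]

for label, meioses in RELATIONSHIPS:
    standard = prob_any_difference(meioses=meioses, **STANDARD_PANEL)
    rapid = prob_any_difference(meioses=meioses, **RAPID_PANEL)
    print(f"{label:>24}: standard {standard:.1%}, rapidly mutating {rapid:.1%}")
```

Under these assumptions, the rapidly mutating panel is more likely to show a difference at every degree of kinship, and the chance grows with the number of transmissions separating the two men, which is what lets it hint at how distant the relationship is.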
Narrowing the search: the Y database shows its prowess
In criminal cases tried in China, more than 90% of offenders are male. Because Y-STRs exist only in males, testing for them can confirm whether a sample contains male DNA and extract the male's genetic information from a mixed sample.
This matters especially in sexual assault cases, where a vaginal swab often contains mostly female epithelial cells and secretions and relatively little male DNA. Conventional testing is easily swamped by the female DNA, whereas Y-STR testing can still pick out the male profile.
Besides confirming paternal kinship, Y-STR testing can also be used to infer a man's ethnic group and geographical origin. This is because men from different regions or ethnic groups usually show characteristic distributions of Y-STR haplotypes, which reflect the history of human migration and divergence.
These advantages allow Y-STR testing to break a deadlock: when traditional investigative means yield no useful information, it can point to the suspect's ethnic origin and kinship.
In 1999, 16-year-old Dutch girl Marianne Vaatstra was sexually assaulted and brutally killed on her way home.
The police extracted DNA from semen traces at the scene and ran a comparison, but found nothing in the database, which had only just been established and held the information of a few hundred offenders nationwide. After arresting 12 suspects in succession and collecting DNA from more than 160 men for comparison, the police still could not find a sample that matched the crime scene. The investigation went nowhere. Angry local residents even pointed the finger at Middle Eastern refugees resettled in the area, and the town and even the whole country were plunged into riots and conflicts.
In 1999, 16-year-old Dutch girl Marianne Vaatstra was killed | telegraaf.nl
The case then stalled for a long time. It was not until more than 10 years later, with the application of Y-STR testing, that the investigation finally reached a turning point. Using this technology, the police quickly established that the murderer was a local resident of the area around the crime scene and ruled out a Middle Eastern refugee as the perpetrator, narrowing the scope of the investigation.
The police then invited men living within a 5-kilometer radius of the crime scene to take part. Using the samples that most of them provided voluntarily, investigators quickly identified the suspect's family through Y-STR testing and finally arrested the suspect many years after the crime.
Y-STR testing has also made major contributions in China. In the nationwide "Campaign to Solve Long-standing Homicide Cases" in 2020, more than 5,000 cases were solved, nearly half of which had gone unsolved for more than 20 years. Solving these cases often relied on Y-STR testing, as in the "Serial Murder Case in Baiyin, Gansu" and the "Case of the Murdered Female Student at Nanjing Medical University", both of which shocked the whole country.
Beyond criminal investigation, the strengths of Y-STR technology in tracing paternal lineages and population evolution and migration have led to its wide use in searching for missing persons, combating child trafficking, identifying disaster victims, resolving paternity disputes, and archaeological and anthropological research.
Not omnipotent: a successful match may be a coincidence
However, Y-STR testing is not omnipotent. It has its own limitations.
Naturally, it is useless for female samples. In addition, because only a limited number of loci are tested, men from different families may happen to match at every locus tested, producing a "false positive". Conversely, some highly mutable loci may differ noticeably even between close relatives, producing a "false exclusion".
Therefore, the selection of Y-STR loci needs to consider the following aspects.
First, a locus should have a suitable mutation rate. If it never mutates, or mutates too slowly, different families may keep the same characteristics and discriminatory power is lost; if it mutates too readily, there will be obvious differences even between father and son, and it cannot be used to determine kinship either.
Second, it should vary sufficiently between individuals. That is, the number of repeats of the short sequence should take many different values, and each value should be relatively evenly distributed across populations.
Third, reliability and stability are crucial. The locus should not be prone to producing false signals during testing, and it should be accurately identifiable in complex samples or degraded DNA.
Moreover, the discriminatory power of Y-STR testing is limited when a sample mixes DNA from several men. Test results must therefore be interpreted carefully and analyzed together with other evidence. Otherwise, they may mislead the investigation and even lead to wrongful convictions.
For example, in the "Chen Longqi Case" in Taiwan, China in 2009, two women were sexually assaulted by several men while unconscious after drinking. The police analyzed the semen stains with a kit containing 17 Y-STR markers, and the results showed that the sample matched three men, including Chen Longqi. Although Chen Longqi insisted that he had left early that night and was never involved, the DNA evidence was deemed sufficient and he was sentenced to 4 years in prison.
It was not until the retrial in 2013 that forensic experts re-tested the sample with an updated kit containing 23 Y-STR markers and excluded Chen Longqi, clearing his name.
It took 4 years for Chen Longqi to clear his name | jrf.org.tw
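The general effect of panel size can be sketched in a few lines of Python. The profiles below are entirely invented and do not represent any real case or kit; they only show how two men who match on a smaller set of markers can be told apart once more markers are added. Real laboratories also apply mutation and mixture rules that this sketch leaves out.

```python
def mismatched_loci(evidence: dict, reference: dict, panel: list) -> list:
    """List the loci in `panel` where the two profiles have different repeat counts."""
    return [locus for locus in panel if evidence[locus] != reference[locus]]

# Hypothetical panels: the 23-marker panel extends the 17-marker one.
PANEL_23 = [f"locus_{i:02d}" for i in range(1, 24)]
PANEL_17 = PANEL_23[:17]

# Invented profiles: identical on the first 17 loci, different at two of the extra six.
evidence = {locus: 10 + (i % 5) for i, locus in enumerate(PANEL_23)}
reference = dict(evidence)
reference["locus_19"] += 1
reference["locus_22"] -= 2

for name, panel in [("17-marker panel", PANEL_17), ("23-marker panel", PANEL_23)]:
    diffs = mismatched_loci(evidence, reference, panel)
    verdict = "cannot be excluded" if not diffs else f"differs at {diffs}"
    print(f"{name}: {verdict}")
```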
In fact, how effective Y-STR testing can be depends on whether a sufficiently large and diverse database exists.
This is because, in an identification, forensic examiners cannot conclude that two samples belong to the same paternal family simply because a sample has the same haplotype as one in the database. They must also answer a key question: if another man were picked at random from the whole population, what is the probability that he would have the same Y-STR haplotype? In other words, could the successful match be just a coincidence?
In Y-STR testing, the answer to this question is expressed by gene diversity. The larger the gene diversity, the less likely it is that two unrelated men share a haplotype, and the more likely it is that two matching samples really do belong to the same paternal family. To calculate gene diversity accurately, the true frequency of each Y-STR haplotype in the population must be known as closely as possible.
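The article gives no formula, so the sketch below assumes one common definition: the chance that two randomly drawn men share a haplotype is the sum of squared haplotype frequencies, and gene diversity is one minus that sum, scaled by n/(n-1) to correct for sample size. The toy database is invented.

```python
from collections import Counter

def match_probability(haplotypes: list) -> float:
    """Sum of squared haplotype frequencies in the database."""
    n = len(haplotypes)
    return sum((count / n) ** 2 for count in Counter(haplotypes).values())

def gene_diversity(haplotypes: list) -> float:
    """Sample-size-corrected diversity: n/(n-1) * (1 - sum of squared frequencies)."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two haplotypes")
    return n / (n - 1) * (1.0 - match_probability(haplotypes))

# Invented database: each haplotype is a tuple of repeat counts at three loci.
database = (
    [(14, 12, 29)] * 3
    + [(15, 13, 30)] * 2
    + [(13, 12, 28), (14, 14, 31), (16, 11, 29), (15, 12, 27), (14, 13, 30)]
)

print(f"match probability: {match_probability(database):.3f}")
print(f"gene diversity:    {gene_diversity(database):.3f}")
```

Both numbers are only as good as the frequencies behind them, which is why the size and coverage of the database matter so much.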
Since the Y chromosome hardly undergoes recombination, different paternal families have their own unique Y-STR haplotypes, making each haplotype relatively rare. In order to accurately estimate the distribution frequency of each haplotype in the population, a large and diverse database is needed to ensure the reliability of the statistical conclusion.
If the database is too small, problems arise. Some haplotypes may not be included at all, so their probability of occurrence cannot be estimated. A haplotype that is very common in the north may appear only once in the database of a place in the south and be mistaken for a rare one; conversely, a haplotype that is extremely rare nationwide may be mistaken for a common one if it happens to be concentrated in one place.
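The regional effect can be illustrated with a small Python sketch. The databases are invented, and the (x + 1) / (n + 1) estimator is used here as an assumed convention: it counts the evidence haplotype as one extra observation so that a haplotype never seen in the database still gets a non-zero estimate.

```python
from collections import Counter

def estimated_frequency(haplotype: str, database: list) -> float:
    """Counting estimate with a +1 correction: (x + 1) / (n + 1)."""
    x = Counter(database)[haplotype]
    return (x + 1) / (len(database) + 1)

# Invented databases; strings stand in for full haplotypes.
north = ["H_common"] * 50 + [f"N{i}" for i in range(950)]  # 1000 men
south = ["H_common"] + [f"S{i}" for i in range(199)]       # 200 men
pooled = north + south

for name, db in [("north", north), ("south", south), ("pooled", pooled)]:
    print(f"{name:>6}: H_common estimated at {estimated_frequency('H_common', db):.4f}")

# A haplotype absent from the southern database is bounded only by its size.
print(f"unseen haplotype in south: {estimated_frequency('H_unseen', south):.4f}")
```

In this toy data the same haplotype looks about five times rarer when judged against the small southern database than against the northern one, which mirrors the distortion described above.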
The controversy over the database: how can it be used safely?
As DNA analysis technologies, including Y-STR testing, are applied around the world, and especially as DNA collection is carried out, the potential ethical controversies and privacy risks have drawn increasing attention.
The core question is this: when genetic data is used to investigate crimes and maintain public safety, will it violate the rights of innocent citizens, and to what extent?
Current research indicates that Y-STRs usually do not affect gene expression and do not directly determine an individual's morphological, physiological or behavioral characteristics, so the personal privacy information they contain is relatively limited. A person's eye color, height, blood type or health status cannot be inferred from his Y-STRs. However, when testing is extended from Y-STRs to the entire Y chromosome, or even the whole genome, the results may reveal certain traits, making it possible to infer some characteristics and identity information of the person sampled.
Therefore, in the face of potential risks, strict access rights and protection systems need to be established for the database, and legislation should be used to prevent information comparison beyond the purpose of criminal investigation, illegal access and malicious tampering.
In the face of potential risks, strict