Is GPT imitating humans? A Nature Communications study suggests the brain was the first Transformer.
[Introduction] We used to think that language was built from grammar, rules, and structure. A new study in Nature Communications challenges that picture: the hierarchical structure of GPT maps, layer by layer, onto the “temporal imprints” of the human brain. As the shallow, middle, and deep layers light up in sequence in the brain, we see for the first time that understanding language may be less about parsing than about prediction.
We have long been convinced that the human brain understands language through a rigorous apparatus of rules, grammar, and structural analysis, complex and uniquely human.
This has been the “consensus” for decades.
However, a recent study published in Nature Communications upends this age-old belief.
Paper link: https://www.nature.com/articles/s41467-025-65518-0
The researchers had participants listen to a 30-minute story while intracranial electrodes captured, with millisecond precision, the brain's response to each word.
They then fed the same story text into large language models such as GPT-2 and Llama-2 and extracted the models' internal representations of the text from every layer.
An astonishing experimental result emerged:
The seemingly cold hierarchical structure of GPT turns out to have a striking temporal counterpart in the human brain.
We have always assumed that GPT was imitating humans. This experiment hints at something more startling:
Perhaps our brains are naturally structured like “GPT”.
The structure of GPT can find a correspondence in the brain
To understand why this research is groundbreaking, we have to see its most crucial and sophisticated move: aligning the 48-layer structure of GPT, layer by layer, with the temporal sequence of activity in the human brain.
The research team recruited 9 epilepsy patients undergoing pre-surgical monitoring, who already had high-density electrocorticography (ECoG) electrodes implanted over their cortex.
Schematic diagram of ECoG implantation and positioning
These electrodes record the brain's real electrical activity with millisecond precision.
The subjects listened to a 30-minute podcast while the researchers simultaneously collected high-gamma activity around the onset of each word.
These signals cover key regions along the language pathway: from the middle and anterior superior temporal gyri (mSTG, aSTG), responsible for auditory perception, to the inferior frontal gyrus (IFG), responsible for language integration, and on to the temporal pole (TP), a high-level semantic area.
Meanwhile, the researchers fed the same text into GPT-2 XL and Llama-2.
Each time the model processed a word, they “paused” it and extracted that word's internal representation from every layer, from the first to the last.
They then reduced the dimensionality of each layer's representations via canonical analysis (CA) and used a linear model to try to predict the brain's electrical activity at each millisecond.
Schematic of the research method: each layer of GPT-2 produces a semantic representation of the word (left). These representations are fed into a linear model to predict the brain's electrical activity as the word is heard (right). If a layer's representation can predict the signal at a specific time point, that layer corresponds to the brain's processing stage at that moment.
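To make the pipeline concrete, here is a minimal sketch, not the authors' released code, of how per-word, per-layer representations can be extracted from GPT-2 XL with the Hugging Face transformers library. The story text, the word-alignment scheme, and all variable names are illustrative assumptions.

```python
import torch
from transformers import AutoModel, AutoTokenizer

# "gpt2-xl" is the 48-layer model discussed here; substitute "gpt2" for a quick local test.
tokenizer = AutoTokenizer.from_pretrained("gpt2-xl")   # fast tokenizer by default
model = AutoModel.from_pretrained("gpt2-xl")
model.eval()

# Placeholder transcript; in the study this would be the 30-minute podcast text.
story_words = ["so", "one", "day", "she", "decided", "to", "leave"]
enc = tokenizer(" ".join(story_words), return_tensors="pt")

with torch.no_grad():
    out = model(**enc, output_hidden_states=True)

# out.hidden_states is a tuple of (n_layers + 1) tensors of shape (1, n_tokens, hidden_dim);
# index 0 is the token-embedding layer, indices 1..48 are the transformer blocks of GPT-2 XL.
hidden = torch.stack(out.hidden_states[1:])            # (48, 1, n_tokens, 1600)

# Collapse sub-word tokens to words: keep each word's last sub-token, per layer.
word_ids = enc.word_ids(0)                             # token index -> word index
last_token_of_word = {w: i for i, w in enumerate(word_ids) if w is not None}
word_vectors = hidden[:, 0, list(last_token_of_word.values()), :]   # (48, n_words, 1600)
```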
If GPT's internal hierarchy had nothing to do with the human brain, then aligning the model's layers with the brain's time axis should produce nothing but a disordered mess.
If, however, there really is some structural correspondence between the two, order should emerge along the neural time axis.
And that's exactly what happened.
Arranging the model's layers into a “temporal ladder”
At the beginning of the experiment, the researchers only wanted to verify a simple hypothesis:
If the hierarchy of the large language model corresponds to successive processing stages in the brain, then this correspondence should appear sequentially along the time axis, like runners in a relay race.
The researchers fed the semantic representation from each GPT layer into a linear model, asking at which millisecond after each word was heard that layer best predicted the brain's high-gamma activity.
Their assumption was that if the shallow, middle, and deep layers of the model carry different linguistic functions, they should also be staggered in time, appearing one after another along the time axis of brain activity.
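In code, that hypothesis translates into a simple procedure: for each layer, find the moment at which its representation best predicts the neural signal, then check whether those moments climb with depth. Below is a hedged sketch under simplifying assumptions: random placeholder arrays stand in for the real embeddings and ECoG recordings, and PCA plus cross-validated ridge regression stands in for the paper's exact encoding pipeline.

```python
import numpy as np
from scipy.stats import pearsonr
from sklearn.decomposition import PCA
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(0)
n_words, n_layers, hidden_dim = 1000, 48, 1600
lags_ms = np.arange(-300, 900, 25)            # time lags relative to word onset

# layer_embeddings[l]: (n_words, hidden_dim) word vectors from layer l (see earlier sketch).
# high_gamma: (n_words, n_lags) high-gamma amplitude of one electrode around each word.
layer_embeddings = rng.standard_normal((n_layers, n_words, hidden_dim))
high_gamma = rng.standard_normal((n_words, lags_ms.size))

peak_lag = np.zeros(n_layers)
for layer in range(n_layers):
    X = PCA(n_components=50).fit_transform(layer_embeddings[layer])
    r_per_lag = np.zeros(lags_ms.size)
    for i in range(lags_ms.size):
        pred = cross_val_predict(RidgeCV(alphas=np.logspace(-2, 4, 7)),
                                 X, high_gamma[:, i], cv=5)
        r_per_lag[i] = pearsonr(pred, high_gamma[:, i])[0]
    # The lag at which this layer explains the electrode best is its "peak time".
    peak_lag[layer] = lags_ms[np.argmax(r_per_lag)]

# The "temporal ladder": a positive correlation between layer depth and peak lag
# means deeper layers align with later moments of processing after word onset.
r, p = pearsonr(np.arange(n_layers), peak_lag)
print(f"layer depth vs. peak time: r = {r:.2f}, p = {p:.3g}")
```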
And indeed, the resulting “temporal ladder” diagram reveals the brain's secret clearly: regions closer to high-level semantics look more like GPT's deeper layers.
The 48 layers of GPT form a clear “time-depth” correspondence along the brain's language pathway. Shallow layers (warm colors) peak at earlier time points; deep layers (cool colors) peak later. Higher-order regions such as TP, aSTG, and IFG show a strong linear relationship (r = .93 / .92 / .85), while mSTG, near the auditory cortex, shows almost no hierarchical structure (r ≈ 0).
That is because mSTG processes only the sound itself; the semantics and structure of the language have not yet unfolded there.
Once the signal reaches aSTG, IFG, and TP, however, the curves spread out into a neat shallow-to-deep distribution.
Inside the key language area IFG, GPT's hierarchy likewise shows a strong temporal correspondence. Left: the correlation distribution from shallow to deep layers (warm colors → cool colors). Right: shallow layers peak earlier and deep layers later, forming a regular temporal progression. The overall goodness of fit in IFG reaches r = .85 (p < .001).
A groundbreaking understanding gradually emerges:
It turns out that the brain doesn't understand language by first parsing grammar step by step and then disassembling words one by one.
What it really does, like GPT, is carry out layer-by-layer semantic inference and probabilistic prediction.
And the rhythm of this highly complex inference coincides almost perfectly with the depth-wise path inside the large language model.
The clearer the pattern, the more embarrassing traditional linguistics becomes
If the hierarchical structure of GPT can really find a correspondence in the brain, then a more pointed question arises:
Do the traditional models we consider the “most accurate” descriptions of language (phonemes, morphemes, syntax, and semantics) show the same temporal structure when their frameworks are tested against the brain?
Traditional linguistic grammar tree
The research team included all four types of symbolic linguistic models in the test.
Their construction logic is textbook material; it has been the basic framework of linguistics and psycholinguistics for decades.
If human language really depends on these rules, then they should be able to predict the brain's response more accurately than GPT.
The answer came quickly: the traditional symbolic models do predict part of the neural activity, but they are nowhere near as “brain-like” as GPT.
On the same millisecond time axis, the symbolic models' prediction curves show no clear shallow-to-deep, early-to-late ordering.
They have no hierarchy and no temporal progression, as if lacking the continuous, dynamic drive that runs through language.
In contrast, GPT's embedded representations trace a flowing processing trajectory: meaning is continuously updated, compressed, and integrated over time, and each layer has its place, like precisely meshed gears embedded in the context.
The symbolic models, by comparison, are a static, discrete stack of labels, unable to provide a sufficiently fine-grained, dynamic mapping at the millisecond timescale.
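The comparison itself comes down to scoring two kinds of design matrices with the same encoding model: a discrete, one-hot matrix of symbolic labels versus a dense, contextual embedding. Here is a minimal illustration with placeholder data, where a toy part-of-speech feature stands in for the full set of symbolic predictors.

```python
import numpy as np
from scipy.stats import pearsonr
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(0)
n_words = 1000
y = rng.standard_normal(n_words)             # high-gamma at one electrode and lag (placeholder)

# Symbolic predictors: discrete labels per word (here a toy part-of-speech tag),
# expanded into a one-hot design matrix: a static, categorical description.
pos_tags = rng.integers(0, 4, size=n_words)  # 4 hypothetical categories
X_symbolic = np.eye(4)[pos_tags]

# Contextual predictors: a dense embedding from one GPT layer (PCA-reduced, as in
# the earlier sketch); random numbers stand in for it here.
X_gpt = rng.standard_normal((n_words, 50))

def encoding_score(X, y):
    """Cross-validated correlation between predicted and actual neural activity."""
    pred = cross_val_predict(RidgeCV(alphas=np.logspace(-2, 4, 7)), X, y, cv=5)
    return pearsonr(pred, y)[0]

print("symbolic features:", encoding_score(X_symbolic, y))
print("GPT embedding:    ", encoding_score(X_gpt, y))
```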
This points to a groundbreaking conclusion: the brain's language mechanism is not a simple stack of symbolic rules but a continuous, multi-layered process of predictive processing.
By the time we try to explain language with a grammar tree, the brain has already completed dozens of layers of non-linear transformations, and those transformations are exactly what the Transformer model does best.
Visual comparison: Neural network and the human brain
In other words, the symbolic models can tell us “what language is”, while GPT comes closer to showing “how the brain processes language”.
This is a real watershed: for the first time, the explanatory framework of linguistics and the empirical results of neuroscience diverge this visibly.
And the one speaking for the brain turns out to be GPT, the model we once thought was merely “imitating humans”.
Language is not about rules, but a predictive ability
When GPT's layers find a clear temporal correspondence in the human brain, while symbolic linguistic models built over decades of painstaking effort look slow and disordered against millisecond-level neural recordings, the significance of this research goes beyond a simple model comparison.
It actually points to a more fundamental and ancient question: What exactly is language?
In the past few decades, we have used grammar rules to explain sentences, semantic networks to explain concepts, and tree - like structures to describe the logical relationships of language.
These frameworks emphasize structure, category, and hierarchy, but rarely address how language is generated in the brain in real time: how it changes continuously over milliseconds, and how it integrates past and future in an instant.
And the results of this study present a completely different picture:
The brain's processing of language does not look like executing rules at all; it looks like advancing along a track of continuous compression, prediction, and updating.
The shallow layers quickly extract cues; the middle layers begin to integrate context; the deep layers build longer chains of meaning.
The whole process is not a static “grammar tree” but a forward-flowing computation.
This is exactly the kind of “flowing structure” the Transformer model was designed to capture, with its multi-layer, non-linear, context-dependent, continuously updating computation.
Ironically, we always assumed this was an engineers' invention. Now it looks more like a computational path the brain itself settled on, over the long course of evolution, for efficient information processing.
This quietly changes the definition of language: language is no longer a rule system but a dynamic predictive mechanism.
When we understand a sentence, we don't first work out its grammar and then match it to meaning; instead, millisecond by millisecond, we compute “what is likely to come next”.
That is exactly how GPT is trained.
Perhaps this is why, as we rely more and more on large language models, we always feel that they seem to understand us.
It's not that they have learned human rules; it's that they happen to match the rhythm of the human brain.
When GPT's internal layers find a clear temporal correspondence in the brain, what we are seeing is no longer the “victory” of a single AI model, but a structural convergence: two systems arriving at the same underlying computational rules.
The essence of language may never be static grammar rules, but continuous and dynamic prediction.
The brain uses this mechanism to understand the world and integrate information; the model uses this mechanism to generate language and simulate intelligence.
Ultimately, these two paths meet at the same efficient computational rule.
The familiar frameworks of linguistics and cognitive science may need a comprehensive update.
Understanding the internal structure of GPT may be, in the end, a way of re-understanding ourselves.