Father of GPT: Trained only on data from the last century, can it really write Python?
Unbelievable!
An AI frozen before 1931, with not a single computer anywhere in its training data -
Actually wrote Python code, across a gap of nearly a century??!!
Folks, this isn't a science-fiction novel...
The model is called talkie-1930-13b.
The people behind it: AI researcher Nick Levine, Associate Professor David Duvenaud of the University of Toronto, and one very familiar name, Alec Radford, the real father of the GPT series.
The model's training data follows one iron-clad rule: not a single word written after January 1, 1931 is allowed in!
It has no idea what a TV or the Internet is. Its world is forever frozen at midnight on December 31, 1930.
However, the most mind - boggling thing happened. The team members found that:
This AI, which shouldn't know anything about Roosevelt's New Deal, can discuss New Deal legislation in detail and even gets the years right??
Even more incredibly, when the team gave it a Python programming problem, this spirit from nearly a century ago actually wrote its first line of Python??
Netizens couldn't sit still: an AI that has never even heard of a computer, writing code across a century?
Imaginations immediately ran wild. The user below even came up with a "time-travel question list" and is itching to try it out:
Am I awake? Can AI really transcend time and space??
An old-fashioned model that lived before 1931
A model that lived before 1931, yet knows everything from astronomy to geography and can even program, is definitely worth studying.
In fact, talkie is a 13-billion-parameter model, trained on 260 billion tokens of English text from before 1931 -
The training samples include, but are not limited to, books, newspapers, periodicals, scientific magazines, etc.
Everything from Dickens to Mark Twain, from physics papers of Einstein's era to century-old cookbooks and etiquette manuals, was fed into it!!!
There is a reason for choosing 1930 as the model's knowledge cut-off: under US copyright law, it is the boundary for works entering the public domain.
So, the question is, why did Alec Radford want to do such a project?
In fact, Radford and his team wanted to know -
If a model reads only the English texts written before 1931, how will it think, converse, and predict the future?
Guess what? The team actually found several "big revelations". (Goodness me.jpg)
The model, shocked numb by the march of the times
The first discovery is a curve showing how "shocked" the model was by the march of the times -
The team dug up nearly 5,000 historical events from the "On This Day" section of The New York Times, fed them all to talkie, and then watched the screen to see how "unexpected" each event was to it.
As a result, a rather dramatic curve emerged:
Before 1930: talkie read along smoothly, its surprise level rock-steady. (talkie: I know all of this already.)
Just past 1930: talkie's surprise level began to creep up. (talkie: Huh? How could that happen?)
1950s-1960s: as transistors and TV went mainstream, talkie's surprise soared. (talkie: Wait, humans went to space? And built a moving box that plays shows?)
After that: the curve flattens out completely. (talkie: I'm numb. Do whatever you want...)
This is like Grandma Liu visiting the Grand View Garden (a Dream of the Red Chamber allusion): first doubt, then comprehension, and finally acceptance.
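The "surprise level" here is essentially the model's per-token loss: the lower the probability it assigns to the next tokens of an event description, the more surprising that event. A minimal sketch of the metric, where the token probabilities are made-up illustrative numbers rather than talkie's actual outputs:

```python
import math

def mean_surprisal_bits(token_probs):
    """Average surprisal in bits: -log2 of each next-token probability."""
    return sum(-math.log2(p) for p in token_probs) / len(token_probs)

# Hypothetical probabilities a pre-1931 model might assign to each token
# of two one-line event descriptions (illustrative values only).
probs_1920s_event = [0.40, 0.55, 0.30, 0.60]   # familiar vocabulary
probs_1960s_event = [0.05, 0.02, 0.08, 0.04]   # transistors, spaceflight...

print(mean_surprisal_bits(probs_1920s_event))  # low  -> "I know this"
print(mean_surprisal_bits(probs_1960s_event))  # high -> "Wait, what?"
```

Plotting this value for each event by year would trace out exactly the kind of curve the team describes.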
This model also learned Python
Of course, that shocked-numb curve is not the most explosive finding of this research. The team's second discovery is -
An AI that has never seen a computer actually learned to write Python???
In the study, the team gave talkie OpenAI's HumanEval programming test set.
They placed a few solved Python functions as examples in the prompt and then asked talkie to tackle new problems after reading them; that is, the model had to learn and apply on the spot from context.
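Concretely, that in-context setup just means stuffing a few solved examples into the prompt ahead of the new task. A rough sketch of how such a prompt might be assembled; the example problems below are stand-ins, not the actual HumanEval items:

```python
FEW_SHOT_EXAMPLES = [
    # (problem stub, reference solution) -- illustrative stand-ins
    ('def double(x):\n    """Return x * 2."""\n', "    return x * 2\n"),
    ('def shout(s):\n    """Return s upper-cased."""\n', "    return s.upper()\n"),
]

def build_prompt(examples, new_problem):
    """Concatenate solved examples, then the unsolved problem stub."""
    parts = [stub + solution for stub, solution in examples]
    parts.append(new_problem)  # the model must complete this last one
    return "\n".join(parts)

new_task = 'def is_even(n):\n    """Return True if n is even."""\n'
prompt = build_prompt(FEW_SHOT_EXAMPLES, new_task)
```

The model's continuation of the final stub is then executed against the task's unit tests to decide pass or fail.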
For this test, the team also brought out talkie-web, a model with the same architecture but trained on modern internet data, for comparison, and drew a comparison line graph -
(Black line: Vintage LM, Gray line: Modern LM)
The result was a huge surprise: talkie really solved a problem. In the cipher function, it simply changed +5 to -5 and submitted the answer.
Yes, it changed just one character, and the answer was completely correct...
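HumanEval does contain a Caesar-style cipher task of roughly this shape, where the decoder is just the encoder with the sign flipped. The code below is a simplified stand-in to show why a one-character edit suffices, not talkie's verbatim output:

```python
def encode_shift(s: str) -> str:
    """Shift each lowercase letter forward by 5 positions."""
    return "".join(chr(((ord(ch) + 5 - ord("a")) % 26) + ord("a")) for ch in s)

def decode_shift(s: str) -> str:
    """The one-character fix: +5 becomes -5."""
    return "".join(chr(((ord(ch) - 5 - ord("a")) % 26) + ord("a")) for ch in s)

print(decode_shift(encode_shift("talkie")))  # round-trips back to "talkie"
```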
Moreover, the team found a clear trend: the larger the model, the more programming problems it solves.
In other words, while still far behind modern models, the vintage model's ability to "learn code out of thin air" also climbs steadily with scale, just as the scaling laws predict.
The team also said they hope the vintage model can help the AI community answer a fundamental question: how far can large language models generalize beyond their training data?
The 1930 model vs. the 2026 model
As the saying goes, only through comparison do new discoveries emerge.
To measure how capable talkie is, the team also trained a twin, talkie-web-13b, using exactly the same architecture and compute but feeding it modern internet data.
They put the two models into various standard LLM evaluations for a showdown. The result was quite subtle:
As expected, talkie-1930 does lag behind its modern twin in overall performance.
However, when the researchers removed questions outside its knowledge scope (those about the internet, DNA, and so on), the gap between the two shrank by half.
Even more remarkably, on core language-understanding and arithmetic tasks, the old and new models perform almost identically.
This suggests, to some extent, that the abilities of "understanding language" and "doing arithmetic" do not depend on how much modern internet content a model has read.
The team attributes the remaining gap mainly to two causes. One is poor OCR quality: the 1930 newspapers were, after all, extracted from scans.
The other is the different topic distribution of the corpora; old newspapers carry less science and more cooking and etiquette.
Well... maybe the most valuable part of large-model intelligence has little to do with "whether you've read the modern internet"??
(talkie: If I were born in 2026, I could also memorize GitHub!)
Using a 1930 etiquette manual to train an AI into a chat assistant
As everyone knows, the standard way to turn a model like talkie into a conversational AI assistant is to fine-tune it on modern, ChatGPT-style instruction data.
The problem is that this would inject 21st-century conversational style, values, and other era markers right back into the 1930 model.
(talkie: I finally became a proper 1920s gentleman, and after your instruction tuning I'd start saying "bestie" right away...)
The team's solution can be described as a "stroke of genius" -
They dug a training set out of pre-1931 texts:
etiquette manuals that teach people how to respond appropriately, and letter-writing guides that teach people how to reply to correspondence. They then brought in Claude Sonnet 4.6 as a teacher for reinforcement-learning training, which produced the final training data.
Relying on these natural Q&A corpora from a century ago, the team managed to train talkie into a chat - capable AI assistant.
However, reality soon slapped them in the face -
The team found that an early 7B version of talkie had, after reinforcement learning, learned to speak in the modern internet's numbered-list style.
Mind you, no such hyper-modern list style existed anywhere in the 1930 corpus...
The culprit is Sonnet 4.6.
Since Claude, the teacher, is a modern AI that loves the list style, talkie learned to answer in lists to earn high scores...
(Really, it's all about pleasing the teacher...)
This also exposes a real problem in model training: AI-feedback training inevitably drags the student model toward the teacher's modern style.
To solve this big bug, the team's next goal is: one day, let talkie be its own teacher. (doge)
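The failure mode is easy to reproduce in miniature: if candidate answers are selected by a judge with a stylistic preference, that preference gets reinforced whether or not it ever appeared in the student's corpus. A toy illustration, where the judge is a deliberately crude stand-in for a modern AI grader, not Claude's actual scoring:

```python
def toy_list_loving_judge(reply: str) -> int:
    """Crude stand-in for a modern AI grader: rewards numbered lists."""
    return sum(reply.count(f"\n{i}.") for i in range(1, 4))

def pick_best(candidates, judge):
    """AI feedback in one line: reinforce whatever the judge scores highest."""
    return max(candidates, key=judge)

period_style = "My dear sir, one replies promptly and with courtesy."
modern_style = "Tips:\n1. Reply promptly.\n2. Be courteous.\n3. Sign off warmly."

best = pick_best([period_style, modern_style], toy_list_loving_judge)
# The 1930-flavored answer loses; the list-style answer gets reinforced.
```

Under such a judge, the period-accurate reply can never win, which is exactly why the numbered lists leaked into talkie.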
Who is Alec Radford
One of the team members behind talkie, Alec Radford, is also worth having a good chat about.
We can even say that most of the "infrastructure" in today's AI circle is related to him.
During his nearly ten-year tenure at OpenAI, he was a technical heavyweight on par with Ilya Sutskever and the founder of the first-generation GPT series -
He was first author of the GPT-1 and GPT-2 papers and a core contributor to GPT-3 and GPT-4. He also co-led the multimodal model CLIP and was deeply involved in projects like Whisper and DALL·E.
The Transformer-based generative pre-training method he first proposed in his groundbreaking 2018 paper laid the foundation for ChatGPT and every large model since.
At the end of 2024, Alec left his old employer OpenAI to conduct independent research. In March 2025, he joined the Thinking Machines Lab founded by former OpenAI CTO Mira Murati as a consultant.
Looking back at talkie itself, the whole thing is quite thought - provoking -
While the whole world is competing in AGI and reasoning models, the father of the GPT series himself went with his partners to create an AI that only exists in 1930.
According to the team's roadmap, a vintage model at the GPT-3 level will be released this summer. In the future, they also want to expand the corpus to one trillion tokens and extend it to non-English-speaking worlds.
We just don't know: when it wakes up again and sees robots running marathons, smartphones in every hand, and Agents everywhere -
will it be shocked numb all over again.jpg
(I've put the model usage entrance below. Friends who are interested can try chatting with the AI from a century ago.)