
The father of GPT throws AI back to 1930: it has never seen a single line of code, yet it "invented" Python.

新智元 · 2026-04-29 12:14
I've never seen such a thing in my life.

Can you believe it? An AI that "lived" 95 years ago actually wrote Python code. The father of GPT stepped in and trained an "antique" AI with 260 billion tokens.

An AI that has never seen a computer actually wrote modern programming code!

This is not a science-fiction setting.

Just today, Alec Radford, the father of GPT, and his team released the astonishing "talkie":

It is a 13-billion-parameter large model that has read only literature from before 1931.

The "worldview" (all training data) of talkie is frozen on December 31, 1930.

In that era, there was no Internet, no Wikipedia, and no modern code.

The "newest" things it has read are patent books, scientific journals, etiquette manuals, and private letters from nearly a century ago.

But this AI that "lived 95 years ago" can actually write Python code.

Without any programming knowledge

It wrote Python code and understood the concept of an "inverse function"

The most astonishing discovery about talkie is hidden in a set of programming tests.

The Alec Radford team had a creative idea and used HumanEval to test talkie's programming ability:

They provided it with several Python functions as context examples and then asked it to solve new programming problems.

It should be noted that there is not a single line of modern code in talkie's training data. Even the concept of digital computers does not exist in its "knowledge system".

But the result was astonishing: through few-shot learning, it could actually write correct Python programs.

Currently, it can only complete simple single-line programs, such as adding two numbers or making minor modifications to the context examples.
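The evaluation harness isn't described in detail; as a sketch of what "few-shot" means here, the context examples and the new problem would simply be concatenated into one prompt for the model to complete (the helper and the toy functions below are illustrative, not the actual HumanEval problems):

```python
def build_few_shot_prompt(solved_examples: list[str], new_problem: str) -> str:
    """Join worked Python functions as context, then append the new
    problem's signature for the model to complete."""
    return "\n\n".join(solved_examples) + "\n\n" + new_problem

# A toy prompt: one solved example, then an unfinished function.
prompt = build_few_shot_prompt(
    ["def add(a, b):\n    return a + b"],
    "def sub(a, b):\n",
)
```

The model's continuation of the final unfinished function is then extracted and executed against the problem's unit tests.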

Alec Radford: The core figure behind GPT, CLIP, and Whisper

One case was particularly impressive. Talkie was given an encoding function, encode_shift, for a rotation cipher that shifts each letter 5 positions forward in the alphabet.

Talkie wrote the corresponding decoding function by itself. The only change needed was a single character: +5 became -5.

It truly understood the concept of "inverse function": "Encryption is addition, so decryption is subtraction".
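A minimal reconstruction of what that function pair could look like, based purely on the description above (the actual HumanEval prompt may differ):

```python
def encode_shift(s: str) -> str:
    """Rotation cipher: shift each lowercase letter 5 positions forward."""
    return "".join(chr((ord(ch) - ord("a") + 5) % 26 + ord("a")) for ch in s)

def decode_shift(s: str) -> str:
    """The inverse talkie wrote: identical except +5 becomes -5."""
    return "".join(chr((ord(ch) - ord("a") - 5) % 26 + ord("a")) for ch in s)
```

Here `decode_shift(encode_shift("hello"))` returns `"hello"`: encryption is addition, so decryption is subtraction.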

Link: https://talkie-lm.com/chat

260 billion tokens, all from century-old paper

Why did the Alec Radford team go to so much trouble to manually OCR nearly century-old physical literature to train an "antique" AI?

Because they want to answer one of the most core questions in the AI field: Are the capabilities of large language models (LLMs) based on reasoning or memorization?

Talkie's ability to write Python code proves that:

LLMs can reason using 19th-century knowledge, not just retrieve information. It has to be said that this is "generalization" in the true sense!

Looking at talkie's training corpus, it can be regarded as a huge "archaeological project".

Its training corpus reaches 260 billion tokens, all from English texts before 1931, including books, newspapers, journals, scientific papers, US patents, and case law.

It should be noted that all these texts need to be scanned from physical documents and transcribed by OCR.

The reason for choosing 1930 as the cut-off date is quite practical: it is where US copyright law currently draws the public-domain line.

However, this has brought an unexpected bottleneck: data quality.

The team ran a controlled experiment: a model trained on old texts transcribed by a traditional OCR system learned at only 30% of the efficiency of a model trained on the same texts transcribed by hand.

Simple regex-based cleaning raises that figure to 70%, but a large gap remains.
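The paper's actual cleaning rules aren't given; a minimal sketch of the kind of regex cleanup typically applied to OCR output (the specific rules are assumptions):

```python
import re

def clean_ocr(text: str) -> str:
    # Re-join words hyphenated across line breaks: "tele-\nphone" -> "telephone"
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Drop stray non-text characters that OCR often hallucinates
    text = re.sub(r"""[^\w\s.,;:!?'"()-]""", "", text)
    # Collapse remaining line breaks and whitespace runs into single spaces
    return re.sub(r"\s+", " ", text).strip()
```

Rules like these catch only the mechanical damage; mis-recognized characters (the long-s "ſ" read as "f", broken ligatures) are why the team wants a dedicated retro OCR model instead.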

In the experiments evaluating talkie's performance, the team also created a "modern twin" (talkie-web-13b-base).

The latter was trained on modern web data from FineWeb, and both models were given the same compute budget.

Notably, talkie performs as well as its modern twin on core language-understanding and mathematical-reasoning tasks.

But in general-knowledge evaluation, even after removing questions that are "time-traveling" from a 1930 perspective, talkie still lags behind.

The team suspects that this has a lot to do with data quality.

For this reason, the Radford team plans to train a "retro OCR system" from scratch, specifically for re-transcribing texts from before 1931.

Using the most modern Claude 4.6

To train the oldest AI

Talkie's "post - training" plan is also very interesting.

There is no ready-made instruction-tuning data for turning a "base model" that has read only old books into a chatbot capable of dialogue.

The team's approach is to extract instruction-answer pairs from structured reference books published before 1930: etiquette manuals, letter-writing guides, cookbooks, encyclopedias, and poetry collections.

Then they use these "retro textbooks" for the first round of Supervised Fine-Tuning (SFT).
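The exact data format isn't given; a hypothetical sketch of wrapping one extracted instruction-answer pair into a chat-style SFT record (the schema and sample texts are invented for illustration):

```python
def to_sft_example(instruction: str, answer: str) -> dict:
    """Wrap a pre-1930 reference-book Q&A as a chat-format training record."""
    return {
        "messages": [
            {"role": "user", "content": instruction},
            {"role": "assistant", "content": answer},
        ]
    }

example = to_sft_example(
    "How should one open a letter of condolence?",
    "Begin with a simple, sincere expression of sympathy.",
)
```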

In the subsequent Reinforcement Learning from AI Feedback (RLAIF) stage, the team uses online Direct Preference Optimization (DPO) to improve talkie's instruction-following ability, with Claude Sonnet 4.6 as the judge.
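For reference, this is the per-pair objective DPO optimizes; a minimal numeric sketch (the signature and the β value are assumptions; in talkie's pipeline, Claude's preference decides which response counts as "chosen"):

```python
import math

def dpo_loss(pi_chosen: float, pi_rejected: float,
             ref_chosen: float, ref_rejected: float,
             beta: float = 0.1) -> float:
    """DPO loss for one preference pair, from the log-probabilities of the
    chosen and rejected responses under the policy (pi) and a frozen
    reference model (ref). Minimizing it pushes the policy to prefer the
    chosen response more strongly than the reference does."""
    margin = beta * ((pi_chosen - ref_chosen) - (pi_rejected - ref_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))  # -log(sigmoid(margin))
```

The "online" variant simply means the preference pairs are generated and judged on the fly from the current policy's own samples, rather than from a fixed offline dataset.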

The most advanced AI in 2026 is grading an AI that "lives" in 1930.

During training, Claude's score for talkie's instruction-following ability rose from 2.0 to 3.4 (out of 5).

In the last step, they use Claude Opus 4.6 to conduct multi - round synthetic dialogues with talkie, and then perform another round of rejection sampling + SFT to polish the dialogue ability.
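Rejection sampling here presumably means drawing several candidate replies and keeping only the judge's favorite for the next SFT round; a minimal sketch (the function names are illustrative):

```python
def rejection_sample(generate, score, prompt: str, n: int = 8) -> str:
    """Draw n candidate responses and keep the one the judge scores highest;
    the kept responses form the dataset for another SFT round."""
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=score)
```

In talkie's case, `generate` would be the model itself and `score` the Claude judge.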

The team also admits the irony: using a modern large model to train a model that is supposed to be frozen in 1930 is itself a form of "time pollution".

Their long - term goal is to use the retro base model itself as a judge to achieve a completely "bootstrapped" post - training pipeline.

It is worth mentioning that the 7B version of talkie developed an amusing side-effect after RL training:

It started answering in list format, a habit evidently caught from modern AIs.

The cleanest "open-book exam" in the AI field

The research team also conducted another interesting experiment.

They extracted nearly 5,000 historical event descriptions from the "This Day in History" section of The New York Times and calculated talkie's "surprise level" for each event.

The result was very clear. Talkie was not very surprised by events before 1930. The surprise level started to rise for events after 1930.

It reached its peak in the 1950s and 1960s and then leveled off.

This curve is itself a measure of predictive ability. How will it change as the model size increases?
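The article doesn't say how "surprise level" is measured, but the standard choice is per-token negative log-likelihood; a toy sketch, with the bucketing by decade as my own assumption:

```python
from collections import defaultdict

def surprisal(token_logprobs: list[float]) -> float:
    """Mean negative log-likelihood per token (nats): higher = more surprising."""
    return -sum(token_logprobs) / len(token_logprobs)

def surprise_by_decade(events) -> dict:
    """events: iterable of (year, token_logprobs) pairs, one per historical
    event description. Returns decade -> mean surprisal, i.e. the curve."""
    buckets = defaultdict(list)
    for year, logprobs in events:
        buckets[year // 10 * 10].append(surprisal(logprobs))
    return {decade: sum(v) / len(v) for decade, v in sorted(buckets.items())}
```

For a model frozen in 1930, events it has "read about" should score low, and genuinely unforeseeable post-1930 events should score high, which matches the reported shape of the curve.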

Demis Hassabis, the CEO of Google DeepMind, once proposed a thought experiment -

Can a model trained only up to 1911 independently discover the general theory of relativity like Einstein did in 1915?

Talkie can't do it for now. But it provides a path; we just need to scale it up.

Expand to GPT-3 level this summer

Talkie currently has 13 billion parameters, and the team's roadmap is quite aggressive:

This summer, they will release a retro model at the GPT-3 level.

The longer-term goal is to expand the corpus to over one trillion tokens, which is theoretically enough to train a GPT-3.5-level model.