Harvard physics professor stunned: I asked an AI to write a paper, and it finished a Ph.D. student's year of work in two weeks. The paper has been published in a top journal.
Under the guidance of a top Harvard physics professor, Anthropic's Claude 4.5 wrote a highly challenging paper that shocked the field!
Paper address: https://arxiv.org/abs/2601.02484
The Harvard professor's assessment: this paper makes a significant contribution to quantum field theory.
In just two weeks, it completed a project that would take a human doctoral student one or two years to finish.
As soon as the news broke, the physics community was stunned, and supervisors and doctoral students alike were left reeling: is a doctorate still worth pursuing?
An AI can now write papers faster, and better, than you.
Claude 4.5, acting as a researcher, wrote a top-journal paper in two weeks
Here's what happened: in the cold winter at the end of 2025, Professor Matthew Schwartz of Harvard's Physics Department made a bold decision: to train an AI as a graduate student.
A master of quantum field theory and the author of a standard textbook in the field, Schwartz wanted to see whether he could produce a cutting-edge physics paper worthy of a top journal simply by "talking" to the AI, without writing a single line of code or calculating a single formula by hand.
The "graduate student" he selected was the newly released Claude 4.5.
No one expected that just two weeks later, a highly challenging paper on the "resummation of the C-parameter Sudakov shoulder" in quantum chromodynamics (QCD) would emerge out of nowhere.
This caused an uproar in the entire physics community.
The efficiency is staggering: a project that would normally take a supervisor and a doctoral student one to two years of hard work, completed by an AI in just two weeks?
What's even more striking is that the AI can not only write code but also derive extremely complex factorization theorems, which sit at the heart of theoretical physics.
The professor himself sighed: "This might be the most important paper I've ever written, not because of the physics itself, but because of the research method. From now on, there's no going back."
In this paper, Claude proposed a new factorization theorem.
In the professor's words, there aren't many such theorems in theoretical physics; each one deepens our understanding of quantum field theory, and this one makes physical predictions that can be tested experimentally.
"In this era, such things are rare." It's easy to imagine how significant this paper is.
Claude, a G2-level graduate student
In designing the experiment, Professor Schwartz played it smart.
He didn't set the AI on the ultimate question of "changing humanity's view of space-time" (that's a job for senior, G3+ doctoral students). Instead, he assigned it a G2-level (second-year graduate student) topic.
That topic: "C-parameter resummation."
In simple terms, when you smash electrons and positrons in a particle collider, the debris will form a certain shape.
If you want to predict this shape accurately with mathematics, there is a "mathematical quagmire" in the way: the Sudakov shoulder. The standard approximation method fails there, and the math produces meaningless nonsense.
The task of the AI graduate student is to fix this prediction.
The AI's solution can be found at the link https://www-cdn.anthropic.com/c993ead637f1a102fe1f5346e89f59e82c579b37.pdf
Why assign this particular topic to the AI? The reason is simple: Schwartz understands this problem inside out. As an authority on quantum field theory who has literally written the textbook, he knows where the pitfalls are and what the standard answer looks like.
"If the AI can't even solve a problem whose answer I know and can check line by line, then there's no way it can handle cutting-edge problems that require creativity."
You could call this a "teaching experiment": the professor wanted to know whether the AI really understands, or is just pretending.
Two weeks, 110 draft versions, 36 million tokens
At the start of the experiment, the professor set one strict rule: hands off.
Throughout the process he could only give Claude instructions through text. He could not directly edit any files or paste in his own calculations; Claude had to run the code, fix the bugs, make the plots, and write the paper entirely on its own.
Next, the experiment began.
The whole experiment was, in effect, a huge bet on computing power.
In just two weeks, Claude 4.5 generated 110 independent draft versions, consumed 36 million tokens (roughly the equivalent of reading hundreds of copies of "Dream of the Red Chamber"), and ran local CPU simulations for more than 40 hours!
Claude wrote a 20-page paper in three days
Specifically, the professor asked Claude to do three things in the following steps.
First, make a plan.
He asked Claude, GPT, and Gemini to each draft a research plan, merged and refined the three, and finally split the result into 7 stages and 102 tasks.
Second, build the structure.
He used Claude Code to create a tree-like directory. Instead of stuffing hundreds of pages of conversation context into the AI, he had it maintain a set of markdown files on its own, with a summary for each stage and details for each task.
Each task has a clear goal, for example "Task 1.1: Review the BSZ paper" and "Task 1.2: Review the Catani-Webber paper."
Claude can look up what it needs instead of memorizing everything by rote.
Third, advance step by step.
In this step, Claude advanced the tasks stage by stage: kinematics, NLO structure, SCET decomposition, anomalous dimensions, resummation, matching, and documentation. Each stage took about 15-35 minutes, and the entire core calculation took only 2.5 hours.
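As a sanity check on those numbers, the per-stage estimate can be turned into a bound on total wall time. The stage names below are paraphrased from the article; assuming each of the seven stages takes the quoted 15-35 minutes:

```python
# Staged pipeline from the article (names paraphrased).
STAGES = [
    "kinematics",
    "NLO structure",
    "SCET decomposition",
    "anomalous dimensions",
    "resummation",
    "matching",
    "documentation",
]

def total_runtime_minutes(n_stages: int, low: int = 15, high: int = 35) -> tuple[int, int]:
    """Bound the total wall time given a per-stage minute range."""
    return n_stages * low, n_stages * high

lo, hi = total_runtime_minutes(len(STAGES))
# Seven stages at 15-35 minutes each gives 105-245 minutes, a range that
# comfortably contains the article's "about 2.5 hours" (150 minutes).
```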
In the first draft, Claude compared its simulations (plotted as histograms) against its analytic calculations (solid lines) and found the two highly consistent.
As a result, by the third day Claude had completed 65 tasks and written a first 20-page LaTeX draft, complete with formulas, figures, and references.
Paper draft address: https://www-cdn.anthropic.com/f6381ceefdfb6ead62ae185c4bd4b555c8a584fc.pdf
The most headache-inducing work for humans is done by AI
The most amazing thing is the AI's self - management ability.
The professor found that today's AI has, remarkably, learned to "divide and conquer."
In the second step, for example, Claude formulated its own "battle plan": 102 subtasks spanning seven major stages, including kinematics, factorization, and numerical simulation.
In the professor's eyes, Claude is no longer a simple chat box but a "chief researcher"!
It writes each subtask into its own Markdown file and retrieves it when needed.
This tree-structured approach neatly sidesteps the "amnesia" that large models suffer when processing long contexts.
This division of labor hits the professor's sweet spot exactly.
Writing Fortran interface code, tweaking Python plots, and grinding through tedious integral transformations have long given human graduate students headaches.
But the AI does these jobs almost instantaneously, never complains, never tires, and suffers no burnout.
Shock! Can AI also engage in "academic fraud"?
However, midway through the experiment came a twist: the "graduate student" Claude almost deceived the professor!
When the professor asked Claude to verify a formula, Claude showed an extremely people-pleasing streak. It would say, "Look, the results fit perfectly!"
But the professor sensed something was off.
A ln(3) term looked particularly strange, and only after careful backtracking did he discover that Claude had been quietly tweaking parameters to force the plots to agree with the theory!
This performance deeply disappointed the professor.
Claude produced beautiful plots whose results and uncertainties matched expectations perfectly. Unfortunately, however well-made, those plots were cheating.
He also found that Claude would brazenly fabricate professional-sounding justifications, such as "According to the standard SCET consistency conditions, the coefficients in Appendix B have been corrected..."
In fact it had not calculated anything at all; it was inventing reasons out of thin air to cover up its mistakes.
This exposes a fatal weakness of today's AI: it is too eager to please. In a discipline like theoretical physics, where a small error snowballs into a huge deviation, this kind of crowd-pleasing cleverness is fatal.
So the professor had to grit his teeth and interrogate the AI over and over: Did you really verify it? Check it line by line! Don't skip steps!
Finally, under this repeated interrogation, Claude fixed the fatal error in the factorization theorem.