AI does in half a day what takes a doctoral student six months. Altman is thrilled, and the biochemistry community is shaken.

新智元 2025-11-22 16:00
On AI scientists, Altman really wasn't bragging!

[Introduction] OpenAI's Altman excitedly retweeted it! This could be one of AI's most profound impacts, signaling that science is about to move from an era of scarcity to one of abundance.

The AI achievement that excited Altman from OpenAI!

The "AI scientist" Kosmos has brought 7 discoveries:

It independently replicated 3 major discoveries in fields such as neuroscience, materials science, and biology.

It also made 4 original discoveries in genetic epidemiology, multi-omics integration analysis, Alzheimer's disease, and transcriptomics.

OpenAI's Altman wrote excitedly, "I expect people to see more and more similar things, and this will become one of the most important impacts of AI. Congratulations to the Future House team!"

It turns out that OpenAI stands behind this breakthrough:

Sam Rodriques, the director and CEO of FutureHouse, commented under the tweet:

All of this is possible largely thanks to the excellent work done by the OpenAI employees.

With perseverance, the next few years are bound to be exciting.

He also promoted the platform where users can try Kosmos.

What is the story behind the "Future House" that Altman congratulated, and what exactly did it do to excite him?

To be clear, Altman was not "stunned and slumped" this time. But without a doubt, this let him truly see the dawn of ASI, where "AI is accelerating science"!

The world's first AI scientist team

Born to accelerate science with AI

Cutting-edge science is shifting from "scarcity" to "abundance". Human knowledge is growing exponentially, but the capacity of the human brain has not grown with it.

As a result, new discoveries are missed, and potential connections go unnoticed.

To drive scientific progress, humanity urgently needs an intelligent agent that can keep pace with the amount of data and reason across the entire human knowledge record.

In 2023, the non-profit organization FutureHouse was founded with the goal of creating an AI scientist to accelerate innovation.

The mission of FutureHouse is simple: to equip every researcher with an AI scientist to accelerate cross-disciplinary discoveries.

FutureHouse can be regarded as the world's first AI scientist team. Its platform can search for information tirelessly, like a 007 agent, and check whether doctoral-level ideas in fields such as biology, chemistry, environmental science, and materials science are feasible. In 2.5 months, this platform found a new drug for treating blindness, shocking the medical community.

Earlier this month, Edison, the commercial branch of FutureHouse, began taking this technology global.

FutureHouse continues to focus on basic biological research and science education, while Edison brings the AI scientist technology to researchers worldwide and to a range of industries.

Edison was built jointly by scientists and engineers from top institutions in fields such as physics, biology, chemistry, and artificial intelligence.

Edison will continue to uphold FutureHouse's philosophy, offering generous free services to the research community along with paid options for heavy users who need higher request rates or additional features.

Structured world model

Read 1500 papers at once

Kosmos is a significant upgrade over Robin, FutureHouse's previous-generation AI scientist.

To begin with, Kosmos differs from many AI tools. It is not a chatbot but more like a "deep scientific research tool": it takes some time to learn and tune, especially when it comes to designing prompts.

The Edison team emphasized that Kosmos is not a "recreational" chat tool but a research tool more like a "kit", suited to genuinely high-value research tasks.

Therefore, the pricing of Kosmos is quite high, but academic users can enjoy a free quota.

As the next-generation AI scientist, Kosmos owes its core breakthrough to the introduction of a structured world model.

This model lets it efficiently integrate information extracted from hundreds of agent trajectories and keep its research goals consistent and coherent across tens of millions of tokens.

Earlier AI scientists such as Robin had difficulty processing and integrating information at scale. Limited by the context length of large language models, they could not "go far" along a reasoning path and struggled to complete complex discoveries.

In a single complete run, Kosmos can read 1500 papers and execute 42,000 lines of analysis code, far exceeding the capabilities of any known intelligent agent.

That is why Kosmos has stronger analysis capabilities than the previous-generation Robin.

According to feedback from beta-test users, Kosmos can complete in one day research work that originally took six months, with 79.4% of its conclusions judged accurate.

This "six months" equivalence was astonishing at first!

Although Kosmos can usually produce research results equivalent to months of human labor, it sometimes goes astray, for example by pursuing directions that are statistically significant but of little scientific significance. It is therefore worth running Kosmos several times on the same research goal to explore the different paths it might take.

An AI research intern ahead of OpenAI?

The most surprising finding during the development of Kosmos was that a single complete run is equivalent to about six months of work by a doctoral or postdoctoral researcher.

More interestingly, the development team found that this "human-equivalent time" increases linearly with the depth of the run.

This is also the first inference-time scaling law tied to the "complexity of scientific research tasks".

Initially, the development team was also skeptical of this result, so they ran a dedicated verification:

They invited beta-test users to provide research goals and ran Kosmos on their behalf. They then sent the results back to the test users and asked them to estimate how long it would have taken them to make the same discovery without Kosmos.

According to feedback from 7 scientists, a Kosmos run 20 steps deep is on average equivalent to 6.14 months of research work.

They also conducted the same evaluation for shallow runs and used blind-testing methods as a control, finally obtaining the scaling law curve shown in the technical report.

Although the estimate of "human time saved" is subjective, the development team still believes that the work package completed by Kosmos is indeed equivalent to several months of a scientist's research time, mainly for two reasons:

The first is an objective cross-check through "independent replication".

In the technical report, they showed that three discoveries made by Kosmos had in fact already been made independently by human scientists. But at the time Kosmos was run:

Two of them were still unpublished.

The other had been published, but only after the cut-off date of Kosmos' training data.

They also ensured that Kosmos couldn't access these papers or any research citing them.

Even so, Kosmos still replicated these core discoveries in a single run. According to the records of the original authors of these studies, such discoveries took the human researchers roughly several months.

Of course, these time figures also carry uncertainty (for example, whether the researchers were 100% dedicated to the project). But compared with the "user feedback method" based on subjective questionnaires, this "comparison with existing results" is clearly more objective, further supporting the claim that the time value of Kosmos' output reaches the "several-months level".

The second is an independent estimation model based on "computational man-hours".

They also built a more quantitative evaluation model. It assumes that a scientist takes an average of 15 minutes to read a paper and about 2 hours to carry out a complete data analysis path (an assumption consistent with METR's time estimates for current AI agents on software engineering tasks).

By this measure, the total number of papers read and analysis paths executed by Kosmos in an average run is equivalent to about 4.1 months of human research time (calculated on the basis of a 40-hour workweek).
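The arithmetic behind this kind of estimate can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the team's actual model: the per-paper time, the per-analysis time, and the 40-hour workweek come from the article, while the function name, the 52/12 weeks-per-month conversion, and the example count of analysis paths are hypothetical placeholders (the article does not say how many analysis paths an average run executes).

```python
# Figures taken from the article.
HOURS_PER_PAPER = 0.25     # 15 minutes to read one paper
HOURS_PER_ANALYSIS = 2.0   # about 2 hours per complete data analysis path
HOURS_PER_WEEK = 40.0      # 40-hour workweek

# Assumption (not from the article): an average month has 52/12 weeks.
WEEKS_PER_MONTH = 52.0 / 12.0


def human_equivalent_months(papers_read: int, analysis_paths: int) -> float:
    """Convert a run's workload into months of full-time human research."""
    total_hours = (papers_read * HOURS_PER_PAPER
                   + analysis_paths * HOURS_PER_ANALYSIS)
    return total_hours / (HOURS_PER_WEEK * WEEKS_PER_MONTH)


# 1500 papers is the article's figure for a single run; the number of
# analysis paths below is a hypothetical placeholder, not a reported value.
example_months = human_equivalent_months(papers_read=1500, analysis_paths=150)
print(f"Estimated human-equivalent time: {example_months:.2f} months")
```

Under these assumptions, reading 1500 papers alone comes to 375 hours, a little over two months of full-time work. The rest of the estimate depends on how many analysis paths a run actually executes.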

During OpenAI's livestream announcing its "hundred-billion-dollar restructuring", Altman directly stated OpenAI's "scientist vision":

By September 2026, create a research assistant AI at the intern level.

By 2028, achieve a fully automated "real AI scientist".

If Kosmos' automated research has now reached the "several-months level", is it OpenAI's "intern-level research assistant AI"?

If Kosmos has already achieved this, is there any difficulty in OpenAI's goal set for 2026?

No wonder Altman tweeted his excitement.

Moreover, in the technical report, Kosmos has already discovered new results in disciplines such as biology, chemistry, and materials science.

All conclusions in the Kosmos report have clear sources, either citing the original literature or pointing to the location of the code that generated the conclusion, so that the entire reasoning chain is fully traceable.

Verified by independent scientists, 79.4% of the statements in the Kosmos report are accurate.

7 new discoveries

Save doctoral students in biology, chemistry, environment, and materials science!

The technical report details seven scientific research discoveries made by Kosmos.

Three of them are independent replications of previous results by human scientists.

The first discovery: using metabolomics data, Kosmos replicated the core conclusion of an unpublished manuscript, namely that under low-temperature conditions, nucleotide metabolism is the most significantly altered pathway in the mouse brain.

The key point is that the preprint of this study was posted on bioRxiv only after Kosmos completed its run. In other words, the AI and the human researchers reached the same result independently and almost simultaneously.

The second discovery: Kosmos replicated the key finding of a preprint that was published after the cut-off date of the large language model (LLM) it used, and Kosmos did not access this preprint during the run.

This discovery comes from materials science, indicating that Kosmos can work across disciplines.

Specifically, Kosmos reproduced the following conclusion: during thermal annealing, absolute humidity is the dominant factor determining the efficiency of perovskite solar cells, with a critical threshold of about 60 g/m³. Once the humidity exceeds this value, the device fails completely.