
Nature Gives a Thumbs Up: Latest Work from Harvard and MIT Ushers in the Era of AI Scientists

新智元 · 2025-10-21 10:16
ToolUniverse, a unified platform, enables AI to operate over 600 scientific tools using natural language, driving the automation of scientific research.

The era of AI scientists is approaching. ToolUniverse, recently launched by Harvard and MIT, enables AI to operate over 600 scientific tools using natural language through a unified platform, promoting a comprehensive upgrade of scientific research automation and embracing a new paradigm for scientific discovery.

Every leap in the history of science has been accompanied by an innovation in tools. With the recent rapid development of large models and agents, this path is now leading to a brand-new stage: the "AI scientist".

At the forefront of AI-empowered scientific research, we are witnessing an important milestone: the shift from proving whether an AI agent can solve specific scientific problems to asking how it can participate in the entire research process efficiently, reliably, and at scale.

A news analysis recently published in Nature reported ToolUniverse, the first large-scale open-source tool framework, jointly released by the teams of Marinka Zitnik and Shanghua Gao from Harvard University and MIT.

News link: https://www.nature.com/articles/d41586-025-03246-7

The online environment opened by ToolUniverse allows researchers to connect various large models and agents to commonly used tools in different scientific fields using natural language, laying the foundation for creating AI scientists.

Project homepage: https://aiscientist.tools

Paper details: https://arxiv.org/abs/2509.23426

Open-source code: https://github.com/mims-harvard/ToolUniverse

When AI generative models are no longer sufficient, why do we need AI scientists?

The core ability of a traditional LLM (large language model) is text generation, but scientific research demands far more than that:

It needs to decompose complex problems (such as "how to optimize cholesterol-lowering drugs"), plan experimental steps, call professional tools (such as molecular-simulation software), verify the plausibility of data, and even self-correct when results deviate from expectations. This closed loop of "reasoning + action" is the key to upgrading AI from a "model" to a "scientist".

The breakthrough of AI agents lies in the deep coupling of LLM with three major mechanisms:

  • Planning ability: decompose "discovering new drugs" into executable steps such as "target identification → compound screening → property optimization → patent verification";
  • Memory system: track intermediate results (such as "the permeability of a certain compound into liver tissue") to avoid repeated calculations or logical breaks;
  • Tool invocation: connect to external databases, simulators, and analysis software to make up for the shortcomings of LLMs in professional calculations (such as predicting molecular binding energy).
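The three mechanisms above can be sketched as a minimal "reasoning + action" loop. Everything here (the plan table, the toy tools, and their outputs) is illustrative, not part of ToolUniverse itself:

```python
# Hypothetical sketch of an agent loop: planning, memory, and tool
# invocation. Tool names and return values are invented for illustration.

def plan(goal):
    """Decompose a research goal into ordered, executable steps."""
    return {
        "discover new drugs": [
            "target identification",
            "compound screening",
            "property optimization",
            "patent verification",
        ],
    }[goal]

def run_agent(goal, tools):
    memory = {}                             # tracks intermediate results
    for step in plan(goal):
        if step in memory:                  # memory avoids repeated work
            continue
        memory[step] = tools[step](memory)  # each tool sees prior results
    return memory

# Toy tools standing in for real databases and simulators.
tools = {
    "target identification": lambda m: "HMG-CoA reductase",
    "compound screening":     lambda m: ["lovastatin"],
    "property optimization":  lambda m: {"lovastatin": {"binding": -8.2}},
    "patent verification":    lambda m: {"lovastatin": "patented"},
}

result = run_agent("discover new drugs", tools)
```

Because every step writes into `memory`, a later step can reference any earlier result without recomputing it, which is the "logical break" the memory system prevents.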

However, scientific research places higher demands on AI agents: tool formats differ across disciplines (biology, chemistry, physics), data must be reproducible, and experimental processes require rigorous verification.

General tool-invocation protocols alone (such as MCP, the Model Context Protocol) cannot solve domain-specific problems such as "how to make AI understand mass-spectrometry data formats" or "how to coordinate the outputs of molecular simulations and clinical databases". This is one of the core problems ToolUniverse solves.

ToolUniverse: the ecological cornerstone of scientific AI agents

ToolUniverse is not a single tool but a standardized ecosystem for "connecting LLM with scientific tools" (Figure 1).

Its core goal is to enable any LLM to call over 600 scientific tools through a unified interface to complete the entire research process from "proposing hypotheses" to "verifying conclusions".

Figure 1: ToolUniverse is an ecosystem for creating AI scientists. General large language models (LLM), inference models, and agents can connect to over 600 scientific tools provided by ToolUniverse to automate the scientific research workflow.

The "HTTP" for unifying scientific tools, solving three major pain points

Just as the HTTP protocol unified Internet communication, ToolUniverse defines a dedicated "scientific tool interaction standard" for AI scientists (Figure 2). It can seamlessly integrate locally deployed open-source tools and connect safely, in a standardized way, to powerful closed-source models and API services, addressing three major pain points of the MCP protocol in research settings:

Figure 2: ToolUniverse connects machine-learning models, agents, scientific software tools, databases, and APIs through a unified protocol. It introduces a standardized tool specification framework that enables language models to consistently discover, call, and parse various tools. Similar to how HTTP established standards in Internet communication, the ToolUniverse protocol defines how AI scientists request tools and receive results through two core operations: Find Tool and Call Tool.

  • Difficulty in tool discovery: through the "Tool Finder" component, AI can precisely match requirements against over 600 tools by combining keyword search, vector-embedding retrieval, and LLM reasoning (for example, when "needing to predict the liver toxicity of a compound", it automatically locates the ADMET-AI tool);
  • Non-standard invocation: the "Tool Caller" component first validates the input (such as whether a molecular structure conforms to the SMILES format), then executes the tool, and finally converts the output into structured data (such as "binding energy: −8.2 kcal/mol" rather than unstructured text);
  • Difficulty in closing the reasoning loop: a new "reasoning control layer" enables AI to understand the scientific meaning of tool outputs (such as "this compound's high brain permeability → possible central nervous system side effects") rather than just making mechanical calls.
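A minimal sketch of the two protocol operations, Find Tool and Call Tool. The registry, the keyword-overlap scoring, and the SMILES check below are simplified stand-ins for what the article describes; ToolUniverse's real Tool Finder additionally uses embeddings and LLM reasoning:

```python
# Hypothetical sketch of Find Tool / Call Tool. Tool descriptions and
# the validation rule are illustrative, not ToolUniverse's actual schema.
import re

REGISTRY = {
    "ADMET-AI": {
        "description": "predict liver toxicity and permeability of a compound",
        "input": "SMILES string",
    },
    "Boltz-2": {
        "description": "predict binding energy of a compound to a protein",
        "input": "SMILES string",
    },
}

def find_tool(query):
    """Keyword-overlap match against tool descriptions."""
    words = set(query.lower().split())
    return max(REGISTRY,
               key=lambda n: len(words & set(REGISTRY[n]["description"].split())))

def call_tool(name, smiles, impl):
    """Validate input, execute the tool, and return structured output."""
    if not re.fullmatch(r"[A-Za-z0-9@+\-\[\]\(\)=#$/\\.%]+", smiles):
        raise ValueError("input is not a plausible SMILES string")
    return {"tool": name, "input": smiles, "output": impl(smiles)}

tool = find_tool("predict the liver toxicity of a compound")
result = call_tool(tool, "CC(C)CC1=CC=C(C=C1)C(C)C(=O)O",
                   lambda s: {"hepatotoxic": False})  # mocked tool body
```

The point of the structure is the contract: whatever model issues the query, it always receives a tool name from Find Tool and a structured dictionary from Call Tool.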

This standardized design enables AI to upgrade from "being able to use tools" to "being able to use scientific tools to solve problems".

Four core components supporting the complete lifecycle of AI scientists

ToolUniverse covers the entire process requirements of AI scientists from "tool acquisition" to "workflow optimization" through four components (Figure 3), truly realizing "programmable scientific collaboration".

Figure 3: ToolUniverse provides six key capabilities to support the complete lifecycle of AI scientists:

Tool Manager: The "registration and management center" for tools

It solves the problem of "how to integrate new tools into the ecosystem":

Local tools (such as self-developed data-analysis scripts in the laboratory) need only submit a "function description + parameter format + output example" to be automatically included in the unified schema;

Remote tools (such as cloud-based molecular simulation platforms) are connected through the MCP protocol without exposing internal code, balancing security and compatibility;

It automatically verifies the effectiveness of tools (such as "whether an error message is returned when an invalid molecular structure is input") to ensure the reliability of AI calls.
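The registration flow can be sketched as follows; the field names, the probe input, and the toy CSV parser are all hypothetical, not ToolUniverse's actual API:

```python
# Hypothetical sketch of Tool Manager registration: a tool submits a
# description, parameter schema, and output example, and is probed with
# a deliberately bad input before joining the registry.

TOOL_REGISTRY = {}

def register_tool(name, description, params, output_example, impl):
    """Admit a tool only if its schema is complete and it rejects
    invalid input loudly (the reliability check)."""
    for field, value in [("description", description), ("params", params),
                         ("output_example", output_example)]:
        if not value:
            raise ValueError(f"missing required field: {field}")
    try:
        impl("not-a-valid-input")       # effectiveness probe
    except Exception:
        pass                            # bad input correctly rejected
    else:
        raise ValueError("tool accepted invalid input; registration refused")
    TOOL_REGISTRY[name] = {"description": description, "params": params,
                           "output_example": output_example, "impl": impl}

def parse_expression(csv_text):
    """Toy local tool: parse a CSV expression matrix."""
    if "," not in csv_text:
        raise ValueError("expected CSV")
    return [row.split(",") for row in csv_text.splitlines()]

register_tool("expr-parser", "parse a CSV expression matrix",
              {"csv_text": "str"}, [["gene", "1.0"]], parse_expression)
```

A tool that silently accepts garbage never enters the registry, which is what makes downstream AI calls trustworthy.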

Tool Composer: The "builder" of scientific workflows

Scientific research rarely relies on a single tool. For example, "drug screening" requires chaining "target database → compound library → molecular docking tool → toxicity prediction tool". The role of Tool Composer is as follows:

Define the data flow between tools (such as "the output structure of the molecular docking tool is directly used as the input of the toxicity prediction tool");

Support conditional logic (such as "if the predicted toxicity exceeds the threshold, return to the previous step to re-screen compounds");

Generate reproducible workflow scripts for human scientists to trace or modify.

It can also automatically construct and optimize the invocation relationships between tools through the agent system.
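A sketch of such a workflow with a conditional retry; the tool bodies, binding energies, and toxicity scores are invented for illustration:

```python
# Hypothetical sketch of a Tool Composer workflow: docking output feeds
# toxicity prediction, and a conditional branch re-screens when the
# toxicity threshold is exceeded. All numbers are toy values.

def dock(compounds):
    # toy binding energies (kcal/mol; lower = stronger binding)
    energies = {"A": -8.2, "B": -7.1, "C": -9.0}
    return {c: energies[c] for c in compounds}

def predict_toxicity(compound):
    return {"A": 0.9, "B": 0.2, "C": 0.7}[compound]  # toy toxicity scores

def screen(compounds, tox_limit=0.5):
    """Pick the strongest binder; if it is too toxic, drop it and retry."""
    remaining = list(compounds)
    trace = []                          # reproducible record of each step
    while remaining:
        best = min(remaining, key=lambda c: dock(remaining)[c])
        tox = predict_toxicity(best)
        trace.append((best, tox))
        if tox <= tox_limit:
            return best, trace
        remaining.remove(best)          # conditional branch: re-screen
    return None, trace

winner, trace = screen(["A", "B", "C"])
```

The `trace` list is the "reproducible workflow script": a human scientist can replay exactly which compounds were tried and why each was rejected.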

Tool Discover: The "automatic generator" of tools

When existing tools cannot meet the requirements (such as "needing a new tool for visualizing gene expression data"), AI can describe the requirements in natural language, and Tool Discover will:

Convert the text description into structured tool specifications (such as "input: CSV-format expression matrix; output: heat map + volcano plot");

Automatically generate code and test cases, and iteratively optimize through the feedback loop of "expected behavior vs. actual output";

Without manual coding, the tool library can be dynamically expanded according to scientific research needs.
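The spec-then-iterate loop might look like this in miniature; the hard-coded "spec extraction" stands in for the LLM step, and both candidate implementations are hypothetical:

```python
# Hypothetical sketch of Tool Discover: a natural-language request becomes
# a structured spec with a test case, and generated candidates are iterated
# until expected and actual output match.

def text_to_spec(request):
    """Stand-in for LLM spec extraction; the mapping here is hard-coded."""
    return {
        "name": "normalize-expression",
        "input": "list of raw counts",
        "output": "list of counts scaled to sum to 1",
        "test": {"input": [2, 2, 4], "expected": [0.25, 0.25, 0.5]},
    }

def candidate_v1(xs):            # first generated attempt: forgot to scale
    return xs

def candidate_v2(xs):            # revised after the feedback loop
    total = sum(xs)
    return [x / total for x in xs]

def discover(request, candidates):
    spec = text_to_spec(request)
    for impl in candidates:      # feedback loop: expected vs. actual output
        if impl(spec["test"]["input"]) == spec["test"]["expected"]:
            return spec["name"], impl
    raise RuntimeError("no candidate satisfied the spec")

name, impl = discover("normalize gene expression counts",
                      [candidate_v1, candidate_v2])
```

The spec's embedded test case is what lets the loop judge a candidate without human review: `candidate_v1` fails it, `candidate_v2` passes and is admitted.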

Tool Optimizer: The "quality guardian" of tools

Scientific research emphasizes reproducibility. Tool Optimizer ensures tool stability through three actions:

Regularly generate test cases (such as "verifying the accuracy of the molecular docking tool with known active compounds");

Analyze the deviation between tool output and specifications (such as "a sudden increase in the error between a tool's predicted binding energy and the experimental value");

Automatically update tool documentation or parameter settings to ensure consistency when AI calls them.
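The drift check in the second action can be sketched as a reference-set regression test; the compound values and the error bound are illustrative:

```python
# Hypothetical sketch of Tool Optimizer's drift check: compare a tool's
# predictions on known reference compounds against experimental values
# and flag the tool when the mean absolute error exceeds a bound.

REFERENCE = {                   # toy "known active compounds" (kcal/mol)
    "lovastatin": -8.2,
    "pravastatin": -7.9,
}

def mean_abs_error(predict):
    errors = [abs(predict(c) - v) for c, v in REFERENCE.items()]
    return sum(errors) / len(errors)

def check_tool(predict, max_mae=0.5):
    """Return (ok, mae); a failing tool would trigger a documentation
    or parameter update before AI is allowed to keep calling it."""
    mae = mean_abs_error(predict)
    return mae <= max_mae, mae

good = lambda c: REFERENCE[c] + 0.1     # well-calibrated tool
drifted = lambda c: REFERENCE[c] + 2.0  # tool whose error suddenly grew

ok_good, _ = check_tool(good)
ok_bad, _ = check_tool(drifted)
```

Running this periodically is what turns "reproducibility" from a policy into an automated gate.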

Cross-model compatibility, enabling every type of LLM to become a scientific assistant

The requirements for LLMs vary greatly across research scenarios: local laboratory analysis may call for lightweight open-source models (such as Llama 3), complex hypothesis reasoning may rely on cloud-based large models (such as Claude 3), and biomedical research may need specialized models (such as TxAgent).

The compatibility design of ToolUniverse breaks the "model binding" limitation (Figure 4): it converts tool invocation into standardized function calls without modifying the LLM's weights or tokenizer. A lightweight wrapper passes the "tool list + parameter format" to the model, and the model's output is parsed into tool-invocation instructions.

The value of this design lies in:

Research teams can select models according to cost and privacy requirements without worrying about "rewriting tool invocation logic when changing models";

It can compare the performance of different models under identical experimental conditions (such as "which has higher accuracy in drug screening, Gemini-CLI or Claude 3");

It supports combining professional models with general tools (such as "letting TxAgent call the ChEMBL database to analyze drug–target interactions").
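The wrapper idea can be sketched as follows: any model that emits a standardized JSON function call can drive the same tool registry, so swapping models never means rewriting invocation logic. The tool, its output, and the call format are assumptions for illustration, not ToolUniverse's actual wire format:

```python
# Hypothetical sketch of a model-agnostic tool wrapper. The registry
# entry and its mocked result are invented for illustration.
import json

TOOLS = {
    "chembl_lookup": {
        "params": {"target": "str"},
        "impl": lambda target: {"target": target, "n_ligands": 112},  # mocked
    },
}

def tool_schema():
    """The 'tool list + parameter format' handed to any model."""
    return [{"name": name, "params": t["params"]} for name, t in TOOLS.items()]

def dispatch(model_output):
    """Parse a model's JSON function call and execute the named tool.
    Any model that emits this format can drive the same registry."""
    call = json.loads(model_output)
    return TOOLS[call["name"]]["impl"](**call["arguments"])

# The same standardized call, as two different models might serialize it:
call_a = '{"name": "chembl_lookup", "arguments": {"target": "HMG-CoA reductase"}}'
call_b = '{"name":"chembl_lookup","arguments":{"target":"HMG-CoA reductase"}}'

result_a, result_b = dispatch(call_a), dispatch(call_b)
```

Because `dispatch` only sees the parsed call, the same registry serves a lightweight local model and a cloud model identically, which is exactly what makes side-by-side model comparisons fair.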

Figure 4: ToolUniverse provides a simple and efficient protocol for building different types of AI scientists: it can be used for general large language models (such as Claude in the left-hand diagram), agent systems with stronger reasoning and control capabilities (such as Gemini-CLI in the right-hand diagram), and AI agents focused on biomedical research (such as TxAgent).

Case study: How an AI scientist optimizes cholesterol-lowering drugs

The theoretical framework needs to be verified in practice. Taking "finding safer cholesterol-lowering drugs" as an example, let's see how an AI scientist built on ToolUniverse (based on the Gemini-CLI agent) completes the entire research process (Figure 5).

Figure 5: Shows an example of an AI scientist built on ToolUniverse and applied to drug discovery. This system...

Step 1: Target identification – locking in the "key protein"

AI first calls the "literature mining tool" and the "drug–target database". By analyzing thousands of research papers and clinical data, it concludes that HMG-CoA reductase is the key enzyme in cholesterol synthesis, and that excessive inhibition of this enzyme outside the liver causes side effects such as muscle pain. This step replicates the "target discovery" logic of human scientists, but more than ten times faster.

Step 2: Initial compound screening – starting with existing drugs

AI queries the "database of marketed cholesterol-lowering drugs" through ToolUniverse, screens out drugs targeting HMG-CoA reductase, and finally selects lovastatin as the initial compound, because "it has been fully clinically validated, but has high extra-hepatic tissue permeability and a risk of side effects".

Step 3: Compound optimization – improving safety and effectiveness

AI calls three tools to collaborate:

ChEMBL database: Obtain over 100 structural analogs of lovastatin;

Boltz-2 tool: Predict the binding energy of each analog to HMG-CoA reductase (the lower the value, the stronger the binding);

ADMET-AI tool: Predict the liver permeability, brain permeability, and metabolic stability of the analogs.

Through comprehensive ranking, AI screens out two candidates:

Pravastatin: A well-known drug with low extra-hepatic permeability and fewer side effects (verifying the reliability of the AI's screening);

CHEMBL2347006/CHEMBL3970138: New compounds with 30% stronger binding than lovastatin, 50% lower brain permeability, and 25% higher bioavailability.
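The "comprehensive ranking" above can be sketched as a weighted score over the three predicted properties; all numbers and weights here are invented for illustration, not taken from the actual study:

```python
# Hypothetical sketch of the Step 3 ranking: combine binding energy,
# brain permeability, and bioavailability into one composite score.
# All values and weights are toy numbers.

CANDIDATES = {
    # name: (binding energy, kcal/mol (lower = stronger),
    #        brain permeability (lower = fewer CNS side effects),
    #        bioavailability fraction (higher = better))
    "lovastatin":    (-8.2, 0.60, 0.40),
    "pravastatin":   (-7.9, 0.10, 0.35),
    "CHEMBL2347006": (-9.1, 0.30, 0.50),
}

def score(binding, brain_perm, bioavail):
    # weights are arbitrary illustrative choices
    return (-binding) * 1.0 - brain_perm * 2.0 + bioavail * 1.0

ranked = sorted(CANDIDATES, key=lambda n: score(*CANDIDATES[n]), reverse=True)
```

The sign flip on binding energy matters: since lower (more negative) energy means stronger binding, negating it lets all three terms point in the "higher is better" direction.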

Step 4: Patent verification – avoiding legal risks

Finally, AI calls the "patent search tool" and finds that the new compounds have already been registered for the treatment of cardiovascular diseases. Although they cannot be developed directly, they point the way for subsequent structural modification.

Throughout the process, AI not only completed the action of "calling tools" but also demonstrated scientific reasoning ability: It can explain "why a certain target