
AGI has a new "authoritative" definition, proposed by Turing Award winner Yoshua Bengio and others. GPT-5 only reaches 57%.

Account deactivated · 2025-10-29 18:07
Evaluating AGI along 10 major dimensions.

Artificial General Intelligence (AGI) may become the most important technological breakthrough in human history. However, because AGI lacks a clear definition, the gap between today's specialized Artificial Intelligence (AI) and human-level cognition remains blurred.

To address this issue, Dan Hendrycks, the director of the Center for AI Safety (CAIS), and Yoshua Bengio, a Turing Award laureate, along with many industry entrepreneurs and scholars, proposed a quantifiable framework and defined AGI as:

"An AI that can match or exceed the cognitive versatility and proficiency of a well-educated adult."

This definition emphasizes that general intelligence requires not just specialized performance in narrow domains, but also the breadth (versatility) and depth (proficiency) of skills that characterize human cognition.

Paper link: https://arxiv.org/abs/2510.18212

The research results show that within this framework, GPT-4's AGI score is only 27%, and GPT-5's score is only 57%.

Figure | AGI scores of GPT-4 and GPT-5.

This indicates that although current AI performs well on complex benchmarks, it lacks many core cognitive abilities crucial for human-like general intelligence.

More importantly, this framework provides a structured, quantifiable, and more robust method to evaluate AGI, going beyond narrow and specialized benchmark tests.

10 Core Capabilities of AGI

To systematically test the specific cognitive abilities of AI systems, the research team built a methodology based on the Cattell-Horn-Carroll theory (the most empirically validated model of human intelligence). This framework decomposes general intelligence into 10 core cognitive domains, including reasoning, memory, and perception, and uses established human psychometric tests to evaluate AI systems.

Figure | 10 core components under the proposed AGI definition.
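
As a rough illustration of how such a decomposition could roll up into a single percentage, the sketch below assumes that each of the 10 domains is weighted equally and contributes at most 10 points to a 100-point total. The function names and the dictionary are hypothetical, not taken from the paper. Only the five GPT-5 domain figures and the 57% total reported in this article are used, so the subtotal is partial by design.

```python
# Assumption: 10 equally weighted domains, each worth at most 10 points.
MAX_POINTS_PER_DOMAIN = 10

# GPT-5 points for the first five domains, as reported in this article.
gpt5_reported = {
    "General Knowledge": 9,
    "Reading and Writing Ability": 10,
    "Mathematical Ability": 10,
    "On-the-Spot Reasoning": 7,
    "Working Memory": 4,
}


def partial_agi_score(domain_points):
    """Sum per-domain points after checking each lies in [0, 10]."""
    for name, points in domain_points.items():
        if not 0 <= points <= MAX_POINTS_PER_DOMAIN:
            raise ValueError(f"{name}: {points} is outside 0..{MAX_POINTS_PER_DOMAIN}")
    return sum(domain_points.values())


def remaining_points(reported_total, domain_points):
    """Points that the domains not listed must account for."""
    return reported_total - partial_agi_score(domain_points)


subtotal = partial_agi_score(gpt5_reported)
print(f"Subtotal over {len(gpt5_reported)} domains: {subtotal}%")
print(f"Left for the other domains: {remaining_points(57, gpt5_reported)}%")
```

Under these assumptions, the five domains listed account for 40 of GPT-5's 57 points, leaving 17 points to come from the remaining five domains.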

The specific content is as follows:

1. General Knowledge

General Knowledge refers to "knowledge that most well-educated people are familiar with or that is important enough for most adults to have encountered." In this domain, the research team evaluated GPT-5 and GPT-4 on common sense, science, social science, history, and culture. GPT-5 scores 9% out of the 10% that this domain can contribute to the total AGI score.

2. Reading and Writing Ability

Reading and Writing Ability means "mastering all declarative knowledge and procedural skills in reading and writing." In this domain, the research team evaluated GPT-5 and GPT-4 on common word recognition, reading comprehension, writing ability, and grammar. GPT-5 earns the full 10% that this domain can contribute to the total AGI score.

3. Mathematical Ability

Mathematical Ability refers to "the depth and breadth of mathematical knowledge and skills." In this domain, the research team evaluated GPT-5 and GPT-4 on arithmetic, algebra, geometry, probability, and calculus. GPT-5 earns the full 10% that this domain can contribute to the total AGI score.

4. On-the-Spot Reasoning

On-the-Spot Reasoning means "deliberately and flexibly controlling attention to solve new, immediate problems that cannot be solved by relying solely on previously learned habits, schemas, and scripts." In this domain, the research team evaluated GPT-5 and GPT-4 on deduction, induction, theory of mind, planning, and adaptation. GPT-5 scores 7% out of the 10% that this domain can contribute to the total AGI score.

5. Working Memory

Working Memory refers to "the ability to hold, process, and update information while maintaining focused attention." In this domain, the research team evaluated GPT-5 and GPT-4 across auditory, visual, and cross-modal modalities. GPT-5 scores only 4% out of the 10% that this domain can contribute to the total AGI score.