
GPT-5.2 is here: the first "expert-level" AI strikes back, and overworked office workers are finally saved.

Xinzhiyuan (新智元), 2025-12-12 07:55
On OpenAI's tenth anniversary, the world's most powerful AI is back! The new GPT-5.2 "suite" outperforms Google's Gemini 3 Pro outright, and its professional capabilities rival those of human experts.

Late last night, OpenAI made a huge splash!

GPT-5.2 arrived as a surprise release, and the global AI throne has changed hands once again.

Three models launched today:

· GPT‑5.2 Instant (Instant Version) 

· GPT‑5.2 Thinking (Thinking Version)

· GPT‑5.2 Pro (Professional Version)

Billed as the world's most powerful general-purpose model, GPT-5.2 is designed to tackle the difficult, knowledge-intensive tasks that give people headaches.

In the benchmark results published by OpenAI, it beats Gemini 3 Pro nearly across the board!

Compared with the previous generation, GPT-5.2 improves across the board, with no weak spots, in general intelligence, ultra-long-context understanding, agentic tool calling, and vision:

SWE-Bench Pro: a high score of 55.6%;

LMArena Code Arena: second only to Claude Opus 4.5, firmly in second place worldwide;

ARC-AGI-2: GPT-5.2 Pro tops the global leaderboard with a commanding 52.9%;

GDPval: on knowledge work spanning 44 occupations, it outperforms human industry experts.

In a nutshell, no model currently beats it at handling complex real-world tasks end to end.

Complete evaluation results

Beyond stronger capabilities, GPT-5.2 also brings a longer context window and fresher knowledge!

400,000-token context window: easily handles ultra-long texts and complex conversations;

Maximum output of 128,000 tokens: generates long, in-depth texts without interruption;

Knowledge cutoff of August 31, 2025: keeps up with recent world events;

Reasoning token support: built for complex logic and multi-step reasoning.

Of course, as performance soars, so does the price.

GPT-5.2's input and output prices are 40% higher than those of GPT-5/5.1!

Stronger reasoning, faster speed, and higher prices all seem to suggest that this time OpenAI has not only scaled up the model, but that the underlying compute cost may also have reached a new level.

This time, it's all about professionalism!

A month ago, GPT-5.1 debuted with a "high EQ, high IQ" persona, only to be immediately confronted by a strong competitor: Google's Gemini 3.

This update comes after media reports that OpenAI had declared a "code red" emergency.

However, OpenAI executives told the media that GPT-5.2 should not be seen as a response to Gemini 3. OpenAI's CEO of Applications told reporters:

We announced the "code red" to signal internally that we want to concentrate our efforts on major tasks. It's a good way to decide what is and isn't a priority.

Overall, we have increased the resources devoted to ChatGPT. I think that contributed to this model's release, but it's not the only reason it shipped this week.

This time, GPT-5.2 is positioned as a professional knowledge-work AI, billed as "the best work model for office workers".

Yu Bai, a Chinese researcher at OpenAI, said, "Don't be fooled by the small version-number bump. It represents a huge leap in capabilities."

On tasks that take human experts 4 to 8 hours to complete, GPT-5.2 beats or ties the experts up to 70.9% of the time in human evaluations.

As expected, GPT‑5.2 performs better on many real-world tasks: creating spreadsheets, building presentations, writing code, perceiving images, understanding long contexts, using tools, and handling complex multi-step projects.

An earlier OpenAI report stated that ChatGPT saves enterprise users an average of 40 to 60 minutes per day, with heavy users reporting savings of more than 10 hours per week.


In short, it's all about AI getting the "professional work" done!

Beating human experts, a delight for office workers

Currently, GPT‑5.2 Thinking is the best model for real-world professional use.

On GDPval, GPT‑5.2 Thinking set a new state of the art (SOTA) and is the first model ever to outperform human experts.

As judged by human experts, GPT‑5.2 Thinking beat or tied top industry professionals on 70.9% of GDPval knowledge-work tasks.

On GDPval tasks, it works 11 times faster than expert professionals at less than 1% of the cost.

This shows that when combined with human supervision, GPT‑5.2 can effectively assist in professional work.

In other words, whether it's helping accountants organize financial reports, assisting product managers with slide decks, or acting as a coding assistant for programmers, GPT-5.2 handles the job better than its predecessors.

GDPval asks the model to complete well-specified tasks spanning 44 occupations across the 9 industries that contribute most to US GDP. Each task requires producing an actual work product, such as a sales presentation, an accounting spreadsheet, an urgent care schedule, a manufacturing diagram, or a short video.

In ChatGPT, GPT‑5.2 Thinking has new tools that GPT‑5 Thinking does not have.

Moreover, on an internal benchmark of spreadsheet-modeling tasks typical of junior investment banking analysts, GPT-5.2 Thinking's average per-task score was 9.3 percentage points higher than GPT‑5.1's, rising from 59.1% to 68.4%.

Side-by-side comparisons show that the spreadsheets and slide decks generated by GPT‑5.2 Thinking are more sophisticated and better formatted.

In one example, GPT‑5.2 Thinking generated a difficult, complex staffing table from a single-sentence prompt, earning it the nickname "Human Resources Planner".

For an equity structure (cap) table, GPT-5.2 Thinking, acting as a senior bank analyst, completed every calculation with clearly traceable steps.

GPT-5.1 Thinking, by contrast, miscalculated the liquidation preferences for the seed, Series A, and Series B rounds and left most rows blank, so its final equity payouts came out wrong; it also mistakenly placed formulas in the header row.

For project management, GPT-5.2 Thinking produces a visual, intuitive summary organized by task and timeline.

By comparison, GPT-5.1 Thinking's output looks particularly rough.