OpenAI GPT-5 Release: The Model's Capabilities Dominate Across the Board, Marking the First Step in Building "Superintelligence"
After numerous "delays", GPT-5 has finally arrived.
At 1 a.m. Beijing time on August 8, OpenAI's summer launch event, which has become something of a "Spring Festival Gala" for the tech world, kicked off.
Unlike OpenAI's earlier quick-fire launches, this one was a livestream lasting over an hour, with several teams taking turns on stage to showcase GPT-5's capabilities from different angles.
Let's get to the point: GPT-5 has improved across the board. It ranks first in text, web development, and vision, and also first in categories such as hard prompts, coding, mathematics, creativity, and long queries. Tested under the code name "Summit", GPT-5 now holds the highest Arena score to date, literally "dominating the leaderboard".
Sam Altman described the jump between model generations in terms of schooling, and even compared GPT-5 to the first iPhone with a Retina display. In his words: "When you ask [an earlier model] a question, you might get the correct answer, or you might get something crazy. GPT-4 feels like having a conversation with a college student. GPT-5, for the first time, really makes me feel like I'm talking to a doctoral-level expert."
Although ChatGPT is approaching 700 million weekly active users, OpenAI has not had a clearly industry-leading frontier model for some time. Now, OpenAI believes GPT-5 will put it firmly back at the top of the leaderboard.
Altman went so far as to assert at the event: "This is the world's strongest model for coding, the world's strongest model for writing, and the world's strongest model for healthcare."
OpenAI also announced that, beyond its coding ability, GPT-5 has taken a step forward in writing and in the accuracy of its answers to health-related questions. GPT-5 has not only made a "huge leap" in intelligence but has also significantly reduced hallucinations, the habit of "talking nonsense with a straight face". It understands and follows instructions better, and its tendency to flatter users has been greatly reduced.
01 Farewell to "Hallucinations", AI Becomes More Reliable
First, the model lineup. The GPT-5 series comes in four versions: GPT-5, mini, nano, and chat. The chat version offers a more natural and intelligent conversational experience; you can even use it to learn a new language.
Additionally, when you open the ChatGPT website now, the first thing you'll notice is that GPT-5 is presented as a single model, rather than a regular model plus a separate reasoning model.
Behind this is a routing system (router) developed by OpenAI. It automatically switches to the version with stronger reasoning ability for more complex queries, or when you explicitly tell it to think hard. (Altman called the previous model picker "a very confusing mess.")
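OpenAI has not published how the router works, so the following is only a minimal sketch of the idea described above: send a query to a stronger reasoning model when it looks complex or when the user explicitly asks for deeper thinking. The model labels, the trigger phrases, the word-count heuristic, and the threshold are all hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass

# Hypothetical labels, not OpenAI's real model identifiers.
FAST_MODEL = "fast-model"
REASONING_MODEL = "reasoning-model"

# Illustrative phrases treated as an explicit request for deeper reasoning.
THINK_HARD_PHRASES = ("think hard", "think carefully", "think step by step")


@dataclass(frozen=True)
class RouteDecision:
    model: str   # which model the query is sent to
    reason: str  # why the router chose it


def estimate_complexity(query: str) -> float:
    """Toy heuristic: the score grows with word count and is capped at 1.0."""
    return min(len(query.split()) / 100, 1.0)


def route(query: str, threshold: float = 0.5) -> RouteDecision:
    """Pick a model using the two triggers the article mentions."""
    lowered = query.lower()
    # Trigger 1: the user explicitly tells the model to think hard.
    if any(phrase in lowered for phrase in THINK_HARD_PHRASES):
        return RouteDecision(REASONING_MODEL, "user asked for deeper thinking")
    # Trigger 2: the query itself looks complex (here: simply long).
    if estimate_complexity(query) >= threshold:
        return RouteDecision(REASONING_MODEL, "query looks complex")
    return RouteDecision(FAST_MODEL, "query looks simple")


if __name__ == "__main__":
    print(route("What is the capital of France?").model)
    print(route("Think hard: is this proof correct?").model)
```

A production router would presumably rely on far richer signals than query length; the point here is only the shape of the decision.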
"AI hallucinations" have long been a major source of complaints. The good news is that GPT-5 has made real progress here: OpenAI claims that its likelihood of hallucinating has been "significantly reduced". Specifically:
With web search enabled, GPT-5's answers are 45% less likely to contain a factual error than GPT-4o's.
When using its thinking mode, its answers are 80% less likely to contain a factual error than those of OpenAI o3.
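Both figures are relative reductions, not absolute error rates, and each is measured against a different baseline model. As a rough illustration of what such a reduction means, the sketch below applies the two percentages to a made-up baseline error rate of 20%; that baseline is an assumption for the sake of the arithmetic, not a number reported by OpenAI.

```python
def reduced_error_rate(baseline_rate: float, relative_reduction: float) -> float:
    """Return the error rate left after applying a relative reduction.

    Both arguments are fractions between 0 and 1, e.g. 0.45 for "45% lower".
    """
    if not 0.0 <= baseline_rate <= 1.0:
        raise ValueError("baseline_rate must be between 0 and 1")
    if not 0.0 <= relative_reduction <= 1.0:
        raise ValueError("relative_reduction must be between 0 and 1")
    return baseline_rate * (1.0 - relative_reduction)


# Hypothetical baseline: assume the older model errs on 20% of answers.
HYPOTHETICAL_BASELINE = 0.20

with_search = reduced_error_rate(HYPOTHETICAL_BASELINE, 0.45)    # "45% lower"
with_thinking = reduced_error_rate(HYPOTHETICAL_BASELINE, 0.80)  # "80% lower"

print(f"{with_search:.2%}")    # 11.00%
print(f"{with_thinking:.2%}")  # 4.00%
```

In other words, "45% lower" on a 20% baseline leaves an 11% error rate, not a drop to zero; the real baselines for GPT-4o and o3 are not given in this article.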
GPT-5 has also been tested on the new ARC-AGI-2 benchmark, where it outperforms every major model except Grok 4 (Thinking).
Moreover, GPT-5 has become more "honest". It is less likely to deceive users or to claim it can complete tasks it cannot. When a task is impossible, underspecified, or missing key tools, it communicates its limitations more candidly.
The most interesting part of this update is the introduction of four brand-new "personality" modes for users to freely choose from. They are:
Cynic
Robot
Listener
Nerd
These modes are optional. You can set the way ChatGPT interacts with you and answers your questions according to your preferences. Do you want it to argue with you or listen to you like a patient friend? Now it's up to you.
"This model really 'feels' great," said Nick Turley, head of ChatGPT. "I think people will really feel this, especially ordinary users who don't usually pay much attention to models."
Additionally, you can change the color theme for a single chat window, which will make fans of code editor themes extremely happy.
02 Is the Era of "On-Demand Software Generation" Coming? Incredibly Powerful Coding Ability
Altman predicts that GPT-5's stronger coding ability will usher in an era he calls "on-demand software generation".
In OpenAI's tests, GPT-5 outperformed every other model on multiple coding benchmarks, including SWE-Bench, SWE-Lancer, and Aider Polyglot. It scored 75% on SWE-Bench and 42% on Humanity's Last Exam.
One amusing episode: the charts shown at the event had obvious flaws. There were not only basic errors, such as a bar labeled 52.8 drawn taller than one labeled 69.1, but the scaling also visually exaggerated GPT-5's improvement. Netizens mocked this widely on social media, joking, "I'm afraid this slide deck was made by GPT-5."
At the event, Yann Dubois, OpenAI's head of post-training, gave a live demo, asking GPT-5 to generate a website for learning French with interactive games. Within seconds, GPT-5 wrote hundreds of lines of code and rendered the site's front-end interface. He shared his screen over Zoom and clicked through it, and everything appeared to run perfectly.
OpenAI also showcased a 3D game that GPT-5 created from a single prompt. The 3D scene not only has polished graphics but also reproduces the corresponding physical effects accurately.
03 Safer and More "Honest"
According to Alex Beutel, OpenAI's head of model safety research, OpenAI conducted "over five thousand hours" of testing on GPT-5 to understand its safety risks. One of the key goals was to "ensure that the model does not lie to users".
Although GPT-5 hallucinates less than OpenAI's o3 reasoning model, "confidently lying" remains an inherent problem in large language models, and it becomes more complicated once a model starts completing tasks as an agent. However, OpenAI said that GPT-5 handles multi-step tasks more reliably. "In the past, we've seen cases where the model claimed to have completed a task but actually didn't," Beutel said. "This is a problem."
For prompts that earlier models would simply refuse, GPT-5 uses what OpenAI calls "safe completions". Beutel explained: "For example, if someone asks, 'How much energy is needed to ignite a specific material?', this could be a malicious user trying to bypass safety protections and cause harm, or it could be a student who wants to understand the material's physical properties. This poses a real challenge to how the model should respond."
With "safe completions", GPT-5 "tries to give the most helpful answer possible while maintaining safety constraints". In such cases the model typically complies only partially, providing more general information that cannot actually be used to cause harm.
04 How to Use GPT-5
So, here comes the question that everyone is most concerned about: How can you use GPT-5?
The good news is that all ChatGPT users can now try GPT-5 for free, right away. This is also the first time OpenAI has made a frontier model freely available to all users. Of course, different tiers get different levels of access:
Plus subscribers get more usage before hitting the limit.
Pro subscribers can access GPT-5 Pro, a version with stronger reasoning ability.
When users reach the usage limit, ChatGPT automatically switches to a "mini" version of GPT-5 to handle subsequent requests. With its launch, GPT-5 also officially replaces a series of older models, including GPT-4o, OpenAI o3, OpenAI o4-mini, GPT-4.1, and GPT-4.5.
As for token pricing, the standard version of GPT-5 costs $1.25 per million input tokens and $10 per million output tokens. The mini and nano versions are much cheaper.
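To make the standard-version prices concrete, here is a small sketch that turns them into a per-request cost. Only the two rates come from the figures quoted above; the function name and the example token counts are made up for illustration, and real bills may also depend on details not covered here.

```python
# Standard GPT-5 rates quoted in the article, in US dollars per million tokens.
INPUT_USD_PER_MILLION = 1.25
OUTPUT_USD_PER_MILLION = 10.00


def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request from its input and output token counts."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = (
        input_tokens * INPUT_USD_PER_MILLION
        + output_tokens * OUTPUT_USD_PER_MILLION
    )
    return total / 1_000_000


# Hypothetical request: a 2,000-token prompt and a 500-token answer.
cost = request_cost_usd(input_tokens=2_000, output_tokens=500)
print(f"${cost:.4f}")  # $0.0075
```

Because output tokens cost eight times as much as input tokens, a long answer to a short prompt is dominated by the output side of the bill.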
For detailed pricing, you can refer to the information taken from the official website in the following image.
Additionally, OpenAI has released a new "minimal" reasoning setting in its API, so you can use GPT-5 across use cases simply by adjusting the reasoning effort.
Beyond OpenAI's first-party platforms, Microsoft CEO Satya Nadella announced that GPT-5 has launched across Microsoft's platforms, including Microsoft 365 Copilot, Copilot, GitHub Copilot, and Azure AI Foundry, adding that the model's new capabilities were all trained on Azure.