
Hands-on test of GPT-5.2: the price has skyrocketed while capabilities have only slightly improved. How can it counter Gemini?

ifanr · 2025-12-12 18:00
It may top the leaderboards, but in practice it doesn't feel as smooth to use as Gemini.

GPT-5.2, which is said to outperform Gemini, officially launched early this morning and is available to all users.

I just canceled my ChatGPT Plus subscription last month and switched to Gemini. Do I need to switch back because of GPT-5.2?

After reading the following real user experiences shared by netizens and APPSO's hands-on test, you may find an answer.

Finally, it didn't draw the chart wrong this time.

GPT-5.2 actually arrives as three models: GPT-5.2 Instant, Thinking, and Pro. If you're used to the way Gemini 3.0 Pro pauses to think on every Q&A, you'll notice that ChatGPT running GPT-5.2 Thinking/Pro is slower, taking longer than before.

This matches what most early-access users shared on social media: GPT-5.2 improves on 5.1 across the board, and GPT-5.2 Pro is well suited to professional reasoning work and long-running tasks. The wait for results, however, has grown even longer.

For example, a user shared that when they entered the prompt "Help me draw a chart of HLE test scores", GPT-5.2 Pro took 24 minutes to generate the chart.

Image source: https://x.com/emollick/status/1999185755617300796/photo/1

Thankfully, all the information is accurate, even though the top score on the chart still belongs to Gemini 3.0 Pro.

Credit also goes to GPT-5.2's knowledge cutoff of August 2025. For comparison, GPT-5.1's cutoff was September 2024, while Gemini 3.0, released just last month, cuts off at January 2025.

When we used GPT-5.2 Thinking to generate a chart of OpenAI's model release history, it didn't take too long, and the information was reasonably accurate. Even for simple tasks, the time gap between the Thinking and Pro models is significant.

Prompt: generate a chart graph of OpenAI model release over time

With its "ultra-strong" reasoning, up-to-date world knowledge, and multi-modal understanding and reasoning over images, GPT-5.2 quickly climbed to second place on the LMArena leaderboard. In the WebDev (web development) category, GPT-5.2-High ranks second and GPT-5.2 sixth; by comparison, Gemini 3.0 Pro ranks third, and Claude still holds first place.

LMArena officially released a hands-on test video, using GPT-5.2 to complete a series of 3D modeling tasks to a high degree of polish. Some netizens, though, commented below: "Is it still 2003 now?"

Video source: https://x.com/arena/status/1999189215603753445

This three.js 3D effect puts heavy demands on the model's multi-modal understanding and reasoning, as well as its programming and software-design skills. GPT-5.2 lives up to its 0.1 version bump.

Most of the tests netizens have shared so far focus on building these complete 3D scenes, and GPT-5.2 has performed quite well. For example, someone used GPT-5.2 Thinking's high reasoning-effort mode to build a snowy 3D ice-kingdom scene in a single-page file, with interactive controls and 4K-resolution export.

https://x.com/skirano/status/1999182295685644366

There is also a 3D neo-Gothic city rising out of rough waves, created with GPT-5.2 Pro.

Prompt: create a visually interesting shader that can run in twigl-dot-app make it like an infinite city of neo-gothic towers partially drowned in a stormy ocean with large waves. | Source: https://x.com/emollick/status/1999185085719887978?s=20

To probe 3D understanding and reasoning, we also reused the prompt Ian Goodfellow tried after Gemini 3.0 Pro's release: upload an image and ask the model to generate a beautiful voxel-art Three.js single-page scene based on it.

Since ChatGPT didn't render it in the canvas, I copied the code it generated in the chat and opened it in an HTML viewer, as shown in the figure on the right.

The difference is obvious. ChatGPT did recognize the content of the uploaded image, a pink tree, a green field, a gray depression, and white flowing water, but the 3D animation it generated looks crude next to Gemini 3.0 Pro's.

All I can say is that Altman's internal "code red" reflects Gemini's real strength.

The classic programming test has to include the physics of balls bouncing inside a spinning hexagon. One blogger raised the difficulty by rendering all the balls as shiny red 3D spheres. The effect looks very cool, and many netizens asked how it was done. Some, however, pointed out that the balls don't seem to be affected by gravity.

Then a netizen replied that it was simulating space.
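The gravity question is easy to check numerically. Below is a minimal, hypothetical sketch (not the blogger's actual code) of the kind of integrator behind such demos: with g set to 0, as in space, a ball's velocity never changes, which is consistent with the "simulating space" reply.

```python
# Minimal point-mass integrator to illustrate the gravity debate.
# Hypothetical sketch; not taken from the blogger's demo.

def simulate(g, steps=100, dt=0.01, y0=0.0, vy0=1.0):
    """Semi-implicit Euler: update velocity first, then position."""
    y, vy = y0, vy0
    for _ in range(steps):
        vy -= g * dt   # gravity pulls the ball downward
        y += vy * dt
    return y, vy

# In "space" (g = 0) the velocity stays constant, so the balls drift.
y_space, vy_space = simulate(g=0.0)
# Under Earth-like gravity the ball decelerates and eventually falls.
y_earth, vy_earth = simulate(g=9.8)
```

With g = 0 the final velocity equals the initial one; with g = 9.8 it has long since turned negative, which is exactly the visual difference netizens noticed.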

Video source: https://x.com/flavioAd/status/1999183432203567339

There is also the classic SVG code test: a pelican riding a bicycle.

Image source: https://arena.jit.dev/

Some netizens also shared that they used GPT-5.2 to create a forest-fire simulator with adjustable speed, area size, and burn radius.

Image source: https://x.com/1littlecoder/status/1999191170581434557?s=20
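Forest-fire simulators like this are typically cellular automata: each cell is empty, a tree, or on fire, and fire jumps to neighboring trees with some probability per step. A minimal sketch of that core loop (the parameter names are our own assumptions, not the netizen's code):

```python
import random

EMPTY, TREE, FIRE = 0, 1, 2

def step(grid, p_spread=1.0, rng=random):
    """One tick: burning cells burn out; fire jumps to 4-neighbour trees."""
    n = len(grid)
    new = [row[:] for row in grid]
    for i in range(n):
        for j in range(n):
            if grid[i][j] == FIRE:
                new[i][j] = EMPTY  # burnt out
                for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ni, nj = i + di, j + dj
                    if 0 <= ni < n and 0 <= nj < n and grid[ni][nj] == TREE:
                        if rng.random() < p_spread:
                            new[ni][nj] = FIRE
    return new

# Start with a full forest and ignite the centre cell.
n = 9
grid = [[TREE] * n for _ in range(n)]
grid[n // 2][n // 2] = FIRE
for _ in range(2 * n):  # enough ticks for fire to cross the grid
    grid = step(grid, p_spread=1.0)
trees_left = sum(row.count(TREE) for row in grid)
```

With the spread probability at 1, the fire front expands one cell per tick, so after 2n ticks the whole forest is burnt; lowering `p_spread` is what makes the "burn range" slider interesting.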

We had it build a planetary-signal webpage. Its layout closely resembles the forest-fire visualization page, except the dotted content on the left is replaced with planets in space.

Prompt: Create an interactive HTML, CSS, and JavaScript simulation of a satellite system that transmits signals to ground receivers. The simulation should show a satellite orbiting the Earth and periodically sending signals that are received by multiple
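The geometry behind such a simulation is straightforward: put the satellite on a circular orbit and let a ground receiver "hear" the signal only while the satellite is above its local horizon. A minimal sketch of that visibility check, with our own assumed coordinates (Earth radius normalized to 1):

```python
import math

def satellite_pos(t, period=90.0, radius=1.5):
    """Satellite position on a circular orbit (Earth radius = 1)."""
    theta = 2 * math.pi * t / period
    return radius * math.cos(theta), radius * math.sin(theta)

def visible(sat, receiver_angle):
    """A receiver sees the satellite if it sits above the local horizon,
    i.e. on the receiver's side of the tangent plane."""
    rx, ry = math.cos(receiver_angle), math.sin(receiver_angle)
    sx, sy = sat
    # Dot product of (sat - receiver) with the receiver's zenith direction.
    return (sx - rx) * rx + (sy - ry) * ry > 0

# At t = 0 the satellite sits at angle 0: the receiver directly below it
# can receive the signal, while the receiver on the far side cannot.
sat = satellite_pos(0.0)
```

A webpage version would run this check every animation frame and draw a signal pulse for each receiver where `visible` returns true.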

We also reused the prompt from our Gemini 3 test, asking GPT-5.2 to build a webpage camera application in a retro Polaroid style.

Prompt: Develop a single-page camera application with a retro and realistic style. Design the page background as a corkboard or dark wood grain material. Fix a realistic Polaroid camera model drawn with pure CSS or SVG in the lower left corner, and let its lens area display the user's camera view in real-time. In terms of interaction logic, when the user clicks the shutter button, play the shutter sound effect, and let a photo paper with a white border slowly eject from the top of the camera. Use CSS filters to make the initial state of the ejected photo highly blurred and in black and white, and smoothly transition to a clear and full-color state within 5 seconds. Finally, all developed photos must support free dragging, allowing the user to place them anywhere on the page at will. The photos should have random small rotation angles and shadows. When clicking on a photo, it should be brought to the top, thus forming a realistic free photo collage wall.

Surprisingly, ChatGPT managed to create a Polaroid camera application in one go.

When we tested Gemini 3.0 Pro before, its strengths were twofold: programming, and not needing lengthy prompts. Give it a screenshot or a video, tell it to replicate the thing, and Gemini could do it.

This time, we also gave it a video and asked it to replicate this webpage for generating ancient poems.

https://chatgpt.com/canvas/shared/693b6d1b8fa881919c6298a4aed05581

Compared with GPT-5.1, which previously had no clue about the uploaded video's color scheme, GPT-5.2 has learned its lesson. However, webpages generated by Gemini can call AI features directly through Gemini's API, while ChatGPT has not wired AI into its generated webpages yet, so the poems here can only be ones that already exist.

In addition to the classic programming ability tests and simply creating a single-page HTML file, some netizens also used it to write Python code.

The prompt input by the netizen was "write a python code that visualizes how a traffic light works in a one way street with cars entering at random rate."

He also tested GPT-5.2 Extended Thinking and Claude Opus 4.5 side by side, and the result speaks for itself. Many readers ask us which programming model is best; there's a reason Claude is favored by so many developers.

The following image is of GPT-5.2. Source: https://x.com/diegocabezas01/status/1999228052379754508
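Stripped of the visualization, the netizen's prompt reduces to a queueing loop: the light cycles, cars arrive at a random rate, and the queue drains only on green. A minimal text-mode sketch under those assumptions (not the code any of the three models actually produced):

```python
import random

def simulate(ticks=120, green=30, red=30, arrival_p=0.4, seed=42):
    """One-way street: cars join a queue at random; one car passes per
    green tick; nothing moves on red. Returns queue length over time."""
    rng = random.Random(seed)
    queue, history = 0, []
    for t in range(ticks):
        if rng.random() < arrival_p:                 # a car enters the street
            queue += 1
        if t % (green + red) < green and queue > 0:  # light is green
            queue -= 1                               # one car clears per tick
        history.append(queue)
    return history

history = simulate()
```

Plotting `history` gives the sawtooth you'd expect: the queue builds during red phases and drains during green, which is the behavior the visualizations in the thread animate.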

Previously, the biggest drawback of the Claude models was arguably their high price: Claude Opus 4.5 costs $5 per million input tokens and $25 per million output tokens. Now GPT-5.2's prices have risen too, roughly 40% higher than GPT-5.1 across the board. GPT-5.2 Pro costs $21 per million input tokens and $168 per million output tokens.
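At those rates, per-request cost is simple arithmetic: token count divided by one million, times the per-million price. A quick sketch using the prices quoted above (the sample token counts are made up):

```python
def request_cost(input_tokens, output_tokens, in_price, out_price):
    """Cost of one request, with prices in $ per million tokens."""
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

# GPT-5.2 Pro: $21 in / $168 out per million tokens (from the article).
# Claude Opus 4.5: $5 in / $25 out per million tokens.
# Hypothetical request: 10k input tokens, 2k output tokens.
gpt_cost = request_cost(10_000, 2_000, 21, 168)     # $0.546
claude_cost = request_cost(10_000, 2_000, 5, 25)    # $0.10
```

For this hypothetical workload, GPT-5.2 Pro comes out at more than five times Opus 4.5's price, which puts the "Claude is expensive" complaint in a new light.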

In the official release blog, OpenAI mentioned that GPT-5.2 has also improved in terms of image processing ability.

GPT-5.2 Thinking is our most powerful visual model to date, with the error rate in chart reasoning and software interface understanding reduced by about half.

It also gave an example: for a very blurry motherboard photo, the model was asked to add annotation boxes. Compared with GPT-5.1, GPT-5.2 still makes mistakes, but it marks more areas.

But what about Nano Banana Pro? Some netizens used Nano Banana Pro to strip the annotations from the image and then asked it to add new bounding boxes. Which one do you think did better?