GPT-5's Setback: OpenAI's "Rollback" Drama and the Invisible Boundaries of AI Expansion
On August 7, GPT-5 launched with great fanfare, offering four models (regular / mini / nano / pro). On August 12, Sam Altman announced on X that GPT-4o was once again the default model for all paying users.
Only five days passed between its “removal” and its “resurrection”. The last time OpenAI rolled something back this hastily was the ChatGPT “outage incident” of November 2023. The difference is that that was a technical glitch, while this is a “self-correction” of product strategy.
Backend logs obtained by VentureBeat show that GPT-5 exposed three major flaws in its first week of release:
- Routing out of control: The auto-switcher misrouted 37% of Pro users' requests to the nano model, causing long conversations to abruptly “lose memory”.
- Performance drift: In code-completion scenarios, GPT-5's pass rate was 8.7% lower than GPT-4o's, and complaints piled up in trending Stack Overflow threads.
- Emotional backlash: Reddit's r/ChatGPT saw 12,000 posts in a single day complaining that “the new version has no soul”.
So OpenAI made an “emergency rollback of the default model” to stop the bleeding. Altman's promise sounded like a consolation: “If we retire GPT-4o again in the future, we will give ample advance notice.”
But translated into industry jargon, it means — GPT-5 isn't ready to fully take over the production environment.
Users' “model attachment disorder”: The first “fan-club phenomenon” of AI products
You can hardly imagine a large model having its own “white moonlight”: the idealized old flame that no successor can replace.
- Independent developer Alex posted his VSCode plugin on Twitter, saying GPT-4o's coding style was “like an old, well-matched partner”.
- A Japanese illustrator printed out GPT-4o's responses and compiled them into a book titled “Poetry Collection of 4o”.
- Some users even launched a petition on Change.org, demanding the permanent retention of GPT-4o's “personality parameters”.
This isn't a joke; it's the “model personality stickiness” that OpenAI's product team has only recently come to recognize. When LLMs become the daily production tools of millions of creators, their “tone” is itself productivity.
Altman wrote in an internal Slack message: “We underestimated users' sensitivity to ‘personality consistency’.”
Therefore, the next version of GPT-5 will introduce a “temperature knob”:
- Warm: More amiable, similar to GPT-4o;
- Neutral: The current default;
- Balanced: In between the two, with users allowed to fine-tune a continuous value from 0–100 (a sketch of how such a knob could work follows below).
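A minimal sketch of how such a knob might be wired up on the client side, assuming the 0–100 value is simply mapped onto a system-prompt instruction. The `warmth` parameter, thresholds, and wording below are illustrative assumptions, not part of any announced OpenAI API:

```python
# Hypothetical sketch only: OpenAI has not published a "warmth"/personality parameter.
# It maps the article's 0-100 knob onto a system-prompt instruction.

def personality_prompt(warmth: int) -> str:
    """Translate a 0-100 warmth setting into a system-prompt instruction."""
    if not 0 <= warmth <= 100:
        raise ValueError("warmth must be between 0 and 100")
    if warmth >= 67:        # "Warm": amiable, GPT-4o-like
        style = "friendly, encouraging, and conversational"
    elif warmth >= 34:      # "Balanced": somewhere in between
        style = "helpful and approachable, but concise"
    else:                   # "Neutral": the current default
        style = "neutral, terse, and strictly factual"
    return f"Respond in a {style} tone (warmth={warmth}/100)."

print(personality_prompt(80))
# -> Respond in a friendly, encouraging, and conversational tone (warmth=80/100).
```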
This is the first time an AI product has a “skin system” — not to change colors, but to change souls.
Hidden challenge: The “electricity bill” of the reasoning mode
How expensive is GPT-5's “Thinking” mode?
- With a context of 196k tokens, the cost per round is approximately 3.6 times that of GPT-4o;
- There is a weekly cap of 3,000 requests, which works out to roughly $60 per week;
- Once the cap is exhausted, requests automatically downgrade to Thinking-mini, and accuracy drops by a further 20% (a back-of-envelope cost check follows this list).
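Taking these figures at face value, the implied per-request economics are easy to sanity-check. The sketch below only restates the numbers quoted above; the “implied GPT-4o cost” is derived from them, not a published price:

```python
# Back-of-envelope check of the consumer-side figures quoted above.
weekly_cap_requests = 3_000    # weekly limit on Thinking-mode requests
weekly_cap_cost_usd = 60.0     # article's estimate for exhausting the cap
thinking_multiplier = 3.6      # Thinking-mode cost per round vs. GPT-4o (196k context)

cost_per_thinking_request = weekly_cap_cost_usd / weekly_cap_requests
implied_gpt4o_round = cost_per_thinking_request / thinking_multiplier

print(f"Thinking mode: ~${cost_per_thinking_request:.3f} per request")  # ~$0.020
print(f"Implied GPT-4o round: ~${implied_gpt4o_round:.4f}")             # ~$0.0056
```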
That is just the consumer side; the enterprise API price list is even more striking.
Electricity costs, GPUs, and carbon emissions: these three mountains make “infinite context” a luxury. According to closed-door data leaked from the Bit.ly/4mwGngO salon:
- The 200,000-GPU H100 cluster that Microsoft Azure has reserved for GPT-5 has a peak power draw of 120 MW, roughly 8% of San Francisco's residential electricity consumption (a rough plausibility check follows this list).
- Every 10 ms shaved off inference latency costs roughly 5% more electricity.
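The 120 MW figure is at least the right order of magnitude. In the quick check below, the per-GPU average draw is an assumption (H100 SXM boards peak around 700 W), not a reported number:

```python
# Order-of-magnitude check of the leaked 120 MW peak-power figure.
# The per-GPU draw is an assumption, not a reported number.
gpus = 200_000
avg_watts_per_gpu = 600      # assumed average H100 draw under load (SXM peak is ~700 W)

cluster_mw = gpus * avg_watts_per_gpu / 1e6
print(f"Estimated GPU draw: ~{cluster_mw:.0f} MW")  # ~120 MW, the same ballpark as the leak
```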
OpenAI's VP of infrastructure admitted in a private meeting: “Inference costs are growing faster than Moore's Law can bring them down.”
Efficiency vs. Expansion: The ‘triple point’ of the Scaling Law
For the past five years, the AI industry has lived by the creed that “more parameters means more capability”. Now, for the first time, it has hit the triple critical point of expansion, efficiency, and sustainability:
- Parameter expansion: GPT-5 has 4T parameters, and a single training run costs $320 million;
- Inference efficiency: Sparsification, MoE, and 4-bit quantization offset only about 60% of the cost increase;
- Sustainability: AI training already accounts for 4% of new load on the US power grid, and environmental groups have begun suing data centers.
As a result, three new paths have emerged in the industry:
- Model slimming: Mistral-Medium-122B approaches GPT-4 on MMLU, and its training cost only $150 million;
- Hardware customization: Google's TPU v6 and Amazon's Trainium2 deliver 2.3x more compute per watt-hour;
- Energy arbitrage: Moving data centers to areas with Norwegian hydropower or Saudi solar power can reduce electricity costs by 40%.
In a nutshell: “bigger” is no longer the only selling point; “cheaper” is the heart of the next round of fundraising stories.
OpenAI's “multi-threaded” future: One press conference, three business models
Viewed against OpenAI's overall business picture, the GPT-5 fiasco and rollback were really a simultaneous stress test of three revenue curves.
This incident has reordered the priorities of the three curves:
- Consumer side: Ensure the user experience first, then talk about upgrades; rolling back to 4o is a safety net for subscription revenue.
- API side: Ensure margins first, then talk about scale; the high pricing of the Thinking mode is insurance on ROI.
- Hardware side: Ensure energy efficiency first, then talk about expansion; joint optimization projects with NVIDIA and AMD have already been launched.
When AI enters the era of “meticulous cultivation”
GPT-5's setback is very much like the iPhone 7 dropping the headphone jack in 2016:
- Users complained a lot, but AirPods opened up a new market worth hundreds of billions;
- Now, with a “model rollback”, OpenAI is telling the world that the era of the Scaling Law's wild expansion is over.
The era of meticulous cultivation has begun.
In the next 12 months, we will see:
- More models that are “smaller in size, faster in reasoning, and lower in energy consumption”;
- More knobs for “adjustable personality, adjustable cost, and adjustable security”;
- More new SaaS packages that “factor electricity costs into product pricing”.
AI is no longer a black box that “achieves miracles with brute force”, but an engineering business that “cares about every detail”. This time, even Altman admits:
“Our enemy is not the competitors, but the laws of physics.”
This article is from the WeChat official account “Shanzi”. Author: Rayking629. Republished by 36Kr with permission.