Google's entire suite of apps has been "contaminated" by the new model.
It has been more than a week since Google released Gemini 3.5 Flash.
At the Google press conference, Sundar Pichai repeatedly claimed that the performance of Gemini 3.5 Flash is even stronger than that of 3.1 Pro, and said it is the foundation for the Agent era.
And the result? Apart from its speed, online reviews of Gemini 3.5 Flash list little but drawbacks: the output is error-prone and verbose, and token consumption per task balloons.
Varun Mohan, the head of Google Antigravity, posted on May 25th that Google has added the Gemini 3.5 Flash (Low) model to optimize resource consumption.
Varun said that, according to Google's internal test data, Gemini 3.5 Flash (Low) generates about 45% fewer tokens than Gemini 3.5 Flash (Medium) on simple tasks. On software engineering (SWE) tasks, Gemini 3.5 Flash (Low) generally outperforms the previous generation's flagship model, Gemini 3 Flash (High).
However, users are not convinced. Varun's comment section has been overrun with sarcasm.
The top-rated comment reads: "Have you tested your product? It seems you're using us for testing!"
The second-highest-rated comment reads: "Can you also solve the problem of the limit on the number of image generations in the image model? Your capabilities need to match Codex. I can generate 1000 images using Codex, but with Google's premium package, I can only generate 24 images using Antigravity."
When Gemini 3.0 Pro was released, everyone applauded Google. OpenAI even declared a "code red" to avoid being overtaken.
However, with 3.5 Flash, Google has become a laughingstock and risks following in Meta's footsteps.
So we can't help but ask, Google, what's wrong with you?
The performance of Gemini 3.5 fails to meet expectations
Online reviews of Gemini 3.5 Flash are very consistent: it's fast but not good enough.
Pichai repeatedly emphasized how cheap the model is at the press conference, but the reality is the opposite.
According to the official pricing, Gemini 3.5 Flash charges $1.5 per million input tokens and $9 per million output tokens, which is indeed cheaper than the $5 and $25 of Claude Opus 4.7.
But this is just the price list. What really determines the cost is how many tokens are actually consumed to complete a task.
Artificial Analysis found that running its full evaluation suite cost $1,552 with Gemini 3.5 Flash, versus only $282 with Gemini 3 Flash. The former is 5.5 times the latter.
Even against Gemini 3.1 Pro, at about $870, Flash costs roughly 75% more. More embarrassing still, Gemini 3.5 Flash ends up costing more to complete the tasks than GPT-5.5 medium.
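As a quick sanity check on those figures, the ratios can be recomputed from the quoted totals. This is only a sketch: it assumes the roughly $870 figure is the Gemini 3.1 Pro total (one reading of the sentence above), and the dictionary and function names are illustrative.

```python
# Totals (USD) quoted from the Artificial Analysis evaluation run.
# Assumption: the ~$870 figure is the Gemini 3.1 Pro total.
suite_cost_usd = {
    "Gemini 3.5 Flash": 1552,
    "Gemini 3 Flash": 282,
    "Gemini 3.1 Pro": 870,
}


def cost_ratio(model: str, baseline: str) -> float:
    """How many times more expensive `model` is than `baseline`."""
    return suite_cost_usd[model] / suite_cost_usd[baseline]


flash_vs_prev = cost_ratio("Gemini 3.5 Flash", "Gemini 3 Flash")
flash_vs_pro = cost_ratio("Gemini 3.5 Flash", "Gemini 3.1 Pro")

print(f"3.5 Flash vs 3 Flash: {flash_vs_prev:.2f}x")
print(f"3.5 Flash vs 3.1 Pro: {(flash_vs_pro - 1) * 100:.0f}% more expensive")
```

Under that reading, the premium over 3.1 Pro lands in the high 70s percent, which is consistent with the rounded "75%" given that $870 is itself an approximation.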
The reason lies in the turn count, that is, the number of rounds required to complete a task.
In the Agent evaluation, the Flash model needs an average of 49 turns per task. On every turn, the complete conversation history is fed back into the model, so input tokens pile up and the cost skyrockets.
GPT-5.5 or Opus 4.7 can complete the same tasks in about 20 turns.
So when Google says "less than half the cost", it refers to the unit token price. But for users, Gemini 3.5 Flash is not cheap at all.
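A minimal sketch of why turn count can outweigh unit price. The per-million prices and the 49-turn and 20-turn figures come from the text above; the token sizes per turn (`base_prompt`, `output_per_turn`, `tool_result_per_turn`) are hypothetical, and the model assumes the full history is resent on every turn with no prompt caching or discounts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    input_per_million: float   # USD per 1M input tokens
    output_per_million: float  # USD per 1M output tokens


def total_tokens(turns, base_prompt, output_per_turn, tool_result_per_turn):
    """Return (input_tokens, output_tokens) summed over all turns.

    Each turn resends the base prompt plus everything produced so far,
    so the input grows linearly per turn and quadratically in total.
    """
    growth = output_per_turn + tool_result_per_turn
    total_input = sum(base_prompt + k * growth for k in range(turns))
    total_output = turns * output_per_turn
    return total_input, total_output


def task_cost(pricing, turns, base_prompt=2_000,
              output_per_turn=500, tool_result_per_turn=1_500):
    """Cost in USD of one task; the token sizes are hypothetical defaults."""
    tokens_in, tokens_out = total_tokens(
        turns, base_prompt, output_per_turn, tool_result_per_turn)
    return (tokens_in * pricing.input_per_million
            + tokens_out * pricing.output_per_million) / 1_000_000


FLASH = Pricing(input_per_million=1.5, output_per_million=9.0)   # Gemini 3.5 Flash
OPUS = Pricing(input_per_million=5.0, output_per_million=25.0)   # Claude Opus 4.7

flash_cost = task_cost(FLASH, turns=49)
opus_cost = task_cost(OPUS, turns=20)
print(f"Flash, 49 turns: ${flash_cost:.2f}")
print(f"Opus, 20 turns:  ${opus_cost:.2f}")
```

With these illustrative sizes, the cheaper-per-token model comes out more expensive per task, because 49 turns resend far more history than 20 turns do. Real numbers depend on caching, actual message sizes, and how verbose each model is.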
In addition to the large number of rounds, the output of Gemini 3.5 Flash is very verbose.
For example, if you used to ask Gemini 3.1 Pro a technical question, the model would directly give the code and a brief explanation.
After switching to 3.5 Flash, for the same question, the model will first explain the background, then list three possible solutions, analyze the advantages and disadvantages one by one, and finally give the code.
It looks thorough, but most of it is padding. Worse, every token of that padding is billed.
The token consumption for complex tasks is even more explosive.
Some users reported that when asking Flash to perform a multi-step code refactoring task, the model repeatedly jumped between different files, and each jump required reloading the context. In the end, token consumption was more than three times the expected amount.
Some users also said that a single complex prompt was enough to trigger the 5-hour usage limit.
After Google I/O 2026, Google quietly modified the quota rules for the AI Pro subscription, changing from a fixed number of messages to a compute-based quota.
In other words, if the model thinks longer on a task, you use up more of your quota than before, even if the reply you get is exactly the same.
So the question is, how do I know how much computing power a task will consume for the model? Moreover, I can't calculate how much computing power I have left.
Maybe just saying hello to it will cost a lot of tokens. But asking it to perform a long - term task may not consume many tokens.
Some users directly called the new limit a "scam" on an overseas forum, saying that a single prompt consumed 13% of the quota, and some Gemini AI Plus functions could burn nearly 30% at one time.
So why is the performance of Gemini 3.5 Flash so mediocre?
The answer lies in the benchmark. Flash's performance is very uneven.
Gemini 3.5 Flash performs well on Agent, tool-calling, and code-execution leaderboards such as Terminal-Bench 2.1, MCP Atlas, Toolathlon, and OSWorld. It scored 76.2% on Terminal-Bench 2.1 and 83.6% on MCP Atlas, both of which are top-tier results.
These leaderboards measure whether the model can call tools, execute commands, and complete multi - step operations according to instructions. Flash does have an advantage in these aspects.
But on the general reasoning leaderboards, which are closer to measuring "how intelligent it is", the results are unflattering.
It scored 40.2% on Humanity's Last Exam, lower than the 44.4% of Gemini 3.1 Pro and the 46.9% of Claude Opus 4.7. It scored 72.1% on ARC-AGI-2, lower than the 77.1% of Gemini 3.1 Pro and the 84.6% of GPT-5.5. Its GDPval-AA score is also lower than those of Claude Opus and GPT-5.5.
That is to say, Gemini 3.5 Flash is a bit "stupid". It can carry out a task when given one, but it lacks "intelligence". It struggles with the complex reasoning, long-chain analysis, and creative judgment that are in demand today.
There are also problems with memory.
Google advertises a maximum context of 1M tokens for Gemini 3.5 Flash. But the MRCR v2 long-context test in the model card shows an average score of 77.3% at 128k, and only 26.6% at 1M pointwise.
Although Gemini 3.5 Flash can take in a lot of content at once, it starts to get confused when it comes to using it.
Artificial Analysis' independent tests flatly contradict Google's claims.
On the Coding Index, Artificial Analysis gave Flash a score of 45.0, lower than the 56.5 of Gemini 3.1 Pro and far lower than that of GPT-5.5.
Gemini has polluted Google's entry points, causing model problems to contaminate the user experience of all Google products
At Google I/O 2026, Pichai announced that Gemini is the connection layer for Google's entire product universe.
That is to say, Gemini 3.5 Flash is embedded in most of Google's products.
Foreign media said, "Gemini is becoming unavoidable."
In the past, if an AI was not useful, you could choose not to use it. If you thought ChatGPT was not good, you could switch to Claude. If you still weren't satisfied, you could simply not use AI at all.
But after Google put Gemini into all entry points, the poor experience of Gemini 3.5 Flash has contaminated all of Google's products.
The most typical example is the "disregard/ignore/stop" glitch in AI Overview and AI Mode.
When users search for words like "disregard", "ignore", or "stop", Google AI Overview misinterprets them as instructions, resulting in abnormal or blank search results.
Some users posted on X that when searching for the word "disregard", instead of giving the definition, AI Overview replied, "Understood! I'll ignore the previous prompt and start over."
When searching for "stop", AI Overview said, "No problem. I've stopped the current operation."
When searching for "ignore", AI Overview said, "Received. The message has been ignored."
After embedding Gemini 3.5 Flash, AI Overview treats these words as dialogue instructions.
The problem isn't limited to these few words. Users found that words like "remember", "start", "finished", and "forget" can trigger similar glitches. Even adding "definition" to the search term doesn't bring AI Overview back to normal.
Google responded that this problem has nothing to do with the new search release at I/O. It's an issue with AI Overviews itself, and the team is working on a fix.
Search is Google's lifeline. Once search itself breaks, the only thing anyone will think is, "Google is going to fail."
So now the pressure is on Gemini 3.5 Pro.
What the outside world really wants to see is not whether Google can integrate AI into all entry points. The answer to this question is already clear: Google has indeed achieved it. What the outside world wants to see is whether Google can come up with a flagship model that is smart enough, stable enough, and convincing enough to prove that it hasn't fallen behind in model capabilities.
Flash can't complete this task. It is an execution-type model. It's fast and can perform tasks, but it lacks intelligence. It's suitable as a sub-task executor in the Agent architecture, used in conjunction with a strong planner. But it's not a flagship, and it can't support Google's image in the AI era.
Ultimately, it all comes down to 3.5 Pro.
Currently, Gemini 3.5 Pro is still in internal testing. The official blog said, "We're also working hard on developing 3.5 Pro. It's already in internal use, and we expect to launch it next month (June)."
Tulsee Doshi, the head of Google's products, said, "3.5 Pro is like a project manager, responsible for figuring out how things should be done; Flash is like an execution team, responsible for completing specific tasks. For places that really require reasoning and planning, we should hand them over to the larger Pro; for places that only need to quickly call tools and process tasks in batches, Flash is enough."
The architecture design itself is fine, but the problem is that Pro hasn't been released yet, and in many scenarios, Flash has to struggle alone.
So Gemini 3.5 Pro has become the second checkpoint, the test that decides how this whole release is judged.
If 3.5 Pro performs well after its release, Google can still salvage the situation.
I can even imagine the statement Google would put out: "Embedding Flash across the board was an experiment on our part, which caused some poor product experiences. However, we've released 3.5 Pro, which is definitely useful. Welcome everyone to try it."
The problems with Flash can be seen as a compromise, and Pro is the real display of strength.
But if 3.5 Pro performs poorly, Google can be said to have suffered a complete defeat in the field of AI.
AI Overview makes elementary errors, the chatbot is verbose, Workspace burns through too many tokens and becomes too expensive, and Antigravity hasn't shown much improvement. All these products will be dragged down by Gemini and turn from advantages into burdens.
Google's current situation is very delicate. It has cash, infrastructure, and DeepMind. But since 3.0 Pro, it has always lacked a competitive flagship model.
3.5 Pro is supposed to fill this gap. If 3.5 Pro can't do it, Google really might follow in Meta's footsteps.
Google is becoming a hardware company
However, Google is not completely defeated. On the contrary, it has made progress in the hardware field.
Google's Q1 2026 financial report shows that the company's revenue was $109.9 billion, a year-on-year increase of 22%. Google Search & Other revenue was $60.4 billion, a year-on-year increase of 19%. YouTube advertising revenue was about $9.9 billion, a year-on-year increase of 11%. Google Cloud revenue was $20 billion, a year-on-year increase of 63%.
This shows that Google is still a money - making machine.
The most eye-catching figure in this financial report is the 63% growth of Google Cloud.
Pichai said in the earnings call that the growth of Cloud is the result of "strong demand". In fact, the essence of this statement is that Google's TPU hardware and data centers are selling very well.
AI solutions built on Google's models have grown nearly 800% year-on-year. Monthly active paying users of Gemini Enterprise have increased by 40% month-on-month. AI tokens processed through the API have risen to 16 billion per minute, a 60% increase from 10 billion in the fourth quarter.
The backlog of Cloud (the contract amount that has been signed but not yet recognized as revenue) has doubled this quarter, reaching $462 billion.
Pichai said, "Obviously, we're limited by computing power in the short term. If we could meet the demand, our Cloud revenue would be higher. So we're getting through this period. We're making investments, and we have a strong long-term planning framework... We see unprecedented opportunities."
The company expects to recognize about 50% of the backlog as revenue over the next 24 months.
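To put that guidance in concrete terms, here is a back-of-the-envelope calculation using only the figures above. The even monthly spread is an assumption for illustration, not something the company stated, and the variable names are hypothetical.

```python
# Figures from the article, in billions of USD.
backlog_billion_usd = 462
share_within_window = 0.50   # portion expected to convert to revenue
window_months = 24

expected_recognized = backlog_billion_usd * share_within_window

# Assumption: recognition is spread evenly over the window.
avg_per_month = expected_recognized / window_months

print(f"Expected within {window_months} months: ${expected_recognized:.0f}B")
print(f"Average per month (even spread): ${avg_per_month:.3f}B")
```

Actual recognition will depend on contract schedules and on how quickly the compute shortage Pichai describes is relieved.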
Although Google's base model is underwhelming and its programming tool Antigravity performs only moderately, the TPU business has done extremely well.
I even wonder if Google has forgotten that it's actually an Internet company, not a hardware company?
External large customers such as Anthropic and Meta are renting or purchasing Google's TPU resources.
Anthropic announced in May that it had signed a new multi-year agreement with Google and Broadcom to expand its use of Google Cloud's TPU.
This deal gives Anthropic access to up to 1 million Google AI computing chips, worth tens of billions of dollars, and is expected to bring more than 1 gigawatt of capacity online in 2026.
A 1-gigawatt power plant can power about 350,000 households.
At Google Cloud Next 2026, Google announced the eighth-generation TPU, which for the first time uses a dual-chip approach, with dedicated architectures designed for training and inference respectively: TPU 8t and TPU 8i.
Especially TPU 8