The cost of intelligence has dropped 128-fold in a single year. In 2026, everything can be an Agent. So what is left for humans to do?
A few months ago, Artificial Analysis was still warning that rumors of an AI slowdown were greatly exaggerated.
But by early 2026, hardly anyone believes in an "AI slowdown" anymore, because progress has only kept accelerating.
At the beginning of 2025, there was not even a real "Code Agent" in the world.
But by the end of the year, the ancient profession of software engineering was forever changed by "Vibe Coding".
Programmers no longer mechanically copy and paste code into ChatGPT or Cursor's chat window. Instead, they have learned to hand instructions to Agents and watch them work on their own for minutes at a time, or longer.
This is just one facet of the wild reality laid out in the "2025 Global AI Year-End Report" just released by Artificial Analysis.
Over the past year, the global AI arms race showed no sign of cooling down; it turned completely white-hot.
Of course, such fierce competition also brings AI users one thrilling benefit:
The cost of using AI at all intelligence levels is dropping at an incredible speed.
AI labs are doubling down on reinforcement learning and competing fiercely over large sparse Mixture-of-Experts (MoE) architectures. Combined with the arrival of NVIDIA's Blackwell hardware, these forces have together driven this technological tsunami.
The Artificial Analysis report argues that in 2025, five core trends completely reshaped the AI industry:
An extremely crowded field: competition has become unprecedentedly fierce, with new players going head-to-head with the international giants.
Reasoning becomes the absolute standard: thinking-style models now dominate.
Agents explode across the board: moving from single-instruction tasks to long-horizon autonomous work.
Multimodality crosses the singularity: video generation and image editing have entered mainstream awareness.
Native voice awakens: end-to-end speech models give voice Agents a real soul.
"Thinking" becomes the absolute standard, and the cost of equivalent reasoning drops 128-fold!
At the beginning of 2025, OpenAI's o1 still led alone as the only reasoning-style player on the market.
By the end of the year, the landscape had shifted drastically: nearly every top AI lab had shipped its own "thinking" reasoning model.
This paradigm shift now occupies the very top of the intelligence leaderboards.
OpenAI held on to the title of "smartest brain" with GPT-5.2 (xhigh) at both the start and the end of the year.
However, the former leader's edge is being compressed fast.
Anthropic is close behind with Claude 4.5 Opus (Reasoning), Google has launched Gemini 3 Pro, and xAI shows no sign of weakness either.
For ordinary users, the good news from the AI arms race is this: smart no longer means expensive.
Thanks to steadily shrinking model sizes and extreme gains in software and hardware efficiency, the cost of the o1-level intelligence we revered at the start of 2025 has fallen like a stone: a full 128-fold in just one year!
Now, on the same budget, we can summon brains far beyond last year's best, or get yesterday's top-tier intelligence at rock-bottom prices.
From "copy-paste" to "autonomous work": in 2025, Agents finally became real
In 2025, Agents completed the leap from one-off demos to enterprise-grade core productivity.
It was also the year our expectations of AI shifted from "give me the answer and I'll do the work" to "just finish the work yourself."
The trigger was the standout performance of Code Agents, and long-horizon programming tasks became the biggest beneficiaries of this productivity revolution.
Big companies and startups alike are racing to release Code Agents. Today's models not only ship with highly proficient tool calling out of the box; reinforcement learning has also instilled in them an instinct for autonomously executing long-horizon tasks.
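The shift from single-shot prompting to autonomous execution boils down to a loop: the model proposes a tool call, a harness executes it, and the observation is fed back until the task is done. A minimal sketch of that loop, where `fake_model` and `run_shell` are hypothetical stubs standing in for a real LLM API and a real tool:

```python
# Minimal sketch of an agent loop. The model repeatedly picks a tool,
# the harness runs it, and the observation is appended to the transcript.
# `fake_model` is a hypothetical stand-in for a real LLM API call.

def run_shell(cmd: str) -> str:
    """Toy 'tool': pretend to run a shell command and return its output."""
    return f"ran: {cmd}"

TOOLS = {"shell": run_shell}

def fake_model(transcript: list[str]) -> dict:
    """Stand-in policy: ask to run the tests once, then declare success."""
    if not any("ran: pytest" in line for line in transcript):
        return {"action": "tool", "tool": "shell", "arg": "pytest"}
    return {"action": "finish", "answer": "all tests pass"}

def agent_loop(task: str, max_steps: int = 10) -> str:
    transcript = [f"task: {task}"]
    for _ in range(max_steps):
        step = fake_model(transcript)
        if step["action"] == "finish":
            return step["answer"]
        # Execute the requested tool and feed the observation back.
        observation = TOOLS[step["tool"]](step["arg"])
        transcript.append(observation)
    return "gave up"

print(agent_loop("fix the failing test"))  # -> all tests pass
```

Real harnesses add budgets, sandboxing, and richer tool schemas, but the report's point survives even in this toy: the loop, not any single completion, is what makes long-horizon autonomy possible.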
Artificial Analysis highlights one shift in the report:
In long agent workflows, a model's intelligence is not measured by how many tokens it outputs.
The real top performers win by using external tools skillfully and efficiently.
On this unforgiving Pareto-frontier chart, the flagship models from Google and Anthropic are the undisputed kings of balancing efficiency and intelligence.
Since 2025 was the year Code Agents triumphed, Artificial Analysis predicts that 2026 will be the first year of "Agents for everything".
Native multimodality explodes, and video models enter the "sound-enabled era"
In 2025, large models saw an explosion of native multimodality.
Video models finally shed the "experimental" label this year and became genuinely mainstream and usable.
Sora, still highly regarded at the start of the year, had been surpassed by Runway Gen-4.5 by nearly 200 Elo points by year's end.
What's more important is that video models are no longer "mute".
Veo 3, released in May 2025, was the first major video model to natively support audio generation while delivering extremely high visual quality.
The whole industry then erupted: OpenAI's Sora 2, Lightricks' LTX-2, and others made video generation with built-in background music and ambient sound a mainstream standard.
The report also draws a striking conclusion: in image and video generation, China and the United States are now exactly level!
End-to-end S2S reasoning explodes, and voice and music AI evolve across the board
In the fourth quarter of 2025, voice and music AI went through a genuine revolution at the foundations.
Why did earlier voice assistants always feel a bit dull and mechanical? Because their "brains" had to run an extremely cumbersome "translation" pipeline:
first convert the incoming speech to text (STT), then hand that text to a large language model to think (LLM), and finally convert the resulting text back into speech (TTS) and read it out.
This cascaded design not only adds latency but also filters out the emotion, sighs, and accent in a human voice.
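The cascaded design described above can be sketched in a few lines; `stt`, `llm`, and `tts` here are hypothetical stubs standing in for real speech-to-text, language-model, and text-to-speech services. The point is the structure: every stage after the first only sees flat text, so paralinguistic cues are discarded at the very first hop.

```python
# Sketch of the cascaded STT -> LLM -> TTS pipeline.
# Each function is a hypothetical stub for a real service. Note that
# everything downstream of stt() operates on plain text, so emotion,
# sighs, and accent are lost, and three stages of latency stack up.

def stt(audio: bytes) -> str:
    """Speech-to-text: tone and emotion are thrown away right here."""
    return "what's the weather"

def llm(text: str) -> str:
    """The language model only ever sees flat text."""
    return f"answer to: {text}"

def tts(text: str) -> bytes:
    """Text back to speech, the third network hop in the chain."""
    return text.encode()

def cascaded_assistant(audio: bytes) -> bytes:
    # Three sequential hops; latency is the sum of all three.
    return tts(llm(stt(audio)))

print(cascaded_assistant(b"..."))
```

An end-to-end model collapses these three hops into one network that consumes and emits audio directly, which is why it can preserve tone and respond with far lower latency.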
But in 2025, native audio reasoning let models think directly over sound waves: they dropped the text "middleman" and began processing audio end to end.
This technological revolution directly led to a major reshuffle of the rankings.
In the fourth quarter, xAI, with blistering response speed and formidable native listening comprehension, dethroned the former leader, Google's Gemini 2.5 Native Audio Thinking, to top the Big Bench Audio leaderboard. Meanwhile, Amazon's Nova 2.0 Sonic hit the market's pain points precisely and was crowned the cost-performance king.
On the speech-to-text (STT) battlefield, single-purpose players are being outclassed by all-around generalists.
Multimodal models like AWS's Nova 2 Omni now treat speech-to-text as an easy side job.
Without ever specifically training for dictation, their accuracy already matches professional transcription software.
Meanwhile, to fix the problem of voice assistants always lagging half a step behind, ultra-low-latency specialists like ElevenLabs Scribe v2 Realtime and NVIDIA Parakeet Realtime have emerged.
With them, the last obstacles to deploying voice agents in real-world scenarios have been cleared.
Today's top models not only sound pleasant; on instruction they can precisely control emotional tone, speaking pace, and accent, and can insert laughter, sighs, and breaths with complete naturalness.
The once-unmistakable "AI flavor" has now basically disappeared.
As tools like Suno v4.5 and ElevenLabs Music gain mainstream traction, realistic human voices and music can now be synthesized easily and at very low cost.
Of course, that realism has also caused real alarm: the spread of voice cloning has forced the whole industry to prioritize audio watermarking and provenance verification as top-level safety measures.
The report also soberly points out the current limitations:
Although voice Agents already behave like real people in structured interactions such as customer service and booking, they still betray a mechanical clumsiness when faced with ambiguous context, multi-turn conversations requiring long-range reasoning, or noisy recording environments.
The computing-power game: NVIDIA's mass delivery and a $20 billion endgame gamble
In 2025, the underlying hardware infrastructure matured completely.
NVIDIA's Blackwell chips, the B200 and the GB200 NVL72 rack-scale systems, shipped at scale in 2025 and entered real production environments.
Top models such as IBM's Granite 4 series and OpenAI's GPT-5.3 Codex were among the first to publicly confirm the use of GB200 clusters.
NVIDIA then followed up in the third quarter with the B300 and GB300.
The upgrade was straightforward: relative to the B200, HBM3e memory grew by 50% (to 288 GB), and FP4 compute rose to 14 PFLOPS.
However, NVIDIA's ambition goes far beyond just selling chips.
In December 2025, the whole tech world was set ablaze by one deal: NVIDIA spent roughly $20 billion to acquire Groq.
The deal was cleverly packaged as IP licensing plus an acqui-hire.
What NVIDIA wants is to embed Groq's LPU inference technology directly into its computing empire and lock down the AI inference market.
However, there is never a shortage of ambitious people in the power game.
Google's TPU v6 (Trillium) was fully deployed by the end of 2024, powering the enormous demands of Gemini 2.5 Pro and Gemini 3 Pro.
Anthropic, for its part, teamed up with Google and Amazon in 2025, wiring TPUs and Trainium into its own training and inference stack.
And Cerebras, long underestimated, joined forces with AMD and Broadcom and signed a multi-year high-speed inference contract with OpenAI.
Behind this surge in compute, inference-side workflows are also quietly being transformed.
With the arrival of the Agent era, the old model of simply serving on a single machine no longer holds.
Distributed inference optimization became a clear trend in 2025.
Driven by NVIDIA's Dynamo ecosystem and the open-source community, techniques that only top labs could pull off in the past are now within reach of ordinary development teams.
The standard approach now is to disaggregate the prefill and decode phases, letting dozens or even hundreds of GPUs divide the work, with experts computed in parallel across them.
Along the way, the once-fragmented landscape of inference frameworks has converged, and choices have largely settled on three mainstream options: vLLM, SGLang, and NVIDIA's in-house TensorRT-LLM.
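The prefill/decode split mentioned above can be illustrated with a toy: one worker runs the compute-heavy prefill over the whole prompt in a single parallel pass and produces a KV cache, then hands that cache to a separate decode worker that generates tokens one at a time. The classes below are a conceptual sketch under that assumption, not any real framework's API; in production systems the cache moves between GPU pools over high-speed interconnect, not as a Python list.

```python
# Toy illustration of prefill/decode disaggregation.
# The "KV cache" is simulated as a list of processed prompt tokens.

class PrefillWorker:
    def prefill(self, prompt_tokens: list[str]) -> list[str]:
        # Compute-bound phase: process the entire prompt in one pass,
        # producing one KV entry per prompt token.
        return [f"kv({t})" for t in prompt_tokens]

class DecodeWorker:
    def decode(self, kv_cache: list[str], n_tokens: int) -> list[str]:
        # Memory-bound phase: generate one token at a time, each step
        # reading the whole cache and appending its own KV entry.
        out = []
        for i in range(n_tokens):
            token = f"tok{i}"          # stand-in for real sampling
            kv_cache.append(f"kv({token})")
            out.append(token)
        return out

prompt = ["the", "cost", "of", "intelligence"]
cache = PrefillWorker().prefill(prompt)   # step 1: runs on the prefill pool
reply = DecodeWorker().decode(cache, 3)   # step 2: runs on the decode pool
print(reply)  # -> ['tok0', 'tok1', 'tok2']
```

Splitting the two phases pays off because they stress hardware differently: prefill saturates compute, decode saturates memory bandwidth, so separate GPU pools can each be sized and batched for their own bottleneck.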
Embracing the coming era of zero-marginal-cost intelligence
After reading this year-end