Faith and Breakthrough: A Preview of AI Trends in 2026
Few could have imagined that ChatGPT's third anniversary would be marked not by celebration but by an internal red alert, once again sounding the war drum for white-hot competition in artificial intelligence. Pressured by the striking performance of Gemini 3, OpenAI accelerated the launch of GPT-5.2, pouring in more resources to overtake it on multiple benchmarks. Yet over the past three years, the performance gaps and paradigm differences among the major models have continued to narrow, and skeptical voices in the industry suggest that large-model development is hitting a ceiling. Even so, many remain firmly convinced that AGI is coming, and the industry is full of debate and division.
Standing at the end of 2025 and looking back: the breakout popularity of DeepSeek, the wave of Ghibli-style images after GPT-4o, Sora 2's debut alongside Sam Altman, and the many readings of the Doraemon-like images generated by Google's Nano Banana. At times it feels as though a long era has passed; a technology from this year already seems like a trend from many years ago.
Looking toward 2026, we feel not only anxiety about the intelligence ceiling of large models and uncertainty about investment returns, but also see more non-consensus views, along with people's perseverance and conviction. There is hope for breakthroughs in multiple directions, and more expectations and explorations lie ahead.
Belief
1. The Scaling Law Drives Continuous Evolution towards AGI
Since the emergence of ChatGPT, the industry mainstream has believed that as long as computing power keeps increasing, data keeps expanding, and parameters keep stacking up, machine intelligence will grow as if following a physical law, all the way to the singularity of AGI.
However, as the pace of intelligence gains in large models has slowed over the past two years, and with arguments such as data exhaustion gaining traction, doubts about the Scaling Law have grown louder. Is the Scaling Law a ladder to the heavens, or a Tower of Babel that humans are building in a maze of mathematics and statistics, destined never to be completed? Gary Marcus argues that large models do not truly understand the world but merely fit linguistic correlations in enormous corpora; true intelligence should include abstraction, causal modeling, symbolic reasoning, and long-term memory. Recently, Ilya Sutskever said on a podcast that the Scaling Law is approaching its limit: although reinforcement learning consumes enormous compute, it cannot be regarded as genuine scaling, and future breakthroughs will come from better learning methods rather than simply scaling up.
Ilya's view has merit: what is ultimately needed is not scale itself but good methods for solving problems. Yet absent a breakthrough in the underlying architecture or a revolutionary change in training methods, the Scaling Law remains a viable path, and from the perspective of engineering and industrial logic it is still the most reliable and practical growth path today. Its advantages are threefold. First, capability improvements are predictable: model capability can be forecast from increases in training FLOPs and from data optimization. Second, industrial investment can be evaluated: compute, algorithms, and data can be scaled roughly linearly. Third, the talent and engineering systems do not need to be rebuilt from scratch: iteration can continue on the existing architecture through engineering and algorithmic optimization.
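To make that predictability concrete, here is a minimal sketch of Scaling-Law-style forecasting. It assumes a Chinchilla-style power-law loss and the rough compute rule C ≈ 6ND; the constants are approximately those reported in the Chinchilla paper and are used purely for illustration, not figures from this article.

```python
# Illustrative sketch of Scaling-Law-based capability forecasting.
# Functional form: L(N, D) = E + A/N^alpha + B/D^beta (Chinchilla-style);
# the constants are approximate published fits, used here only for illustration.
import numpy as np

E, A, B, alpha, beta = 1.69, 406.4, 410.7, 0.34, 0.28

def predicted_loss(n_params: float, n_tokens: float) -> float:
    """Predict pre-training loss from model size N and data size D."""
    return E + A / n_params**alpha + B / n_tokens**beta

def best_allocation(flops_budget: float, grid: int = 200) -> tuple[float, float, float]:
    """Search for the N that minimizes predicted loss under C ~ 6*N*D."""
    candidates = np.logspace(8, 13, grid)                    # 1e8 to 1e13 parameters
    losses = [predicted_loss(n, flops_budget / (6 * n)) for n in candidates]
    i = int(np.argmin(losses))
    n = candidates[i]
    return n, flops_budget / (6 * n), losses[i]

for budget in (1e23, 1e24, 1e25):                            # hypothetical compute budgets
    n, d, loss = best_allocation(budget)
    print(f"C={budget:.0e}: N~{n:.2e} params, D~{d:.2e} tokens, predicted loss~{loss:.3f}")
```

The point of the sketch is simply that, once the curve is fitted, the expected return on a larger compute budget can be read off before training begins, which is what makes the Scaling Law attractive as an industrial planning tool.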
Since November, the strong performance of Gemini 3 upon release and the research behind DeepSeek V3.2 have both shown that the Scaling Law remains effective at this stage. This also lends more confidence to the United States' aggressive build-out of new AI infrastructure: the total installed capacity of large-scale data center projects currently planned or under construction in the United States exceeds 45 gigawatts (GW), and this construction boom is expected to attract more than $2.5 trillion in investment. On future compute demand, Jensen Huang has likewise put forward three scaling laws, arguing that scaling operates in pre-training, in post-training reinforcement learning, and at inference time, thereby supporting continued growth in compute.
The AI scaling laws described by Jensen Huang in an interview on the BG2 Pod
Data is the most pressing problem in the current evolution of large models. Compute is not the main bottleneck for now and parameter counts can keep growing, but high-quality, usable data remains scarce. The industry is exploring systematic ways to expand data, and a rough consensus has formed: rather than simply hunting for more Internet corpora, a scalable data generation system should be built from synthetic data, reasoning-process data, reinforcement learning data, environment feedback data, multimodal data, and embodied data. The goal is not merely to collect data passively but to build the capability to engineer, control, and mass-produce it, and to further improve learning efficiency through better learning algorithms.
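As a toy illustration of what "engineering" data production can mean, the sketch below generates tasks with known ground truth, has a teacher model attempt them, and keeps only verified samples. The generator, teacher, and verifier are hypothetical stand-ins, not any specific lab's pipeline.

```python
# Toy generate-then-verify synthetic data pipeline (illustrative only).
import random
from dataclasses import dataclass

@dataclass
class Sample:
    prompt: str
    answer: str

def propose_problem(rng: random.Random) -> tuple[str, int]:
    """Generate a task with a known ground truth (here: trivial arithmetic)."""
    a, b = rng.randint(10, 99), rng.randint(10, 99)
    return f"What is {a} + {b}?", a + b

def solve_with_model(prompt: str, truth: int, rng: random.Random) -> str:
    """Placeholder for a teacher model; deliberately wrong 10% of the time."""
    return str(truth if rng.random() > 0.1 else truth + 1)

def check_solution(answer: str, truth: int) -> bool:
    """Automatic verifier: only verified samples enter the training set."""
    return answer.strip() == str(truth)

def build_dataset(n: int, seed: int = 0) -> list[Sample]:
    rng = random.Random(seed)
    dataset: list[Sample] = []
    while len(dataset) < n:
        prompt, truth = propose_problem(rng)
        answer = solve_with_model(prompt, truth, rng)
        if check_solution(answer, truth):          # reject unverified generations
            dataset.append(Sample(prompt, answer))
    return dataset

print(len(build_dataset(1000)), "verified samples produced")
```

The design choice worth noting is the verifier: it is what turns generation into controlled, scalable production rather than unbounded self-training on possibly wrong outputs.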
The foreseeable future will be the era of a New Scaling Law: not a simple stacking of compute, but expansion along both quantity and quality. With abundant compute to draw on, researchers will have ample resources to explore more optimization paths for algorithms and architectures, which may yield breakthroughs in underlying capability. AGI, if it comes, will likely come from the combination of scaling and structural innovation, including world models, new and more efficient training architectures, embodied intelligence, long-term memory mechanisms, tool-based execution chains, and higher-level alignment systems.
2. The Arrival of the Multimodal ChatGPT Moment, Expected to Drive a Non-linear Leap in Intelligence
Multimodal models such as Google Gemini and OpenAI Sora can already summarize text well and extract and generate vivid slide decks, podcasts, and video animations, demonstrating deep understanding of content. One could say the multimodal ChatGPT moment has arrived. By analogy with biological evolution, language is a late, high-level form of intelligence; yet this wave of large-model breakthroughs started from language, exactly the reverse of the evolutionary path. Going forward, progress in multimodality can explore the evolution of intelligence from the other direction, and it is likely to become one of the key drivers of a non-linear leap in AI capability.
Looking back at the history of biological evolution, intelligence is not an abstract ability that appears suddenly; it is the result of increasingly complex systems of perception and action gradually emerging. The emergence of vision is widely regarded as a crucial watershed. In early life forms, photosensitive cells could only distinguish light from dark, whereas imaging vision allowed organisms to recognize spatial structure, object boundaries, and relations of motion. This change directly expanded the world organisms could perceive and act in; the complexity of predation and evasion behavior rose sharply, and nervous systems were forced to evolve stronger processing and decision-making abilities. In the end, vision was not merely one more sense: it triggered a stepwise leap in cognition and intelligence.
More than 500 million years ago, in the Cambrian period, eyes began to appear and the pace of animal evolution increased markedly
For a long time, large language models have learned about the world mainly in the space of text. Their understanding is not genuine understanding, and their perception is not genuine perception; in essence, both derive from language's high-level compression and abstraction of reality. This approach has shown astonishing abilities in linguistic reasoning and knowledge integration, but it faces a fundamental limitation: the world the model touches is a second-hand world, filtered, described, and reconstructed by humans. There is a vivid metaphor: a large model can describe the aroma and taste of red wine in loving detail, yet it has never taken a sip or knocked over a glass.
Progress in multimodal models has a chance to change this premise to some degree. Modalities such as images, video, and audio are not interpretations of the world but direct projections of its state. They naturally carry spatial continuity, temporal evolution, and implicit physical constraints such as object permanence, occlusion relationships, motion trajectories, and causal ordering. Such information is hard to express fully in text, yet it is present in multimodal data in a passive but unavoidable way. When a model learns multimodally, it must confront a constraint space much closer in structure to the real world, which makes the formation of a more robust world model possible.
More importantly, multimodality opens up a closed-loop channel of perception, decision, and action for artificial intelligence. When multimodal perception is combined with tool use, robot control, and software operation, intelligence is no longer confined to answering questions and generating content: it can try, correct, and plan within an environment, continuously improving through feedback and taking a further leap. A minimal sketch of such a loop follows.
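The sketch below shows the bare structure of a perception-decision-action loop with feedback. Every component is a hypothetical stand-in (a toy one-dimensional environment and a trivial policy), not a real robotics or agent framework API.

```python
# Minimal perception-decision-action loop; all components are illustrative stand-ins.
from dataclasses import dataclass

@dataclass
class Environment:
    """Toy 1-D world: the agent must move its position toward a goal."""
    position: float = 0.0
    goal: float = 10.0

    def observe(self) -> dict:
        return {"position": self.position, "goal": self.goal}   # perception stand-in

    def step(self, action: float) -> float:
        self.position += action
        return -abs(self.goal - self.position)                  # feedback signal

def decide(observation: dict) -> float:
    """Policy placeholder: take a bounded step toward the goal."""
    error = observation["goal"] - observation["position"]
    return max(-1.0, min(1.0, error))

env = Environment()
for t in range(20):                       # perceive -> decide -> act -> feedback
    obs = env.observe()
    action = decide(obs)
    reward = env.step(action)
    if abs(reward) < 1e-6:                # goal reached: stop correcting
        print(f"reached goal at step {t}")
        break
```

The point is structural: once actions change the environment and feedback flows back, the system can correct itself over time rather than producing one-shot answers.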
3. Research and Exploration Blossom in Multiple Fields such as Underlying Architecture and Learning Paradigms
For the large-model industry, being research-driven has always been the core paradigm, and large numbers of experiments are essential to R&D. Running experiments in many directions simultaneously with small teams has long been an effective organizational method at leading institutions such as OpenAI. This model, somewhat like a horse-racing mechanism, suits a field in which the technical routes are still iterating and shifting. In the coming year, more breakthrough results can be expected across underlying architectures, training paradigms, evaluation methods, long-term memory mechanisms, and Agents.
Over the past two years, a number of labs with non-consensus views and distinctive technical identities have emerged globally. SSI, led by Ilya Sutskever, focuses on safe superintelligence and has attracted $3 billion in investment. Thinking Machines Lab, founded by former OpenAI CTO Mira Murati, focuses on the reliability, customizability, and multimodal coordination of AI systems, and has just launched its first product, Tinker, which helps developers and researchers fine-tune language models. In the direction of combining the physical world with agents, World Labs, founded by Fei-Fei Li, focuses on spatial intelligence, aiming to let AI models understand three-dimensional environments and physical laws and to fill the gap left by large language models at the level of physical interaction. After leaving Meta, Yann LeCun is joining the AI startup AMI, focused on advanced machine intelligence, with the goal of building systems that understand the physical world, have long-term memory, and can reason and plan complex action sequences. On the agent front, Europe's H Company holds that if AI cannot continuously solve complex real-world problems, even the smoothest conversational ability is only surface intelligence; it researches cognitive systems that can keep solving complex tasks, hoping to create super-agents that operate tools and execute complex workflows the way humans do.
There has also been promising work on underlying architectures and training paradigms. Japan's Sakana AI is a lab that stands explicitly apart from the mainstream Scaling Law route. It was founded by several former core researchers from Google DeepMind, including Llion Jones, one of the main authors of the Transformer paper. They advocate evolution and swarm intelligence as efficient paths that reduce dependence on compute. One line is evolutionary AI, which does not try to train a perfect model in one shot but lets models keep evolving through mutation, selection, and recombination in a dynamic process. The other is swarm intelligence and multi-model collaboration, which treats multiple models with complementary capabilities and different structures as an ecosystem, producing stronger collective intelligence through collaboration rather than individual optimization. Liquid AI, with roots at MIT, has developed the liquid neural network architecture, a fundamental rethinking of how neural networks should operate: a truly general and robust intelligent system should change with its environment rather than being trained once and frozen for life. That is the meaning of "liquid": the network is not a solid structure but a dynamic system that keeps evolving. On long-term memory, Google has proposed the concept of nested learning, which tries to address catastrophic forgetting at its root. Like the human brain, where short-term memory (the hippocampus) and long-term memory (the cerebral cortex) work together, the design is a fast-slow system: different parts of the model serve different functions, with some responsible for adapting quickly to the current task and others for consolidating general knowledge into long-term memory. A toy sketch of this fast-slow idea follows.
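The fast-slow division of labor can be illustrated with a toy sketch. To be clear, this is an analogy for the general fast/slow idea, not Google's nested-learning implementation: fast parameters adapt quickly to each new task, while slow parameters absorb the result gradually so that earlier knowledge is not overwritten wholesale.

```python
# Toy fast-slow parameter scheme (an analogy only, not nested learning itself).
import numpy as np

rng = np.random.default_rng(0)
dim = 8
slow = np.zeros(dim)            # consolidated "long-term" parameters
fast = np.zeros(dim)            # rapidly adapting "short-term" parameters

def task_gradient(params: np.ndarray, task_target: np.ndarray) -> np.ndarray:
    """Gradient of a simple quadratic loss ||params - target||^2 / 2."""
    return params - task_target

for task_id in range(5):                        # a stream of new tasks
    target = rng.normal(size=dim)
    fast = slow.copy()                          # start from consolidated knowledge
    for step in range(50):
        fast -= 0.1 * task_gradient(fast, target)      # fast inner-loop adaptation
    slow += 0.2 * (fast - slow)                 # slow consolidation toward the new solution
    print(f"task {task_id}: fast fit error={np.linalg.norm(fast - target):.3f}, "
          f"slow drift={np.linalg.norm(slow):.3f}")
```

The slow update rate (0.2 here) is the knob that trades plasticity against forgetting: larger values chase the newest task, smaller values preserve accumulated general knowledge.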
Evaluation-led development is increasingly becoming an important paradigm for driving large-model R&D. Today, data contamination from gaming static leaderboards, the high cost and poor scalability of human annotation, and the fact that model capability on some dimensions has begun to exceed that of ordinary evaluators all pose serious challenges for evaluation, and the industry is exploring new methods. For agent and long-horizon task evaluation, academia and industry are building evaluation systems around agent capabilities that require managing state across steps and tools, including DeepMind's complex task-planning environments, OpenAI's internal multi-tool collaboration tasks, and academic benchmarks such as SWE-bench, WebArena, and AgentBench. These evaluations no longer care whether the model answers a single question correctly, but whether it can reach a goal, correct its errors, and update its strategy over long time horizons, truly exposing weaknesses in planning and memory (a minimal harness sketch follows). Another direction is dynamic, interactive, simulation-based evaluation; representative explorations include evaluations built on games, simulated worlds, or digital-twin environments, where each decision changes the subsequent state and errors compound. Of course, the challenge of evaluation may be a long-term one, because evaluation metrics are themselves prone to Goodhart's law: when a measure becomes a target, it ceases to be a good measure.
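The sketch below shows what a long-horizon agent evaluation harness fundamentally checks: goal completion within a step budget rather than single-answer accuracy. The Task and agent interfaces are hypothetical, not the APIs of SWE-bench, WebArena, or AgentBench.

```python
# Minimal long-horizon agent evaluation harness (interfaces are hypothetical).
from typing import Callable, Protocol

class Task(Protocol):
    def reset(self) -> str: ...
    def step(self, action: str) -> tuple[str, bool]: ...   # (observation, goal_reached)

def evaluate(agent: Callable[[str], str], tasks: list, max_steps: int = 30) -> float:
    """Score = fraction of tasks where the goal is reached within the step budget."""
    solved = 0
    for task in tasks:
        obs = task.reset()
        for _ in range(max_steps):
            action = agent(obs)                 # the agent may retry and re-plan
            obs, done = task.step(action)
            if done:
                solved += 1
                break
    return solved / len(tasks)

class ScriptedTask:
    """Toy multi-step task: actions must follow a fixed script in order."""
    def __init__(self) -> None:
        self.script = ["open file", "edit", "run tests"]
        self.index = 0
    def reset(self) -> str:
        self.index = 0
        return f"next required action: {self.script[0]}"
    def step(self, action: str) -> tuple[str, bool]:
        if action == self.script[self.index]:
            self.index += 1                     # progress only on the correct action
        if self.index == len(self.script):
            return "goal reached", True
        return f"next required action: {self.script[self.index]}", False

echo_agent = lambda obs: obs.split(": ", 1)[-1]   # trivially follows the instruction
print("success rate:", evaluate(echo_agent, [ScriptedTask() for _ in range(3)]))
```

Because the score depends on the whole trajectory, a model that answers each sub-question plausibly but cannot hold state across steps scores poorly, which is exactly the failure mode these benchmarks aim to expose.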
Schematic diagram of the large-model evaluation system framework
4. Simulation Data Will Shine in Physical AI
Physical-world data for robots is extremely scarce. For complex dexterous manipulation in particular, collecting a single sample on a real machine can take at least a few minutes and cost on the order of $1-10, whereas the marginal cost of generating a sample in simulation approaches zero and tens of thousands of instances can run in parallel. So in early-to-mid-stage R&D and for tasks in controllable environments, simulation data will become the absolute mainstream, and the Sim-to-Real gap is being narrowed by generative AI. A rough back-of-envelope comparison follows.
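Using the figures above, a quick back-of-envelope comparison for a hypothetical one-million-trajectory dataset shows why simulation dominates at this stage; the dataset size and simulation throughput are illustrative assumptions, not numbers from this article.

```python
# Back-of-envelope cost comparison for 1 million manipulation trajectories.
# Per-sample cost and time come from the article's rough figures; the dataset
# size and simulation throughput below are illustrative assumptions.
samples = 1_000_000
real_cost_per_sample = 5.0        # midpoint of the $1-10 range
real_minutes_per_sample = 3       # "at least a few minutes" per real-robot sample
sim_samples_per_hour = 10_000     # assumed throughput of a parallel simulation farm

real_dollars = samples * real_cost_per_sample
real_robot_years = samples * real_minutes_per_sample / 60 / 24 / 365
sim_wallclock_hours = samples / sim_samples_per_hour

print(f"real collection: ~${real_dollars:,.0f} and ~{real_robot_years:.1f} robot-years")
print(f"simulation:      ~{sim_wallclock_hours:.0f} wall-clock hours at near-zero marginal cost")
```

Under these assumptions, real collection costs millions of dollars and several robot-years, while the same volume of simulated data is a matter of hours on a parallel farm.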
In terms of scale and coverage, the bottleneck of real-machine data collection is not that data cannot be collected but that collection is slow, expensive, and insufficiently broad. Simulation can cover long-tail scenarios such as extreme lighting, occlusion, collision, rare faults, and varying friction, mass, and joint clearance at exponentially lower cost. In terms of controllability and reproducibility, physical AI R&D requires rigorous regression testing and safety verification, and simulation can lock down variables, turning problem localization from guesswork into diagnosis. In terms of cross-embodiment transfer, real-world data is usually tied to a specific hardware embodiment, sensor suite, and calibration, whereas simulation naturally suits the unified generation and alignment of multiple embodiments, observations, and action spaces, which matters especially in multi-robot, multi-task training paradigms. Industry, academia, and research institutes have already produced work along these lines. For example, the Shanghai Artificial Intelligence Laboratory has built the synthetic dataset InternData-A1, containing more than 630,000 trajectories and 7,433 hours of data, covering 4 embodiments, 18 skills, 70 tasks, and 227 scenes, and involving the manipulation of rigid, articulated, deformable, and fluid objects. Pre-training a model with the same architecture as π0 entirely on InternData-A1, the team found its performance on 49 simulation tasks, 5 real-world tasks, and 4 long-horizon dexterous manipulation tasks comparable to the official π0 model, verifying the effectiveness of simulation data. Galaxy Universal has released its large synthetic dataset for dexterous-hand functional grasping -