Why major tech giants are racing to develop world models: four business scenarios and three social impacts
Previously, IT Juzi surveyed the domestic startups and large enterprises working on world models. The concept is new, and many readers are curious about it.
In this article, let's talk about world models - what are they for?
Why are large companies at home and abroad investing in R&D, and why are investors pouring large sums of money into world model startups?
A world model is not just another chatbot, nor is it another tool for generating pictures. It aims to evolve AI from "having seen something similar" to "understanding how the world works", thereby triggering a chain reaction in fields such as automotive, robotics, content, and industrial simulation.
Its more profound impact lies in the fact that it may transform AI from "a tool that can only talk" into "an actor that can take action", rewriting the labor market, the data industry, and even the boundary of human perception between the real and the virtual.
I. In one sentence: It aims to take AI from "having seen" to "understanding"
In the past few years, the thing AI was best at was pattern matching.
Show it ten thousand pictures of cats, and it can recognize cats; feed it one trillion words, and it can write poems, code, and reports. But if you ask it "what will happen if a glass cup falls off the table", it will probably hesitate.
Because although it has seen many scenes of cups falling, it doesn't really understand the physical relationship between gravity, inertia, and the fragility of glass.
What the world model aims to solve is this problem of seeming to understand without really understanding.
Its goal is to enable machines to have an internal "mental sandbox" to predict the consequences of actions without actually taking them.
Autonomous driving can rehearse emergency avoidance on a rainy day in this sandbox; robots can fall tens of thousands of times in it before going out; scientists can conduct thousands of virtual experiments in it and then choose the most promising directions to verify in the real world.
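The "mental sandbox" idea can be made concrete with a minimal sketch. Everything here is an illustrative assumption rather than any company's actual system: `world_model` is a hand-written toy physics rule standing in for a learned model, and `imagine` and `plan` are hypothetical helper names. The planner rehearses each candidate braking level inside the model and only then picks one to carry out.

```python
def world_model(state, action, dt=0.1):
    """Toy stand-in for a learned world model (assumption): predict the next state.

    state is (position in m, velocity in m/s); action is acceleration in m/s^2.
    """
    position, velocity = state
    velocity = max(0.0, velocity + action * dt)  # the car never rolls backwards
    position += velocity * dt
    return position, velocity


def imagine(state, action, steps=100):
    """Rehearse one constant action inside the model; return the final position."""
    for _ in range(steps):
        state = world_model(state, action)
    return state[0]


def plan(state, obstacle_at, candidates=(-8.0, -6.0, -4.0, -2.0, 0.0)):
    """Pick the gentlest braking whose imagined outcome stays short of the obstacle."""
    safe = [a for a in candidates if imagine(state, a) < obstacle_at]
    # If no imagined future is safe, fall back to the hardest braking available.
    return max(safe) if safe else min(candidates)


if __name__ == "__main__":
    # A car at 20 m/s with an obstacle 40 m ahead.
    print(plan((0.0, 20.0), obstacle_at=40.0))
```

No action is taken in the real world until every candidate has been tried in imagination. A real system would replace the toy physics with a model learned from data and search over far richer action sequences.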
This ability may look like a mere cognitive upgrade, but it will ripple through the business chain and ultimately affect the way human society operates.
II. Business level: Four tracks are being rewritten
1. Autonomous driving: From "driving one hundred million kilometers on the road" to "driving hundreds of millions of kilometers in the mind"
The biggest bottleneck in the autonomous driving industry has never been that the algorithms are not good enough, but that the data is too expensive, scarce, and slow to obtain.
You can have a test fleet drive one hundred million kilometers on real roads, but most of that mileage is uneventful highway cruising.
What's really valuable are the rare moments: a pedestrian suddenly appearing in the rain, debris flying from a blown tire on the vehicle ahead, chaotic road markings at a construction site, and high beams from oncoming traffic at night.
You may not encounter these scenarios more than a few times in a year.
The value of the world model is that it can infinitely generate these extreme scenarios in the virtual world.
XPeng claims that its world-model-based simulation testing is equivalent to driving 30 million kilometers per day, and Horizon can generate a controllable driving video within 30 seconds. This means that the R&D approach for autonomous driving is changing from "fixing bugs as problems arise" to "generating the scenarios you want".
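"Generating the scenarios you want" can be pictured as writing a scenario specification and enumerating it, rather than waiting for the road to supply it. The sketch below is only an illustration under stated assumptions: the field names, the `fog` entry, and the helpers `scenario_grid` and `select` are hypothetical and do not describe XPeng's or Horizon's tooling. The hazard list echoes the examples given above.

```python
from itertools import product

# Hypothetical scenario vocabulary (assumption); hazards mirror the article's examples.
HAZARDS = [
    "pedestrian_appears_suddenly",
    "tire_debris_from_vehicle_ahead",
    "chaotic_construction_markings",
    "oncoming_high_beams",
]
WEATHER = ["clear", "rain", "fog"]
TIME_OF_DAY = ["day", "night"]


def scenario_grid(hazards=HAZARDS, weather=WEATHER, times=TIME_OF_DAY):
    """Enumerate every combination as a scenario spec a simulator could render."""
    return [
        {"hazard": h, "weather": w, "time_of_day": t}
        for h, w, t in product(hazards, weather, times)
    ]


def select(scenarios, **wanted):
    """Keep only the scenarios whose fields match the requested values."""
    return [s for s in scenarios if all(s[k] == v for k, v in wanted.items())]


if __name__ == "__main__":
    grid = scenario_grid()
    rainy_nights = select(grid, weather="rain", time_of_day="night")
    print(len(grid), len(rainy_nights))
```

Each spec would then be handed to a generative simulator. The point of the sketch is only that rare combinations become something you request on demand instead of something you wait a year to encounter.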
The business impact is straightforward: the better the world model, the faster autonomous driving iterates, and the sooner a company can win mass-production orders and regulatory trust.
In the future, the competition among car companies will shift from who has more lidars to who has a world model that can "dream" better.
2. Robotics: From "every action has to be taught by humans" to "falling often enough on its own before going out"
Today's industrial robots look impressive, but behind each action there is often a team of engineers repeatedly tuning parameters.
For humanoid robots to enter factories, warehouses, and households, it's impossible to rewrite the program every time the scenario changes.
They need a "mental training ground" where they can make millions of trial-and-error attempts in simulation, learn to grasp, walk, avoid obstacles, and cooperate, and then fine-tune in the real world.
The world model is that training ground.
BMW has used NVIDIA Omniverse to train assembly robots in a virtual factory, keeping errors within millimeters. Domestic humanoid robot companies such as Unitree and Zhipu are following suit. According to reported data, more than 60% of industrial robots now use world models for auxiliary training.
Commercially, this means that the cost of deploying robots may drop sharply.
Today, debugging an industrial robot production line may take 3 to 6 months; once world models mature, that may shrink to a few weeks. The timeline for service robots to enter households may also move forward by 5 to 10 years.
3. Content industry: From "spending 200 million to make a movie" to "typing to generate a world"
This is the track closest to ordinary people.
Sora and Genie 3 have shown early prototypes: enter a piece of text, and you can generate an explorable 3D world.
It's still very rough today, but if the world model continues to improve, future film and television production, game development, virtual social interaction, and cultural and tourism experiences will all be rewritten.
Imagine opening an app and typing "a cyberpunk city on a rainy night, and I'm a private detective". The world model generates an interactive city for you. You can walk in, talk to NPCs, trigger storylines, change the weather, and even influence how the city runs.
This offers more freedom than today's open-world games because the world is generated in real time rather than preset by designers. Short dramas, virtual companionship, and cultural tourism metaverses are all different facets of the same technological capability.
The commercial ceiling of this track is extremely high. The combined global entertainment market of games and film and television is in the trillions of dollars.
If the world model can reduce the content production cost by an order of magnitude, it will give rise to new giants and reshuffle the existing content companies.
4. Scientific and industrial simulation: From "spending hundreds of millions to build a wind tunnel" to "running physical experiments on the computer"
The underlying ability of the world model is to simulate the evolution of the physical world.
This ability serves not only autonomous driving and robotics but also scientific research and industrial simulation. Climate prediction, material design, drug molecular dynamics, architectural design, and aerospace are all potential applications, as is any field that requires "repeated trial and error in a safe environment".
NVIDIA Omniverse is already in use in architecture, engineering, and construction, enabling design teams to collaborate and verify designs in a virtual environment.
The medical field is also exploring the use of world models to simulate dynamic changes in the human physiological system, to assist in diagnosis and in optimizing treatment plans.
This direction is still in its early stages, but the room for imagination is huge.
III. Human society level: Three more profound impacts
1. AI transforms from "tool" to "actor"
Language models make AI eloquent, but it can only output text and pictures and cannot directly change the physical world. The world model is the bridge that enables AI to go from "being able to talk" to "being able to act".
When AI has a model of the world in its mind, it can predict the consequences of actions and then truly take action - control cars, operate robots, and manage factory production lines.
This means that the role of AI in human society will change from "assistant" to "actor". You will no longer just ask ChatGPT "help me write a plan", but tell the robot "help me assemble this part", and it will plan the actions, predict risks, and execute tasks on its own.
This transformation will profoundly change the structure of the labor market.
Repetitive physical labor may be automated much faster than we expected.
2. The definition of "data" is rewritten: Simulation data may be more valuable than real-world data
Today, the core resource in the AI industry is real-world data: web text, pictures, videos, driving logs, and sensor records. Once world models mature, simulation data will become a new kind of "mineral resource".
You don't need to wait for an accident to happen in the real world, because the world model can generate countless accident scenarios. You don't need to actually break a hundred robots, because the world model can simulate breaking them ten thousand times.
This will change the power structure of the data industry.
Companies that own world models effectively own a "data printing press". The company whose world model has the best physical consistency and the highest generation quality can produce training data at the lowest cost, creating a data flywheel.
This may be the core battlefield of the next round of the AI arms race.
3. The boundary of "reality" itself begins to blur
When the world model can generate a sufficiently realistic 3D environment and AI can build strong enough abilities in the virtual world, the way humans divide their time between the virtual world and the real world may shift.
Today, we already spend a lot of time on our mobile phones. In the future, VR/AR combined with the world model may make immersive virtual experiences a daily routine.
This cuts both ways.
On the positive side, access to education, medical care, and entertainment will improve greatly: children in remote areas can "walk into" ancient Rome, and patients with phobias can undergo exposure therapy in a safe environment.
On the worrying side, when AI can generate an infinitely realistic world, the boundary between "real" and "fake" will be harder to draw, and problems such as deepfakes, information manipulation, and virtual addiction will become more intractable.
Society needs a new governance framework to deal with this.
IV. What stage is it at now?
The world model today is roughly where deep learning was in 2012. That year, AlexNet proved that deep learning could win ImageNet, but no one foresaw ChatGPT. Today, Genie 3 and Cosmos prove that a world model can generate an interactive world, but no one knows what the "world-model version of ChatGPT" in 2030 will look like.
What we can be relatively certain about is that language models have enabled AI to enter the information world, and the world model will enable AI to enter the physical world.
The commercial value of the former has been verified, and the commercial ceiling of the latter may be higher. The vast majority of global GDP still comes from fields such as manufacturing, transportation, construction, energy, and healthcare that require "hands-on" work.
So, what's the use of researching world models?
In the short term, it can accelerate the implementation of autonomous driving, make robots cheaper, and reduce content production costs.
In the long term, it is a crucial step for AI to move from "inside the screen" to "in the real world".
Once this step is taken, the scope of AI's influence will expand from the Internet industry to almost every sector of the real economy.
This article is from the WeChat official account "IT Juzi" (ID: itjuzi521), author: Judy, published by 36Kr with authorization.