Investing $35 billion, South Korean chaebols join hands with Silicon Valley. Will an AI version of "Pixar" emerge out of nowhere?
If I asked what first comes to mind when you think of AI-generated video and film, what would it be?
Some would say Sora 2.
Some would say the film and television industry is doomed and that filmmakers will lose their jobs...
But if I tell you that these platforms don't really compete with the real content production in the film and television industry, would you believe it?
What truly threatens the industrial system of film and television production is not the short-video-level AIGC you see every day, but a quieter and more "director-like" AI production platform: it doesn't generate videos randomly. Instead, it creates a "shootable world".
Let's do a simple breakdown from the core elements of a movie to see what basic requirements there are for making a film:
✔ Character consistency
✔ Continuous plot
✔ Reshootable and modifiable
✔ Camera scheduling: Shot size, lighting, focal length
✔ Object reuse, scene reuse
✔ Complex action logic: Running, fighting, spatial occlusion
✔ Stable physical laws: Gravity, wind, consistency of light and shadow
Obviously, AIGC in its current state cannot carry a feature film. Continuity, consistency, obeying the laws of the physical world, executing complex logic, even keeping the protagonist's appearance stable: all of these break down. The same protagonist at the beginning and end of a single video may end up looking like brothers, or even distant cousins.
Not to mention the strange details caused by the hallucinations that are hard to eradicate in AI systems.
So, film-grade AI platforms and ordinary consumer-level video-generating AIs are two completely different species.
What does an AI platform that could potentially develop into a real film-grade application look like?
Recently, big news broke in Hollywood: Utopai Studios, an AI startup based in Los Angeles, and the global technology company Stock Farm Road are forming a partnership.
△ Hollywood media has noticed this cooperation that could potentially change the industry
The two parties plan to establish a joint venture, Utopai Studios East. By leveraging Utopai's AI technology and Stock Farm Road's infrastructure and cash reserves, they aim to build a real film-grade AI platform.
A look at the new company's clear, implementable technical path and its infrastructure plan worth tens of billions of dollars is enough to show that this project is not just talk.
In 1995, Pixar released the first CG animated film Toy Story. The first company to produce a real AI movie has yet to emerge. Utopai Studios East might be a strong competitor.
△ Everyone is looking forward to the birth of an AI version of Pixar
Since the current global AIGC trend is not the "right path" for film creation, let's take a good look at what the right path for film and television AI should be like today.
01 Unveiling the Masterminds Behind Utopai East: Korean Chaebols + Silicon Valley Elites
The investors in Utopai East are Stock Farm Road (SFR) and the AI film and television production company Utopai Studios, each holding a 50% stake.
One of the investors, SFR, has very strong financial backing. Its founders include Brian Koo, grandson of LG Group founder Koo In-Hwoi and an heir of the group, and Amin Badr-El-Din, founder, chairman, and CEO of the Middle Eastern Offsets Group.
As a major financial backer, SFR provides funds, creative expertise, and industry connections to the joint venture, making it clear that the joint venture "has deep pockets".
△ The founder of Stock Farm Road is Brian Koo, the grandson of Koo In-Hwoi, the founder of the LG Group
The other investor is the AI-native film and television studio Utopai Studios, which provides technology, work processes, and infrastructure for the joint venture.
Utopai Studios was formerly known as Cybever, an AI video production company founded in Los Angeles in 2022 by two former Google employees. Previously, the company's technical focus was on developing AI tools for building complex 3D environments, concentrating on 3D world generation (AI 3D world) technology.
△ The two co-founders of Utopai Studios, Cecilia Shen and Jie Yang
Later, the company saw broad application prospects for industrial-grade, AI-assisted video production in the media field. It changed its name to Utopai Studios and transformed itself into a film and television studio, setting up in the media hub of Hollywood.
As one of the first Hollywood film and television studios to start with AI technology and operate comprehensively, Utopai Studios is also the first AI company to join the Hollywood union. This also shows the company's ambition to use AI to transform and optimize the existing production system in Hollywood.
02 "Rehearsing" Everything in the Virtual World: AI Transforms the Entire Film and Television Production Process
How to optimize and transform?
By remodeling the production process end to end: treating it as a single whole and moving it onto an AI-native platform.
Traditional film and television production is divided into pre-production, production, and post-production. It goes from script development → storyboard/art design → on-location or studio shooting (cameras, lighting, actors) → rough cut → visual effects (green screen, CG, compositing) → color correction/sound → distribution. Each stage has an independent team and is completed using human resources + physical equipment: cameras, lighting trucks, set design, on-location construction, post-production studios, etc.
The core content creation still follows the principle of "shoot first, then edit". Once the shooting result is not satisfactory, there are few remedies, and usually, reshooting is the only option.
Utopai East's direction is to build an AI-native production platform backed by a complete AI technology stack: the entire pipeline, from creative understanding to content generation, is modeled and processed on the platform.
△ As an AI film and television company, Utopai Studios focuses on building film and television basic models
The implementation of content creation technology based on native AI can be roughly divided into the following steps:
Script Understanding and World Modeling
- Use large models to break down the script into: characters, scenes, emotional curves, action requirements, and shot lists.
- Transform the "textual world" into a structured semantic 3D world description: what scenes there are, when, the weather, lighting atmosphere, etc.
- This step is like creating a virtual 3D world, designing a "conditional space" for the subsequent generative models. All subsequent images/content are generated based on this premise.
△ Utopai Studios has already tried to use AI to create the virtual 3D world needed for movies
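To make the "conditional space" idea concrete, here is a minimal sketch of what a structured scene description might look like. The schema and field names are entirely hypothetical; Utopai's internal format is not public. The point is only that the script breakdown yields machine-readable constraints that downstream generators are conditioned on.

```python
from dataclasses import dataclass, field

# Hypothetical schema for one beat of the "textual world" after
# the LLM breakdown step; not Utopai's actual format.
@dataclass
class Character:
    name: str
    emotion: str  # position on the emotional curve for this scene

@dataclass
class SceneSpec:
    location: str
    time_of_day: str
    weather: str
    lighting: str
    characters: list[Character] = field(default_factory=list)
    actions: list[str] = field(default_factory=list)

def script_to_world(beats: list[dict]) -> list[SceneSpec]:
    """Toy stand-in for the script-breakdown step: map raw beats to
    structured scene specs that later generation is conditioned on."""
    return [
        SceneSpec(
            location=b["location"],
            time_of_day=b.get("time", "day"),
            weather=b.get("weather", "clear"),
            lighting=b.get("lighting", "natural"),
            characters=[Character(n, e) for n, e in b.get("cast", [])],
            actions=b.get("actions", []),
        )
        for b in beats
    ]

world = script_to_world([
    {"location": "rain-soaked street", "time": "night", "weather": "rain",
     "lighting": "neon", "cast": [("Mara", "anxious")],
     "actions": ["Mara runs toward the station"]},
])
print(world[0].location, world[0].weather)
```

In a real system the breakdown would be produced by a large model rather than hand-written dictionaries, but every downstream stage reads the same kind of structured description.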
Pre-visualization and Shot Planning
- The model generates a previz (preliminary animation/storyboard) based on the script input. The director can drag the camera and switch the shooting positions in the virtual world to determine the desired effect, instead of first taking the crew to the actual location for test shooting.
- Shot planning not only outputs images but also accompanies the output of camera parameter trajectories (camera path), focal length, movement mode, etc., for the subsequent generation of high-quality content. With the help of the AI platform, the work efficiency of the traditional "storyboard artist + visualization team" can be improved by 50% to 80%.
△ Future movies may be completely shot and adjusted in the virtual world
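The "camera parameter trajectory" output described above can be sketched as keyframes that are interpolated over the duration of a shot. This is a simplified illustration with assumed units (seconds, metres, millimetres); production previz tools would use splines and also interpolate camera orientation.

```python
from dataclasses import dataclass

@dataclass
class CameraKey:
    t: float                               # seconds into the shot
    position: tuple[float, float, float]   # world-space metres
    focal_length: float                    # lens focal length in mm

def interpolate(keys: list[CameraKey], t: float) -> CameraKey:
    """Linearly interpolate a previz camera path at time t."""
    keys = sorted(keys, key=lambda k: k.t)
    if t <= keys[0].t:
        return keys[0]
    if t >= keys[-1].t:
        return keys[-1]
    for a, b in zip(keys, keys[1:]):
        if a.t <= t <= b.t:
            w = (t - a.t) / (b.t - a.t)
            pos = tuple(pa + w * (pb - pa)
                        for pa, pb in zip(a.position, b.position))
            focal = a.focal_length + w * (b.focal_length - a.focal_length)
            return CameraKey(t, pos, focal)

# Dolly-in from a 35 mm wide shot to an 85 mm close-up over 4 seconds.
path = [CameraKey(0.0, (0.0, 1.6, 8.0), 35.0),
        CameraKey(4.0, (0.0, 1.6, 2.0), 85.0)]
mid = interpolate(path, 2.0)
print(mid.position, mid.focal_length)  # (0.0, 1.6, 5.0) 60.0
```

A director dragging the camera in the virtual world is, in effect, editing keyframes like these; the same trajectory is then handed to the generator for the final render.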
Material Generation and Scene Composition
- Based on models trained with a large amount of compliant 3D compositing data, high-quality materials such as scenes, props, extras, and special effect elements can be directly generated.
△ From a line of text to a complete 3D world, Utopai is exploring AI-driven 3D layout generation
- These materials are not simple 2D images but 3D data with volume/geometric information and can be reused from multiple angles in the virtual 3D world. These data can be flexibly adjusted and reused repeatedly according to needs. For example, the same "street scene" can be used for 20 scenes without actually building a large studio.
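The reuse economics can be illustrated with a tiny asset-library sketch. Everything here is hypothetical (the IDs, the triangle counts, the instance fields); the point it demonstrates is that heavy 3D geometry is generated once and then referenced many times with different camera angles and lighting.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Asset3D:
    """One generated 3D asset: geometry stored once, referenced many times."""
    asset_id: str
    kind: str        # "scene", "prop", "extra", "fx"
    triangles: int   # stand-in for the heavy geometry payload

@dataclass
class Instance:
    asset: Asset3D
    scene_no: int
    camera_angle_deg: float
    time_of_day: str

library = {a.asset_id: a for a in [
    Asset3D("street_01", "scene", 2_400_000),
    Asset3D("lamp_post", "prop", 12_000),
]}

# The same street set dressed for 20 scenes: a new angle and a new
# lighting condition each time, zero additional geometry generated.
shots = [Instance(library["street_01"], n, 15.0 * n,
                  "night" if n % 2 else "dusk")
         for n in range(20)]

unique_geometry = {i.asset.asset_id for i in shots}
print(len(shots), unique_geometry)  # 20 {'street_01'}
```

Twenty scenes, one set of geometry: that is the "same street scene used for 20 scenes without building a large studio" claim in data-structure form.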
High-quality Final Product Generation
- Generate long videos with a unified style, character consistency, and timeline consistency
△ Utopai can achieve film-grade video-to-video conversion, ensuring that the micro-expressions and action details of the actors are retained with frame-level accuracy
Editing and Multi-versioning
- On the basis of AI generation, editors are still needed. However, what they edit is no longer film but the final product segments output by the model.
- Multi-versioning (dubbing/lip-syncing/copywriting rewriting) is automatically completed through a multi-lingual voice and video alignment model. Based on the production of the main version, AI can automatically expand it into multi-lingual versions and make detailed modifications.
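The multi-versioning step above can be sketched as expanding one master timeline into per-language dubbed tracks. The `translate` function here is a placeholder for the multi-lingual voice and lip-sync models, which are not public; the structural point is that every localized version inherits the master's timecodes, so cuts and lip-sync stay aligned.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    start: float   # seconds on the master timeline
    end: float
    line: str

def translate(line: str, lang: str) -> str:
    """Placeholder for the real translation/dubbing model."""
    return f"[{lang}] {line}"

def localize(master: list[Segment],
             languages: list[str]) -> dict[str, list[Segment]]:
    """Expand the master version into one dubbed track per language,
    preserving the master timeline for each segment."""
    return {
        lang: [Segment(s.start, s.end, translate(s.line, lang))
               for s in master]
        for lang in languages
    }

master = [Segment(0.0, 2.1, "Where were you last night?"),
          Segment(2.1, 4.0, "Working. Like always.")]
versions = localize(master, ["ko", "ja", "fr"])
print(sorted(versions), versions["ko"][0].line)
```

One master, N automatic language versions: that is the cost structure that makes "produce once, release everywhere" feasible.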
Overall, Utopai East compresses the "film crew + studio + post-production studio" into a production pipeline of "creator front-end tools + cloud models", completely changing the traditional film shooting method.
What has also changed, in addition to the process, is the "digital materials" created during the film production process. We might as well call them "assets". These assets have a high degree of "reusability", which further reduces production costs for future film creation.
△ Utopai Studios emphasizes that it pursues AI film production, not AI videos
Actually, the film and television industry has done something like this before: the big Hollywood studio lots are themselves physical "reusable assets". What makes a major studio valuable is its complete production chain: the best makeup artists, lighting designers, and set designers, plus ready-made sets for different eras and settings, all of which greatly reduce the cost of film and television creation.