From Kling to Gemini, AI-generated videos are collectively bidding farewell to "card-drawing mode": Are director models about to go viral?
The era of "card-drawing" is coming to an end.
Over the past year or so, our experience of AI-generated video can be summed up in two words: card drawing. You enter a prompt, click generate, and stare at the progress bar while the model spits out a few seconds of video. If it looks good, you keep it; if not, you change the wording and try again. This can indeed produce amazing clips, but what it hands creators is never a piece of material they can keep working on. It is more like a card: keep it if you're lucky, redraw if you're not.
The most frustrating part of the card-drawing approach is not that the images aren't realistic enough, but that the process is uncontrollable. You want one finished video that scores nine out of ten, but the model gives you ten fragments, each a seven or an eight, that don't fit together. You can't tell it, "Keep this shot and just change the character's movements." All you can do is roll the dice again and hope for a better result next time.
However, this has recently started to change. In the past month or two, several new video models have emerged almost simultaneously. They differ in product form, technical route, and target market, but the signal they send is surprisingly consistent: the focus of competition is no longer on who can generate a better-looking video in one go, but on who can produce something that can be continuously modified, controlled, and reused. In other words, AI video is evolving from a video-generating machine into a set of production tools.
(Image source: Google)
This raises a question. Now that AI video has reached this stage, will the core competitiveness of creators shift from video editing to something more like directorial skill? After all, we no longer need to "gamble" on what the model generates. So will better expression and shot design become the focus of future AI video creation?
A video model whose output can't be edited is not a good AI
Recently, Google and Runway have been the most talked-about names in "editable" AI video.
Runway has introduced Aleph 2.0, which focuses on making modifications based on the context of the original video. In simple terms, it no longer treats each generation as a blank slate. Instead, it recognizes what is in the material you already have and makes local changes with an understanding of the original video, rather than starting from scratch every time. Google, on the other hand, has Gemini Omni, which takes a different approach and emphasizes continuous, conversational editing. You make requests step by step, as if chatting with someone, and the model revises the previous version instead of starting over each time you have a new requirement.
(Image source: Runway)
For example, we asked Gemini to generate a video of a white ceramic cup on a wooden table, shot with a slowly moving camera. We asked for a notebook and a black pen beside the cup, natural daylight, the feel of real phone footage, advertising-level quality, and an ordinary studio background. The first-round result was already quite satisfactory.
(Image source: Lei Technology)
Gemini generated a static-shot video of a white ceramic cup, a notebook, and a black pen on a wooden table, with every main element clearly visible in the frame. The camera slowly zoomed in from a medium-long shot to a close-up, as we had requested. However, the result didn't look like an advertisement.
(Image source: Lei Technology)
So, we directly asked Gemini to make the video more like a coffee brand advertisement based on this material. For example, we asked it to add subtle steam to the coffee in the cup and soft highlights to the cup wall.
(Image source: Lei Technology)
It's not hard to see that the cup, pen, notebook, and even the background scene remained unchanged. What changed was the moment the coffee appears, the camera movement, and the steam effect.
This is exactly the intermediate state of AI video as it evolves from generation to editing. In the past, you wrote a prompt and waited for the model to produce a video. Now, you first generate a basic piece of material and then tell the model what's not good enough. Creators are starting to give modification directions like a director, but the model can't yet follow instructions as precisely as video-editing software. It's no longer just card-drawing, but it hasn't fully become a real post-production tool either.
Gemini's conversational editing is just one approach. In China, Kling and Seedance 2.0 are taking the concept of "editable" to a more systematic level, though they approach it from different angles.
Kling O1 aims to integrate the entire workflow into one engine. Generation, modification, reference, style redrawing, and shot extension, which in the past were either impossible or required switching between multiple tools, can now be done from start to finish in one place. This approach is smart because it positions the product not as a generator with a single strong function but as a creative platform. For creators, the most annoying part has never been the difficulty of a single step but having to move a video between seven or eight tools, constantly importing and exporting. Kling is trying to solve this inefficiency in the workflow.
(Image source: Kling)
Seedance 2.0 focuses on multimodality. It allows text, images, videos, and audio to serve as references, supporting enhanced reference-based generation, video extension, and audio-video synchronization. In the past, when we talked about video models, we only focused on how good the visuals were. However, a video is not just moving images; it's a combination of images, movement, sound, and rhythm. By bringing sound and movement under control, Seedance is reminding us that a video model should not only be able to create images but also understand rhythm and know where to make cuts.
(Image source: Seedance 2.0)
Put more bluntly, from the perspective of video-model development as a whole, the card-drawing era is ending and the "editable era" has begun. That is to say, the model that can streamline the entire process, give users the most intuitive optimization prompts, and offer secondary-editing solutions will be the one that leads the market.
AI video is no longer a game of chance, and the tasks for humans have changed
Let's circle back to the opening question. Now that AI video generation is no longer a matter of chance, will the role of humans in the workflow change? My answer is yes.
In the past, an excellent video creator relied on skills such as video editing, color grading, transitions, and music selection, painstakingly crafting a style frame by frame. These skills won't become obsolete, but when the model can understand instructions like "Keep this camera movement and make the video more like an advertisement," what really sets creators apart becomes a different set of abilities: describing shots, controlling rhythm, and judging which parts to keep and which to redo. In short, it's the ability of a "model director."
AI video won't immediately replace video editing, nor will it turn creators into mere prompt writers. Both of these extreme views are oversimplifications. More accurately, the focus of video production is shifting from "material processing" to "intention scheduling." In the past, you manually pieced together materials to create a finished video. In the future, you'll mostly be telling the model what you want, what you don't want, and what's lacking in the current version.
(Image source: Lei Technology)
This scheduling ability has a certain threshold. Those who can translate vague creative ideas into camera language the model understands, and who can quickly judge whether the model's output is usable and what's missing, will be the "model directors" of the future. A director may not operate the camera or edit every shot, but they know what the whole video needs and which direction to take at every decision point. Once AI video matures, this is what creators will need to do.
The tools have changed, and so have the requirements. However, the core of creation remains the same: a clear vision of the finished video in your mind and the willingness to adjust the model's output repeatedly until it meets your expectations. The card-drawing era is coming to an end. There will be fewer "gamblers," and what's truly scarce is the person who knows what they want and has the ability to make the model deliver it.
AI won't replace workers but will push them forward
Whenever a new tool automates a craft, some people worry about losing their jobs. However, looking back, tool upgrades have never really eliminated workers; they've only taken over the most mechanical parts of their work.
A classic example is the spreadsheet. Before the emergence of VisiCalc and later Excel, accountants and financial professionals spent a large part of their day using calculators to compute and record data cell by cell. Spreadsheet software took over these repetitive calculations, but instead of costing accountants their jobs, it transformed them from "number crunchers" into "model builders, trend watchers, and decision-making consultants." The most boring tasks were removed, allowing them to focus on the more valuable aspects of the job.
Video editing is another example. Before non-linear editing software became widespread, editing literally involved cutting film with a blade and rewinding tapes frame by frame, which is why we have the term "cutting videos." After software like Premiere and Final Cut emerged, the physical act of "cutting" disappeared, but video editors didn't. They shifted their focus from physical labor to higher-level judgments about rhythm, storytelling, and emotion. The tools replaced the manual work, leaving the decision-making to the human mind.
(Image source: Seedance 2.0)
After the emergence of AI programming assistants, programmers initially worried that they would no longer be needed to write code. However, the actual change is that they spend less time writing boilerplate code and more time reviewing the code written by the model, clarifying the architecture and boundaries, and deciding which parts to trust and which to rewrite. The ability to write code is still important, but the scarcer ability is knowing what to ask the model to write. The now-popular Vibe Coding has, to some extent, lowered the entry threshold, but work produced this way often struggles to meet the requirements of full-fledged development and delivery.
In the next stage of AI video, the competition will no longer be about who can create more realistic images but about who can offer more stability, controllability, and editability. Creators won't be limited to writing prompts; instead, they'll be more like model directors, knowing what to keep, what to change, what references to use to guide the model, and how to keep refining the output until the result is usable. The art of video editing won't disappear, but the most valuable ability of creators is shifting from "how proficient they are with software" to "how accurately they can schedule the model."
Tools are constantly evolving, and workers need to strive to stay in positions that AI tools can't replace. As card-drawing gives way to editing, what remains scarce is the person who knows what they want and has the ability to make the model deliver it.
This article is from the WeChat official account "Lei Technology AGI", written by Lei Technology and published by 36Kr with authorization.