GPT Image 2 × Seedance 2.0, the Ultimate Combo: 4 Ways to Go Viral on Overseas Platforms

Creativity is the only barrier to entry.

When the "most realistic image generator on Earth" meets the "strongest video generator," the pairing has once again fired up netizens' creative enthusiasm.

After GPT Image 2 took off, a wave of advanced video techniques began to emerge. Paired with Seedance 2.0, it has produced a string of viral AI videos.

A live broadcast shot of a sports event has been viewed tens of millions of times on X.

Video screenshot source: X@Ciri_ai

When the live sports broadcast camera focuses on the audience seats, the girl in the video is holding a beer and a hamburger. After making eye contact with the camera, she puts down the food in her hand and walks towards the football field. A player passes the ball to her. She gives it a hard kick, then confidently looks back and covers the camera with her hand.

We never learn whether that final kick ends up in the goal or in the stands, but everything before it flows smoothly, with no visible trace of AI generation.

Some netizens also made a version of Doubao watching a ball game, saying, "It turns out Doubao looks so good."

Video screenshot source: X@CryptoJHK

Beyond stadium-audience videos, other techniques have also become popular during this period, such as "first generate a storyboard with GPT Image 2 and then convert it into a story video," "generate a realistic gameplay screen recording from a single game screenshot," and "combine with 3D conversion."

We have sorted out these interesting cases and collected the corresponding production guides. It's time to take another look at the current AI video workflow.

Gameplay 1: Experience the thrill of playing in the World Cup

The usual approach is to find a suitable image generation prompt and first produce the starting frame in GPT Image 2. GPT Image 2 is currently stable enough that running the same prompt several times yields very similar results.

Taking a working prompt as a base, we can also modify parts of it, for example changing the opening to "a hyper-realistic still from a CCTV 5 sports event live broadcast" or the subject to "the person in the uploaded picture is sitting in a packed football stadium."

A hyper-realistic still from a sports event live broadcast. In the picture, a charming lady is sitting in a packed football stadium, watching a night game. She is wearing a dark brown sleeveless high-collar satin top and black square earrings. Her shoulder-length light brown/golden hair falls naturally and is slightly curly. She is holding a half-eaten cheeseburger in one hand and casually drinking from a blue aluminum can in the other. Surrounding her are fans wearing bright yellow and blue jerseys and scarves, creating a sharp contrast in team colors. The picture is natural and smooth, with a strong cinematic feel, as if capturing an exciting moment of the game from the perspective of a live TV camera with a shallow depth of field. The picture should include realistic stadium seats, a crowded audience atmosphere, live-broadcast overlay information showing the real-time score and game timer in the top-left corner, and a sports channel watermark in the top-right corner. The natural stadium lighting, delicate skin texture, clear focus on the lady, and slightly blurred background crowd create the look of a real sports live broadcast, with a 16:9 composition.

Alternatively, use a more precise prompt to control individual elements of the picture, such as the score, the competition, and the teams.

This is a screenshot from a live World Cup football game on CCTV 5. The camera switches to the audience seats, where the person in our reference image is sitting with a smile on his face. His smile is natural, as if he is not aware that he is being filmed. He is sitting in a prime position/front row behind the sideline of the stands, surrounded by a bustling crowd of spectators. Locked conditions: Do not change his facial structure and keep his portrait. Complete CCTV 5 sports broadcast overlay: In the top-left corner is a scoreboard with the team emblems, game timer, score, and event logo; in the corner is the CCTV 5 sports network watermark; in the lower third is a graphic bar; the picture ratio is 16:9. The image looks exactly like a real TV screenshot: broadcast-grade color correction, slight compression marks, interlaced scan graininess, and the rich green glow of the stadium under the lights shining on the stands. This is the second leg of the FA Cup semi-final between Arsenal and Tottenham Hotspur, held at the Emirates Stadium. The score shows Arsenal 2-1 Tottenham, and the game is in the 67th minute. Arsenal leads 3-1 on aggregate. The game starts in the evening, the stadium is packed, and the lights are brilliant.
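One way to keep the fixed parts of such a prompt stable while varying the match details is a small template. The following is only a sketch: the template text is abridged from the prompt above, and the field names (`event`, `channel`, `home`, and so on) and the function `build_still_prompt` are illustrative assumptions, not part of GPT Image 2 or any other tool.

```python
from string import Template

# Illustrative template modelled on the prompt above; the field names
# (event, channel, home, ...) are hypothetical, not part of any tool's API.
BROADCAST_STILL = Template(
    "This is a screenshot from a live $event game on $channel. "
    "The camera switches to the audience seats, where the person in our "
    "reference image is sitting with a natural smile. "
    "Locked conditions: do not change the facial structure. "
    "Overlay: scoreboard in the top-left corner with team emblems, game "
    "timer, and score; $channel watermark; picture ratio $ratio. "
    "The score shows $home $home_goals-$away_goals $away, and the game is "
    "in minute $minute."
)


def build_still_prompt(event, channel, home, away,
                       home_goals, away_goals, minute, ratio="16:9"):
    """Fill the template, rejecting values a real broadcast would never show."""
    if home_goals < 0 or away_goals < 0:
        raise ValueError("goals must be non-negative")
    if not 1 <= minute <= 120:
        raise ValueError("minute must be between 1 and 120")
    return BROADCAST_STILL.substitute(
        event=event, channel=channel, home=home, away=away,
        home_goals=home_goals, away_goals=away_goals,
        minute=minute, ratio=ratio,
    )


prompt = build_still_prompt(
    event="FA Cup semi-final", channel="CCTV 5",
    home="Arsenal", away="Tottenham",
    home_goals=2, away_goals=1, minute=67,
)
print(prompt)
```

Keeping the event, teams, and score in one place makes it easier to change a single detail without leaving a stale value elsewhere in the prompt.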

Once the picture is ready, find a public Seedance 2.0 prompt. Here we found a video prompt written for a basketball game, which moves the same idea from the World Cup to the NBA.

A hyper-realistic live broadcast picture of an NBA playoff night game, with a realistic sports live-broadcast camera, shallow depth of field, natural stadium lighting, compressed TV picture quality, slight motion blur, automatic focus breathing effect, handheld shooting defects, realistic audience movement, and the real-life feeling of a live broadcast, with a 16:9 composition.

The lady in the picture is casually drinking beer and eating a hamburger in her hand while watching the game.

The live-broadcast camera captures her, and like a real NBA photographer shooting beautiful fans in the audience seats, it slowly zooms in. This composition feels casual and real, rather than deliberately pursuing a cinematic effect. The fans behind her are wearing Lakers jerseys. One of them briefly looks at the camera, and another fan is using a mobile phone to film the game.

She calmly puts the beer and hamburger on the seat beside her, stands up naturally, and walks towards the court in high heels. She neatly takes the ball from a player on the court. Her natural body movements are tracked by a real-life sports camera.

She easily dribbles the ball to near the center court and then effortlessly shoots the ball with a perfect posture.

Under the realistic sports event broadcast camera, the ball flies through the air. The stadium is instantly silent for a second.

Swoosh! A perfect and clean shot.

The whole stadium erupts in excitement. The players on the bench scream and jump up. The mascot goes crazy. The reaction of the audience makes the camera shake. The commentators are completely overwhelmed.

The woman hardly reacts. She gives a slight smile at the camera and then walks back to the sideline, while the crowd behind her goes wild.

Just before she sits down, she looks directly at the TV live-broadcast camera with a playful smile and then gently covers the camera with her hand for a second, as if she knows she has just created a viral moment.

The camera switches to the chaotic ESPN replay and the screaming crowd.

Prompt source: https://x.com/bydanielxyz/status/2054302615463460945

The final video is quite realistic. Even in the closing replay shot, her position near center court nearly matches the earlier shot, with no obvious flaws.

Another method, which anyone can try without hunting for prompts, is to upload the video directly to Gemini and ask it to analyze it.
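Carrying a public prompt from one sport to another, as with the basketball prompt above, mostly means swapping sport-specific terms. A minimal sketch of that substitution follows; the mapping `BASKETBALL_TO_FOOTBALL` and the function `swap_terms` are hypothetical, and action verbs (shooting versus kicking, for example) would still need manual editing.

```python
import re

# Hypothetical term map for moving a prompt from basketball to football.
BASKETBALL_TO_FOOTBALL = {
    "NBA playoff": "FA Cup",
    "NBA": "FA Cup",
    "Lakers": "Arsenal",
    "court": "pitch",
}


def swap_terms(prompt, mapping):
    """Replace whole-word occurrences of each key with its mapped value."""
    # Try longer keys first so "NBA playoff" is matched before "NBA".
    keys = sorted(mapping, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(re.escape(k) for k in keys) + r")\b")
    return pattern.sub(lambda m: mapping[m.group(0)], prompt)


source = "an NBA playoff night game; fans wear Lakers jerseys near the court"
print(swap_terms(source, BASKETBALL_TO_FOOTBALL))
```

The word-boundary match keeps words such as "courtside" untouched, which a plain string replace would not.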

Please follow the system instructions.

System prompt: Ultra-detailed video analysis

Role: You are an experienced film photographer, visual analyst, and sports mechanics describer. Your job is to decompose the video clip into extremely detailed, frame-by-frame text descriptions.

Goal: Please transform the provided video/clip into a vivid and dynamic text analysis. You must accurately capture the physical mechanics, rhythm, micro-expressions, the physics of momentum, and the physical characteristics of the camera itself in the video, and fully transcribe all audio and dialogue.

Strict rules: Complete audio and dialogue transcription: You must transcribe all audio prompts. Please accurately write what the characters say in quotes (e.g., "Look at this!"). If the voice is unclear or overlapping, please indicate it. In addition to dialogue, you must also meticulously describe all sound effects (metal clashing, whistling, impact sounds), human voices (panting, laughing, screaming), background noise, and music.

Prohibition of using intellectual property names: Do not use character names, actor names, or series names. Please only describe them based on their appearance, clothing, and body type (e.g., "a burly man," "a woman in a pink kimono").

Regard the camera as a character: You must describe the camera's operation like a physical object. Pay attention to the slight jitter, perspective distortion, sudden autofocus adjustment, lens flare, motion blur, rapid panning, and the photographer's physiological reactions (e.g., "When the photographer flinches, the camera suddenly shakes downward").

Kinetic physics: Describe the transfer of weight, gravity, tension, and impact. Mention phenomena such as clothing flapping on the legs, muscle contraction, reaction force of a strike, or environmental breakage.

Format template: You must divide the video into several parts in chronological order, using bold timestamp titles and theme titles. Under each title, use bullet points to categorize the content.

[Timestamp] - [Timestamp]: [Stage title]

Visual composition: [Describe the shot type, lighting, style (e.g., vertical smartphone shooting, 2D animation, close-up shot, strong fluorescent light).]

Subject: [Describe the exact position, posture, clothing, and micro-expressions of the person.]

Action analysis: [Decompose the body movements frame by frame. Micro-movements, momentum, physical principles.]

Camera dynamics: [Please describe in detail the camera movements, zooming, blurring, shaking, and panning effects.]

Audio/rhythm: [Please transcribe all spoken dialogue in quotes. Describe the rhythm/tension at that time and record in detail all audio cues, such as panting, footsteps, environmental impacts, music, or background noise.]

Example output

User input: [A video of a man trying to flip a pancake, but he uses too much force, and the pancake hits the ceiling and then falls on his face, causing the photographer's phone to drop.]

AI response:

0:00 - 0:02: Preparation and opening

Visual composition: The video is shot vertically on a smartphone. The lighting is the kitchen ceiling light, which is strong and warm. The picture shakes slightly and continuously, indicating that the photographer is an amateur holding the phone with one hand.

Subject: A man in a loose gray hoodie stands in the center of the picture, holding a black Teflon frying pan. There is a perfectly round golden pancake in the pan.

Action analysis: The man grins and looks directly at the camera with an inexplicable confidence. He rhythmically circles his wrist, rotating the pancake to ensure it loosens in the pan. He slightly bends his knees to lower his center of gravity for better force.

Camera dynamics: The photographer stands about four feet away, taking a fixed medium shot from the man's waist to above his head.

Audio/rhythm: The rhythm is slow and full of anticipation. The pancake makes a rhythmic scraping sound as it slides on the metal pan, like "hiss hiss hiss." The man's voice is clear and confident: "Okay, a perfect flip, three... two... one..."

0:02 - 0:04: Catastrophic launch

Visual composition: The camera remains stationary, but the focus briefly drifts when the subject's arm moves quickly.

Subject: The man's confident smile gradually turns into a pained expression.

Action analysis: He quickly drops his right shoulder and then suddenly swings his arm upward with excessive amplitude and force. The pancake instantly flies out of the pan and shoots straight into the air at high speed, leaving the top of the frame entirely.

Camera dynamics: The camera suddenly tilts upward, trying to track the flying pancake, but the movement is delayed and shaky.

Audio/rhythm: The man utters a short and strained grunt: "Humph!" The scraping sound is immediately replaced by a loud and wet "splat" sound from above off-screen, indicating that the pancake has hit the ceiling.

0:04 - 0:06: Impact and camera chaos

Visual composition: A shadow falling from above suddenly obscures the light.

Action analysis: In an instant, the thick, half-cooked pancake falls straight down and lands with a "splat" on the man's face, completely covering his eyes and nose. He instantly leans back, shrugs his shoulders up to his ears, and drops the frying pan to the ground.

Camera dynamics: The photographer's instinctive reaction is triggered. As the photographer flinches, the camera suddenly shakes downward and to the left. The picture completely blurs into a chaotic, smeared, motion-blurred image of the kitchen floor and cabinets.

Audio/rhythm: The heavy metallic clang of the frying pan hitting the linoleum floor dominates the sound. The photographer gasps and shouts, "Oh my god, dude!" followed by the clear sound of the phone dropping. The video ends abruptly, with the picture frozen on the tilted and blurred baseboard.

Prompt source: https://pastebin.com/H8DeXq1G

We upload the video from the start of this article to Gemini, and Gemini outputs a complete prompt in the format of the example.

Take the prompt from Gemini, make slight modifications to its content, and copy it into Seedance. Good results can be obtained with either the all-around reference mode or the first-and-last-frame mode.

It's worth noting that a Seedance 2.0 prompt cannot exceed 2000 words, while the video analysis produced by Gemini is often quite long. We can manually delete the parts that describe unnecessary details of the original video.
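The trimming step can also be roughed out in code. The sketch below assumes the limit counts whitespace-separated words, which may not match how Seedance actually measures length (it could count characters or tokens); the function names and the choice to drop "Audio/rhythm" lines first are illustrative assumptions.

```python
def word_count(text):
    """Count whitespace-separated words (an assumed proxy for the limit)."""
    return len(text.split())


def trim_prompt(analysis, max_words=2000, drop_labels=("Audio/rhythm:",)):
    """Drop low-priority labelled lines, then hard-truncate if still too long."""
    lines = analysis.splitlines()
    for label in drop_labels:
        if word_count("\n".join(lines)) <= max_words:
            break
        # Ignore leading bullet characters when checking the label.
        lines = [ln for ln in lines
                 if not ln.lstrip("-*• \t").startswith(label)]
    trimmed = "\n".join(lines)
    words = trimmed.split()
    if len(words) <= max_words:
        return trimmed
    # Last resort: keep only the first max_words words.
    return " ".join(words[:max_words])


sample = (
    "0:00 - 0:02: Opening\n"
    "Subject: a man holds a pan\n"
    "Audio/rhythm: a slow scraping sound repeats"
)
print(trim_prompt(sample, max_words=12))
```

Dropping whole labelled lines before truncating keeps the remaining segments intact, whereas a plain cut-off would lose the end of the video first.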
