ByteDance Seedance 2.0 hands-on: voiceovers get scrambled and subtitles come out garbled. AI video is still a game of probability.
Before we get into the article, take a look at this animated GIF.
(Image source: Bilibili)
It's a really cool movie scene, right? The materials and atmosphere are both on point.
But what if I told you that this clip was entirely AI-generated? I suspect many readers would be surprised, and some might even scroll back up to hunt for flaws in the footage.
In recent years, as the technology has advanced rapidly, it has become increasingly hard to tell AI-generated footage apart from real special effects. Making the videos we love has seemingly never been this easy.
But I guess most people, like me, just watch and don't practice, or they've tried but have given up before really getting started.
The reason can be summed up in one sentence: This stuff is really discouraging.
If you want a higher level of polish, you have to deploy your own model and build a stable, controllable workflow in ComfyUI. But those dense parameters are still a mystery even to me, an old hand who has spent years in the AIGC field; it's safe to say most ordinary people won't be able to figure them out.
If you just want to have some fun, you can try Sora or Veo. But these services are not only expensive, their results are also a lottery draw: you pay for every attempt, and they are hard for users in China to access in the first place.
Who would have thought that, after everyone had struggled for so long, ByteDance back home had quietly been preparing a big surprise.
(Image source: Jimeng)
Just this week, ByteDance's video model, Seedance 2.0, suddenly went live. There was no long waiting list for application, and no hidden internal testing invitations. It was simply released to the public during the Spring Festival, the biggest traffic window of the year.
After using it, I can only say that for friends who want to create their own AI videos, good days are coming.
15 seconds of video generation, one hour of queuing
Let's first talk about how to use it.
Seedance 2.0 is now available on the Jimeng platform. Currently, paid members (membership starts at 69 yuan) can use the latest model directly, via both the desktop web version and the mobile app. It is expected to roll out to all users within a few days.
If you don't want to pay, you can also use ByteDance's Xiaoyunque. Currently, new users who log in get three free Seedance 2.0 generations, plus 120 points awarded every day.
Once the free generations are used up, Seedance 2.0 costs 8 points per second of video, so the daily 120 points buy you up to 15 seconds of footage for free, enough for a taste test.
(Image source: Lei Technology)
Now let's look at its capabilities.
As we all know, most video models in China could previously only generate silent videos; even ByteDance only added voiceover in Seedance 1.5 at the end of last year.
Now, the sound and pictures in Seedance 2.0 are perfectly coordinated.
The new model generates matching sound effects and background music while creating the video, and supports lip-syncing and emotion matching: when characters speak, their mouth shapes are correct, and their expressions and tone match the lines.
To test this, I entered a simple prompt: from a first-person perspective, sitting by the window of an old-fashioned green train, watching fields flash past the window while the glass on the table vibrates slightly.
Maybe because so many people wanted to experience it, I actually had to wait in line for more than an hour before the video was generated.
To be honest, the level of detail in the picture didn't surprise me; what gave me goosebumps was the sound. The video had not only soft background music but also the distinctive low-frequency rhythm of the train rolling over the tracks. When the camera panned across the glass on the table, the ripples set off by the vibration were clearly visible.
Looking at the fields outside the window and the setting sun, it's really hard to imagine that none of this actually existed.
This "original sound" experience is truly different from the later - added voice - over. It shows that AI is not just creating pictures; it understands what's happening in the picture and knows what sounds should be made in that environment.
This is quite interesting.
But good sound alone isn't enough; the picture also has to stay stable.
Previously, the scariest thing about making videos with AI was characters getting "plastic surgery" mid-clip: one second the protagonist was a rugged Westerner, the next he was a young Japanese-style heartthrob. The problem was especially obvious in scenes with big movements.
To test the consistency of Seedance 2.0, I deliberately raised the difficulty and generated "a rainy-night alley fight between two martial artists in the puddles".
As for the theme of the video, let's call it Goat VS Goat.
The result was quite surprising. Across the ten-plus-second fight, the two characters' faces stayed consistent; even during flying kicks and position swaps, the texture of their clothes and the contours of their features didn't warp.
There was still slight smearing in a few heavily motion-blurred frames, but compared with previous-generation models where faces changed every three seconds, this is a qualitative leap.
It can be said that in terms of basic qualities, Seedance 2.0 is already a highly usable tool.
From text to finished video, one person can handle it, but voice-subtitle mismatches and picture glitches remain
After the basic tests were successful, it's time to increase the difficulty.
After all, for most of us hoping to do self-media, we want AI not just to render realistic pictures but to understand our creative intent.
To address this, Seedance 2.0 introduces the concepts of autonomous storyboarding and autonomous camera movement.
Simply put, it can automatically plan the storyboard and camera movement based on your description. You just need to tell it your requirements, and it will decide how to shoot.
I tried entering a very simple instruction: a person in sneakers running hard across a soft beach as the sun sets.
The difficulty of this description lies not only in storyboarding but also in the understanding of the physical world.
Since sand behaves like a fluid, it sinks when you step on it, and grains cling to your foot as you lift it. These are details that earlier video-generation models struggled to reproduce.
In the generated video, I could clearly see footprints pressed into the sand. With every push-off, grains flew backward along a natural-looking parabola, with no anti-gravity moments where sand hung in the air. Even the calf muscles visibly quivered in rhythm with the stride.
To be honest, when I saw this result, a thought flashed through my mind: This effect can be directly used in short videos.
Based on this, could I use the same workflow to make a 60-second Brain Rot short video?
So I first turned to ByteDance's other AI assistant, Doubao, asking it to produce a rough nine-panel video storyboard to my requirements and then write a very standard Brain Rot short-video script on the "choose the red door or the blue door" theme.
(Image source: Lei Technology)
I have to complain that Doubao still doesn't understand storyboard frames well, which cost me a lot of time.
Then, I fed the storyboard and script to Seedance 2.0.
Although Seedance 2.0 currently caps a single video at 15 seconds, multi-modal input lets you feed the tail end of one clip back in as material for the next, chaining multiple shots together while keeping the characters consistent, and then finishing with manual editing and splicing.
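The chaining step itself is easy to script locally. Below is a minimal sketch, assuming you have already downloaded each generated clip; it uses OpenCV to grab the last frame of a clip so you can upload that frame as the reference image for the next 15-second shot. The file names are hypothetical, and actually submitting the next prompt is left as a manual step in Jimeng, since no public API is assumed here.

```python
# chain_shots.py - a minimal sketch of the "last frame as next reference" trick.
# Assumes: opencv-python is installed and each generated clip is saved locally.
# Submitting the next prompt to Seedance 2.0 stays a manual step (e.g. in Jimeng).
import cv2

def extract_last_frame(video_path: str, out_image_path: str) -> str:
    """Save the final frame of a clip so it can be reused as a reference image."""
    cap = cv2.VideoCapture(video_path)
    last_frame = None
    while True:
        ok, frame = cap.read()
        if not ok:          # stop once the decoder runs out of frames
            break
        last_frame = frame  # keep overwriting; the last successful read wins
    cap.release()
    if last_frame is None:
        raise ValueError(f"No frames decoded from {video_path}")
    cv2.imwrite(out_image_path, last_frame)
    return out_image_path

if __name__ == "__main__":
    # Grab the closing frame of shot 1, then attach it (plus the next prompt)
    # when generating shot 2, so characters and scenery carry over.
    ref = extract_last_frame("shot_01.mp4", "shot_01_last_frame.png")
    print(f"Use {ref} as the reference image for the next shot.")
```

Reading the whole clip frame by frame is slower than seeking, but it is robust across codecs whose frame counts are unreliable, and a 15-second clip decodes in moments.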
It took me half a day to complete this entire process.
That said, while Seedance 2.0's Chinese text rendering is far ahead of foreign competitors, the actual output still has cases where the subtitles don't match the voiceover, and garbled text in the frame remains a real, almost unavoidable problem.
And because of the current 15-second cap, if I write too much narration, the synthesized voice rushes through it at an unnaturally fast pace.
Also, the video I generated this time was relatively long, and you can clearly see that Seedance 2.0 always handles the door-opening action oddly. Even after burning through all my free credits I couldn't get a better take, so I had to give up.
As for the "lottery - draw" problem... at least for current video - generation applications, it is inevitable.
Conclusion
In my opinion, the emergence of Seedance 2.0 is like a shot in the arm for domestic creators.
It's undeniable that, on pure technical metrics or output quality, Sora may still be the industry benchmark for the coherence of long takes and the artistry of its images.
(Image source: Sora)
But in the tech circle, there is a very simple truth: A good technology must first be a usable technology.
For now, Seedance 2.0 has almost no barrier to entry: anyone can register and use it easily, and its price is actually quite cost-effective compared with similar competitors.
Tim, the well-known creator behind "Film and TV Hurricane", also praised Seedance 2.0's output today. In his view, the detail of the generated footage, the camera movement, the continuity between shots, and the audio-visual sync are all excellent, and he called it "an AI that will change the video industry".
In a sense, the verdict of professionals in the imaging industry carries far more weight than self-media reviews or leaderboard scores for large models.
I'd bet that within the next six months you will see plenty of short dramas, mystery recaps, and even product-promotion videos generated with Seedance 2.0 on Douyin and Video Accounts. Content that doesn't depend on complex acting but on visual spectacle or plot twists will be the first territory AI completely reshapes.
Can you believe that I, with no background in art, animation, or even video production, managed to put together a finished short video on my own in just half a day?