Elon Musk Releases Free Video AI Model to Go Head-to-Head with Sora 2, with Former NVIDIA Engineer He Yihui Involved

Did Elon Musk issue a challenge to Sam Altman? xAI's new model can generate videos within 20 seconds.

Elon Musk and Sam Altman are at odds again!

Zhidx reported on October 8 that, early that morning, Musk's large-model unicorn xAI unveiled its latest video-generation model, Imagine v0.9, which is free for all users.

A week ago, OpenAI released its flagship video and audio generation model, Sora 2. This update might be Musk's direct response to Sora 2.

xAI didn't publish a complete technical blog. It only mentioned that Imagine v0.9 has been upgraded in terms of visual quality, motion, and audio generation compared to the original version, and uploaded several examples of generated videos.

Musk posted on X that Imagine v0.9 can generate a video in less than 20 seconds, and users can create videos, images, and text just by speaking through a voice-first interface.

In summary, Imagine v0.9 is faster, generating a video within 20 seconds, while Sora 2 may take one or two minutes. Imagine v0.9 is free for all users, while Sora 2 is invitation-only and open to only some users. Sora 2 has the edge on length: it supports 15-second videos, while videos generated by Imagine v0.9 run about 6 seconds.

Zhidx compared the output of Imagine v0.9 and Sora 2 using the prompts from OpenAI's official examples. Imagine v0.9 showed several problems: it misunderstood prompts, produced mismatched video and audio, gave no warning about deepfake risks, and could not handle Chinese.

Notably, this is also the first xAI project for Ethan He, a senior algorithm engineer whom Musk poached from NVIDIA in July this year.

Ethan He graduated from Xi'an Jiaotong University with a bachelor's degree in Computer Science and Technology in 2018, obtained a master's degree in Computer Vision from Carnegie Mellon University in 2019, and joined NVIDIA as a senior deep-learning algorithm engineer in 2023. He was involved in the research and development of NVIDIA's world foundation model, Cosmos.

Although Imagine v0.9 is free to use, Zhidx found that the web version doesn't work properly at present. The mobile version works, but connection failures may occur.

Generate movie-like effects in seconds

Add natural conversations

Imagine v0.9 is integrated into Grok. It first generates pictures based on text and then creates videos, or directly turns the pictures uploaded by users into videos.

xAI mentioned in its blog that Imagine v0.9 breaks the boundary of native audio-plus-video generation. It can create movie-like videos out of the box, with no editing required. In one example video, a dragon's roar is heard as the dragon roars on screen.

Another major upgrade of Imagine v0.9 is motion control. In the skiing segment of one example video, the characters' movements from take-off to landing are smooth.

Third, users can add dynamic camera effects to the video, such as intelligent focus shifts. In one example video, the street scene blurs as the camera position changes, making the characters stand out.

Fourthly, Imagine v0.9 supports adding natural conversations or generating expressive singing.

Frequent text-understanding errors compared to Sora 2

Deepfake risk

Zhidx used the prompts from OpenAI's demonstration of Sora 2 to compare the generation effects of Imagine v0.9 and Sora 2.

Prompt: Two mountain explorers in bright technical shells, ice crusted faces, eyes narrowed with urgency shout in the snow, one at a time

The video generated by Sora 2 released by OpenAI:

The video generated by Imagine v0.9:

The audio in the video generated by Imagine v0.9 contains no shouting; the characters on screen only open their mouths.

Prompt: a guy does a backflip

The video generated by Sora 2 released by OpenAI:

The video generated by Imagine v0.9:

Zhidx chose the first picture generated by Grok to create a video. In the video, the protagonist completely ignores gravity and spins 360 degrees in midair.

Finally, Zhidx also tested the custom voice function of Imagine v0.9. Zhidx uploaded a photo of Musk and asked him to say "Sam's a sharp guy, and our relationship's always been good. OpenAI's built some impressive stuff in the AI space, and I really hope to partner with them someday to advance AI development together".

Imagine v0.9 didn't warn about the deepfake risk, but the generated voice is slightly different from Musk's own voice.

Currently, this model doesn't support Chinese. When Zhidx asked the generated Musk to say "I'm good friends with Sam Altman" in Chinese, only "good friends" was clear in the generated video.

Conclusion: The competition in AI video generation escalates

The custom voice function may pose a deepfake risk

Within a week, OpenAI and xAI each announced new progress in video-generation models. Sora 2 not only improved in simulation realism, controllability, and sound effects but also arrived with a new Sora social app. xAI, on top of its feature upgrades, attracted a large amount of traffic by offering free access.

One of the major upgrades of Imagine v0.9 is that it allows users to add custom voices to videos. Once this technology matures further, users could upload a photo of a public figure along with the words they want that person to say and generate a realistic video, which may pose a deepfake risk.

Balancing technological development against risk prevention may therefore be a challenge that all video-generation model providers need to face.

This article is from the WeChat official account “Zhidx” (ID: zhidxcom), author: Cheng Qian. Republished by 36Kr with permission.