This model will revolutionize the way humans acquire information.
As a Ph.D. student in computer science and the product manager of my own tech company, I've grown fond of reading technical papers.
Papers don't just describe technology; more importantly, they present leading algorithms and system frameworks that can be quickly integrated into existing systems. This lets us solve user problems, work more efficiently, and deliver a better user experience, closing the commercialization loop.
For example, last week ByteDance released its latest model, Vidi2. Simply put, the model's strength is quickly interpreting videos: without human intervention, it can interpret each frame of a video and produce the corresponding result data.
As a product manager, I always keep an eye on revolutionary technologies. During my Ph.D. studies especially, I hoped these research results would become the technical moats of engineered products.
The technology is almost revolutionary: it changes the way people access information
If converting WeChat official account articles into image posts or generated videos has become the mainstream form of content creation, then the reverse ability, converting videos into text, greatly improves the efficiency of generating content and information flows and doubles human information-retrieval capability.
We used to focus on where a person had been; from now on, the ability to access and retrieve information will determine everyone's worldview.
For new-media creators and self-media practitioners, this model is almost revolutionary in terms of information generation.
Take how I access information now: video is the mainstream. In an era where short and long videos have become the dominant form of information, fewer people read text. We humans naturally prefer faster, higher-frequency modes of consumption, the "lazy mode" of consumption.
Supports video keyword search
Vidi2 can act as a translator for many new-media tools, and can even be paired with teaching videos or used for robot learning: by outputting a video's story and steps as text, then having a large model align and memorize the corresponding actions in the video, the model can converge faster.
For example, in the official video mentioned above, a search can list every frame and scene containing a dragon. If I input a video of a hand, it can likewise return the clips containing hands.
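Vidi2's actual interface is not public, but the search capability described above can be pictured as keyword lookup over per-frame captions. Below is a minimal, hypothetical Python sketch of that idea; `search_frames`, `caption_index`, and the captions themselves are all invented for illustration and are not Vidi2's API.

```python
# Hypothetical sketch: keyword search over per-frame captions.
# Everything here is made-up example data, not Vidi2's real interface.

def search_frames(caption_index, keyword):
    """Return timestamps (seconds) of frames whose caption mentions the keyword."""
    keyword = keyword.lower()
    return [t for t, caption in sorted(caption_index.items())
            if keyword in caption.lower()]

# Toy index: frame timestamp -> caption produced by a video-understanding model.
caption_index = {
    0.0: "A castle gate under grey clouds",
    2.5: "A dragon flies over the castle",
    5.0: "Close-up of a knight's hand on a sword",
    7.5: "The dragon breathes fire at the gate",
}

print(search_frames(caption_index, "dragon"))  # [2.5, 7.5]
print(search_frames(caption_index, "hand"))    # [5.0]
```

A real system would match by visual embeddings rather than caption substrings, but the user-facing behavior, typing a keyword and getting back the matching moments, is the same.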
User-acceptable efficiency: from text search to video search
With Vidi2's underlying technology, we can now search video content itself instead of relying on titles. Clickbait titles become meaningless, and videos with eye-catching covers but irrelevant content will no longer work.
Everything should revolve around the video content itself, with the text inside the video used for explanation. Consider the vast amount of content on the Internet: finding exactly what we need used to take a great deal of time, especially when reviewing surveillance footage. With this technology, we can search surveillance video at will, saving time and quickly locating the clips we need.
Supports editing video elements
Vidi2 supports not only search but also video editing: users can replace the objects they searched for, transforming the video into a different scene.
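The "search, then replace" workflow can be sketched in the same spirit. The code below is a toy illustration only: it plans edits against per-frame captions rather than actually editing pixels, and every name in it (`plan_edits`, the captions) is an assumption, not Vidi2's real interface.

```python
# Hypothetical sketch of "search, then edit": find frames whose caption mentions
# a target object, then record a replacement instruction for each match.

def plan_edits(caption_index, target, replacement):
    """Return an edit plan: (timestamp, instruction) for every matching frame."""
    plan = []
    for t, caption in sorted(caption_index.items()):
        if target.lower() in caption.lower():
            plan.append((t, f"replace '{target}' with '{replacement}'"))
    return plan

# Toy index: frame timestamp -> caption.
caption_index = {
    0.0: "A dragon circles the tower",
    3.0: "A crowd watches from below",
    6.0: "The dragon lands on the roof",
}

for t, instruction in plan_edits(caption_index, "dragon", "airship"):
    print(f"{t}s: {instruction}")
```

In a production system, each planned edit would then be handed to a generative model that re-renders the matched frames; here the plan itself is the point, since it shows how search and editing share one index.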
This is like the science-fiction movie "Bloodshot" starring Vin Diesel, in which a tech company uses video-editing technology to modify the objects, characters, and even dialogue in the protagonist's memory videos, tampering with his memory and turning him into a killing machine.
The above is the memory-editing scene from the movie. Memory is similar to spatial intelligence. Although Vidi2 currently supports only flat videos, not spatial videos, it has already doubled our information-access efficiency. Its retrieval speed is nearly at a usable level, far beyond the experience of watching a short video, let alone an entire long video.
That is the new technology behind Vidi2; I hope product managers will pay attention to it.
That's all for today's sharing.
This article is from the WeChat official account "Kevin's Little Efforts to Change the World" (ID: Kevingbsjddd), written by "Kevin's Stories". It is published by 36Kr with authorization.