A "Cursor" for video creation: Anijam rewrites the animation creation process | Emerging New Projects
Text by | Liang Jianqiang
Edited by | Wang Yuchan
One-Sentence Introduction
Anijam is an AI Video Agent built for animation creation, aiming to raise the production efficiency of animated content at lower cost. Users only need to enter a natural-language prompt, and key steps such as character generation and storyboard design are completed automatically.
Team Introduction
Wang Jue, founder and CTO, holds bachelor's and master's degrees from Tsinghua University and a PhD from the University of Washington. From 2020 to 2023 he was a Distinguished Scientist at Tencent and Director of the Visual Computing Center at Tencent AI Lab. He is an IEEE Fellow. From 2017 to 2019 he served as Dean of Megvii's US Research Institute, and from 2007 to 2017 as Chief Scientist at Adobe.
Fang Chen, CEO, earned a PhD from Dartmouth College, an Ivy League institution in the United States. He has worked at Adobe Research, ByteDance's North American AI Lab, and Tencent, with experience spanning technology R&D to product deployment; his work has shipped in products such as Photoshop, Lightroom, Douyin, and WeChat.
Financing Progress
Anijam has completed tens of millions of US dollars in total financing, with investors including Miracle Plus, Atypical Ventures, and Yuanjing Capital.
Products and Business
Anijam's core positioning is an AI Video Agent for video creators, a "Cursor" for the video field. Just as AI programming tools like Cursor can understand requirements and help complete development tasks, Anijam aims to push video creation toward AI-driven generation, helping creators complete key steps such as character generation and storyboard design.
Video creation is inherently a professional task with a high threshold and a long pipeline. A complete video requires systematic craft and professional skill at every stage: script conception, character design, and storyboard breakdown up front; shot design, action continuity, and a unified visual style in the middle; and editing, dubbing, and pacing adjustments at the end.
Although AI has made video generation easier, for most ordinary people the real difficulty lies not in generation but in building a narrative, designing shot language, and assembling the pieces into a video that expresses something well.
Fang Chen believes that large AI models will be dominated by the leading companies; the opportunity lies in solving the problems of uncontrollable, hard-to-modify generated content.
Based on this judgment, the team re-decomposed the video creation process. Anijam uses AI to integrate the originally fragmented, complex creative workflow, letting users complete an entire video through much simpler interactions.
Under the hood, Anijam integrates multiple third-party large models and optimizes agent orchestration, post-editing algorithms, and the user experience on top of them.
Users only need to enter one natural-language sentence, such as "Create a video of the Monkey King fighting the White Bone Demon three times," and the system automatically completes the entire pipeline: story outline generation, visual concept design, storyboard script breakdown, keyframe generation, video clip production, and final synthesis.
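The staged pipeline described above can be sketched as a simple chain of steps, each consuming the previous stage's output. This is an illustrative assumption about the structure, not Anijam's actual API; the string placeholders stand in for calls to generation models.

```python
from dataclasses import dataclass, field

@dataclass
class Project:
    """Accumulates the artifacts of each pipeline stage (names are hypothetical)."""
    prompt: str
    outline: str = ""
    concepts: list = field(default_factory=list)    # visual concept designs
    storyboard: list = field(default_factory=list)  # per-shot scripts
    keyframes: list = field(default_factory=list)
    clips: list = field(default_factory=list)

def run_pipeline(prompt: str) -> Project:
    p = Project(prompt)
    # Each step would call a large model in a real system; here strings
    # record the data flow: prompt -> outline -> concepts -> shots -> clips.
    p.outline = f"outline({p.prompt})"                 # story outline generation
    p.concepts = [f"concept({p.outline})"]             # visual concept design
    p.storyboard = [f"shot{i}" for i in range(3)]      # storyboard breakdown
    p.keyframes = [f"keyframe({s})" for s in p.storyboard]
    p.clips = [f"clip({k})" for k in p.keyframes]      # video clip production
    return p  # final synthesis would stitch p.clips together
```

The point of the structure is that every intermediate artifact is retained, which is what makes stage-level intervention (the canvas editing described next) possible.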
The whole process takes place on a canvas, and users can step in and make changes at any stage: adjusting the art style, adding or removing character settings, or refining shot details.
Along the way, the system automatically identifies the key elements of the story, including characters, scenes, props, and styles, and generates a complete set of storyboard shots from them. Each shot carries information such as the scene description, character states, and shot language.
Creators can not only preview the rough effect of each storyboard shot but also modify it through natural language, such as adjusting the camera angle, changing the composition, or replacing local elements. The system also generates keyframes for each shot automatically and supports shot-by-shot preview and revision.
This points to one of Anijam's key capabilities: moving from lottery-style video generation to controllable editing.
Traditional AI video generation often relies on regenerating the entire output; once a single frame is unsatisfactory, the whole process has to start over. Anijam emphasizes local editability, for example modifying only a character's expression without touching the action or the background.
Anijam is improving its AI-driven video editing, not only supporting local modifications but also trying to build an AI self-feedback mechanism. In Fang Chen's plan, Anijam will eventually let AI evaluate generation results automatically, acting as a "third-party AI director" that scores the shots produced by AI tools and feeds those scores back to optimize the generation process, reducing manual adjustment through "AI guiding AI."
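The "AI guiding AI" loop described above can be sketched as generate, score, and regenerate with the critique folded back into the prompt. All names here are hypothetical stand-ins, not Anijam's implementation; `critic` represents the AI-director model.

```python
def generate_shot(prompt: str) -> str:
    # Stand-in for a video-generation model call.
    return f"shot<{prompt}>"

def generate_until_good(prompt: str, critic, threshold: float = 0.7,
                        max_tries: int = 5) -> tuple[str, int]:
    """Generate a shot, score it with the AI-director stand-in, and
    regenerate with refinement feedback until it passes or tries run out."""
    shot = generate_shot(prompt)
    tries = 1
    while critic(shot) < threshold and tries < max_tries:
        # Fold the critique back into the prompt for the next attempt.
        prompt = f"{prompt} (refined pass {tries})"
        shot = generate_shot(prompt)
        tries += 1
    return shot, tries
```

The design choice worth noting is that the critic is a separate component from the generator, which is what lets it act as a "third party" rather than the generator grading its own work.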
In product form, Anijam offers both desktop and mobile versions. The web version provides fuller creation and editing capabilities suited to long-form production, while the mobile version centers on conversational interaction with a lighter interface, suited to high-frequency content creators. The team is also building a creator community that curates high-quality content into templates for users to reuse.
Currently the product supports videos of up to 5-10 minutes; the actual time cost depends on the complexity of the content. Generating a roughly two-minute video may take anywhere from tens of minutes to an hour.
For its business model, the product uses tiered subscriptions covering different levels of creative need, priced from 25 to 60 US dollars. The higher tiers essentially correspond to larger compute quotas and stronger generation capabilities.
As users keep interacting with the Agent, the system accumulates a large amount of creation-related data, including user preferences, style choices, and modification paths. This data is further structured into "creative memories" embedded in the Agent, letting it gradually develop personalized capabilities.
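One minimal way to picture the "creative memory" idea described above is a store that structures a user's past choices into defaults the Agent can reuse. The field names and scoring are invented for illustration, not Anijam's design.

```python
from collections import Counter

class CreativeMemory:
    """Hypothetical sketch: accumulate a user's choices so the Agent can
    personalize future suggestions."""
    def __init__(self):
        self.style_choices = Counter()  # how often each art style was chosen
        self.edit_history = []          # modification paths, in order

    def record(self, style: str, edit: str) -> None:
        self.style_choices[style] += 1
        self.edit_history.append(edit)

    def preferred_style(self):
        # The most frequently chosen style becomes the default suggestion.
        if not self.style_choices:
            return None
        return self.style_choices.most_common(1)[0][0]
```

In a real system the memory would presumably be embedded into the Agent's context or fine-tuning data; the sketch only shows the structuring step.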
What Anijam is trying to achieve: start a creation with one sentence, complete production with AI, make controllable edits to every frame, and become more efficient the more it is used.
Founder's Thoughts
- The AI video track is still in its early stage, and the time window is the barrier
The continued progress of large video models is a certainty, and a tailwind for every player in the field. The real differentiation lies not in the model itself but in the capabilities built around it: post-editing, agent orchestration, and making generated content usable and modifiable. Compared with simply calling a model, that last point is the core of product competition.
The track is still early and the product is imperfect; it still requires substantial user involvement in revision. At this stage the biggest worry is not competition but speed: the team needs to enter the market as early as possible, acquire users, and accumulate data and know-how through real-world use.
For a startup competing with large companies, the key is to start earlier and build user retention and data accumulation.
Looking at how the image-generation market developed, the AI video field is unlikely to end in a monopoly. Instead, multiple vendors will hold different shares, and the landscape may be relatively balanced, with each company owning a slice of users and scenarios rather than a single platform taking the vast majority.
- The core of the video Agent is not just to generate good videos, but to "tell a good story"
The current bottleneck of AI video lies not only in picture-generation ability but also in narrative ability.
It is extremely difficult for a model to tell a complete, coherent story within two minutes. The real challenge, then, is not generating a single clip but organizing content with a director's language to tell the user's story.
Future creator tools will not just provide functions; they will become the user's creative partner. As creators keep interacting with the Agent, the system accumulates behavioral data from the creation process, including preferences, modification logic, and experience. The Agent can gradually understand the user's intent and even make some creative decisions in advance, eventually evolving into a "digital clone" with memory and the capacity to evolve.
Anijam's goal is not just to be a tool but to become a creative platform like Adobe: reshaping the creative process through technology, improving efficiency, lowering the threshold, and supporting larger-scale content production and a creator ecosystem, so that more people can participate in creation and capture value from it.
- AI will promote the equalization of creation, but attention will still be concentrated on the top
As the Agent's capabilities improve further, video creation will gradually shift from a process of continuous human feedback to a more automated production model. There may even come a stage of "the Agent giving feedback on behalf of humans": users only state their requirements, and the system handles iteration and optimization.
AI is lowering the threshold of creation, letting more people participate in content production. Much as short-video platforms gave an expressive outlet to creators who could never reach movie theaters, this advances the equalization of creation.
At the same time, however, attention will not be distributed evenly. Movie theaters and short-video platforms are not substitutes but new channels, and top-tier content will still capture the traffic and commercial value. What AI brings, then, is not pure "equalization" but an expansion of supply alongside a stronger top-tier effect.
- The business model will shift from "selling computing power" to "paying by results"
Currently the cost of video generation is still very high, but with growing demand and technological progress, prices will gradually fall.
On one hand, demand at scale will continuously drive costs down; on the other, optimization and acceleration at the model level will significantly reduce compute consumption. For example, through architecture optimization and hardware co-design, a generation process that once required a huge number of tokens can be greatly compressed. The compute required for the same video could drop by an order of magnitude, pushing the overall price down rapidly and further popularizing AI video applications.
As for the business model, today's compute-centered billing is essentially a transitional form. As the technology matures and costs fall, it is more likely to shift to "paying by results": users pay only for the final output, when they are satisfied with the AI-generated video and willing to download it, rather than for the tokens consumed during generation.
This model depends on the product striking a balance among quality, cost, and speed. Once generated results are stable and controllable enough, users' payment logic will change accordingly.
- Starting from animation creators, targeting a broader range of content creators
Anijam chose to enter the market through the animation category because this group is already accustomed to digital creative workflows and more receptive to AI tools. Compared with live-action video creators, their migration cost is lower; these users also tend to have strong creative drive, making them the easiest group to activate first.
Looking at the broader user base, the team has observed a large number of "light creative users" in overseas markets, including part-time creators and content enthusiasts. They have a strong desire to produce content and have achieved initial commercialization on platforms such as YouTube and Instagram.
Fang Chen believes that beyond business-side (B-end) professional content companies, consumer-side (C-end) creators also have the ability to pay and growth potential.
There is also a clear difference in technical difficulty between animation and live-action video. Although generation times are similar, live-action video demands higher quality: it must cross the "uncanny valley" and come closer to reality in details, continuity, and transitions, which significantly raises the overall difficulty. Animation, by contrast, reaches a usable level more easily thanks to its tolerance for stylization.
In the long run, live-action video remains the larger market. As the technology matures and industry solutions improve, demand for live-action video will inevitably be released, and Anijam plans to enter that field as well.