Free indefinitely: a global top-ten AI Lab opens its full-modal APIs, and we tested them for you first.
“Tokenmaxxing” (maximizing Token consumption) is becoming a new buzzword in the developer community, where Token budgets are treated as a new measure of how deeply a team uses AI. One startup CEO even posted the company's Anthropic bill on LinkedIn as proof of its AI adoption.
Image source: Swan AI CEO Amos Bar-Joseph
However, a key question has been overlooked: What exactly are we creating with Tokens?
Costs are indeed soaring. Goldman Sachs predicts that Token usage may grow more than 24-fold over the next few years. NVIDIA's vice president of applied deep learning has even said that AI costs now exceed the team's salaries.
Image source: Goldman Sachs
But what about the quality of the output? The data gives a rather pessimistic answer.
Code generation is a typical example. Waydev tracked more than 50 enterprises and found that the long-term retention rate of AI-generated code is only 10% to 30%. A GitClear report is more blunt: heavy AI users produce 9.4 times as much rework as non-AI users. Statistics from another platform, Jellyfish, show that although some teams doubled their code throughput, their Token costs rose nearly tenfold.
The growth rate of code rework exceeds the productivity growth rate. Data source: GitClear
As Token consumption keeps rising, the ratio of output to input matters more and more. Token anxiety is becoming a real obstacle to AI adoption.
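The figures cited above can be turned into unit costs with simple division. The sketch below is illustrative arithmetic only: the function names are made up for this example, and it assumes the Jellyfish numbers mean "2x throughput at roughly 10x Token cost" and that Waydev's retention rate applies uniformly to all generated code.

```python
def unit_cost_multiplier(throughput_multiplier: float, cost_multiplier: float) -> float:
    """How many times more each unit of output costs after the change."""
    if throughput_multiplier <= 0:
        raise ValueError("throughput_multiplier must be positive")
    return cost_multiplier / throughput_multiplier


def retained_cost_multiplier(retention_rate: float) -> float:
    """Cost of one retained line relative to the cost of one generated line."""
    if not 0 < retention_rate <= 1:
        raise ValueError("retention_rate must be in (0, 1]")
    return 1.0 / retention_rate


# Jellyfish: throughput doubled, Token cost rose nearly tenfold.
jellyfish_unit_cost = unit_cost_multiplier(2.0, 10.0)

# Waydev: only 10% to 30% of AI-generated code is retained long term.
retained_best_case = retained_cost_multiplier(0.30)
retained_worst_case = retained_cost_multiplier(0.10)

print(f"Cost per unit of output: about {jellyfish_unit_cost:.0f}x")
print(f"Cost per retained line: {retained_best_case:.1f}x to {retained_worst_case:.0f}x a generated line")
```

Under these assumptions, doubling throughput at ten times the cost means each unit of output costs about five times more, and low retention multiplies the cost of the code that actually survives.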
Amid this spreading "Token anxiety", Agnes AI's move stands out: the AI Lab, ranked ninth on the global list, announced that starting June 1st its full-modal model APIs will be free indefinitely.
01. As Tokens get more and more expensive, an AI Lab zeroes out the bill
The release covers Agnes AI's three core models: the text model Agnes-2.0-Flash, the image model Agnes-Image-2.0-Flash, and the video model Agnes-Video-V2.0.
Starting June 1st, these model APIs will be free to developers worldwide, indefinitely. For small and medium-sized teams, independent developers, and creators, the cost threshold for model calls has effectively "disappeared", and the room for trial and error has grown considerably.
In Agnes AI's view, high-quality AI should not be exclusive to high-budget companies. By opening its full-modal APIs for text, images, and video for free, Agnes AI hopes to let teams spend their limited budgets on product innovation and feature iteration instead.
02. Text, images, and video opened together: testing the models firsthand
Once the cost is zeroed out, what developers care about most is whether the models can really deliver. We tested the text, image, and video models in turn.
- Text model: 1M ultra-long context, quickly build productivity scenarios
Agnes-2.0-Flash supports a 1M context window and tool calls, covering scenarios such as code development, enterprise knowledge base, intelligent customer service, document processing, and Agent workflows. To see how these capabilities perform in real scenarios, we conducted tests in areas such as code generation, web page building, and front-end design.
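For scenarios such as knowledge bases and document processing, it helps to know roughly how much material fits in the window. The following is a minimal sketch, not an Agnes AI tool: it assumes "1M" means one million tokens, uses the common rule of thumb of about four characters per token for English text (real tokenizers vary), and the function names and the output reserve are hypothetical.

```python
CONTEXT_WINDOW_TOKENS = 1_000_000  # assumption: "1M" in the article means tokens
CHARS_PER_TOKEN = 4                # assumption: rough heuristic for English text


def estimate_tokens(text: str) -> int:
    """Estimate the token count from character length, rounding up."""
    return -(-len(text) // CHARS_PER_TOKEN)


def fits_in_context(documents: list[str], reserved_for_output: int = 8_000) -> bool:
    """Check whether all documents plus an output reserve fit in the window."""
    total = sum(estimate_tokens(doc) for doc in documents)
    return total + reserved_for_output <= CONTEXT_WINDOW_TOKENS


# Usage: a 400,000-character manual is estimated at 100,000 tokens.
manual = "x" * 400_000
print(estimate_tokens(manual), fits_in_context([manual]))
```

A production check would use the model's actual tokenizer; the heuristic is only meant for quick sizing.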
Let's start with programming ability. We first asked Agnes-2.0-Flash to generate a web-based plane shooting game.
The model not only generated a complete gameplay framework (fighter planes, small monsters, boss battles, scoring, and health points) but also added combo prompts, particle explosions, a dynamic starry-sky background, and sound effects on its own initiative. The result is far more complete than a typical demo and close to a playable finished product.
In the second test, we changed direction. With a single prompt, Agnes-2.0-Flash built an SBTI personality test website.
The website includes a complete test flow, result calculation logic, and a personality type display page. Users can view their results as soon as they finish the questions. The overall experience is very close to that of common online personality test products.
In both cases, the web game and the test website, Agnes-2.0-Flash completed the task relatively quickly.
Next, we further increased the difficulty and focused the test on front-end design and product interface generation capabilities.
The third test was a photography portfolio website. The first impression of the generated result is that it looks good. The overall layout and visual style are close in polish to many independent photographers' portfolio sites.
If the previous test measured aesthetics and page layout, a social product tests more complex interaction. We therefore asked Agnes-2.0-Flash to generate a social sharing website similar to X.
The generated page includes multiple core modules, such as an information stream, a search box, a follow button, a side navigation bar, and recommended content. Buttons such as follow and like trigger the corresponding animation feedback. In both visual presentation and interaction, the overall design closely resembles X.
Overall, as a product demo or front-end prototype, Agnes-2.0-Flash can already meet the early validation and demonstration needs of many projects.
- Image model: Focus on editability, covering portrait retouching, e-commerce images, and infographics
Agnes-Image-2.0-Flash supports capabilities such as image-to-image modification, multi-image fusion, background replacement, local editing, text modification, and style conversion, which are suitable for high-frequency scenarios such as e-commerce main images, advertising design, product posters, and social media content production.
We first tried a set of character restyling tasks, focusing on the model's editing ability in portrait close-ups. The model had to retain the person's original identity features while restyling them as a K-pop idol on a performance stage.
In the results, facial consistency remains stable even though the person's appearance changes significantly. Skin texture, layering of light and shadow, and the photographic quality of the shot are also enhanced, bringing the image closer to professional photography and commercial visual work.
In the second set of tests, we focused on e-commerce design. We uploaded a real photo of a hair essential oil product and asked the model to generate a complete e-commerce poster.
The result not only retains the product itself and the brand logo but also adds selling-point copy, decorative visual elements, and a layout in line with e-commerce conventions. In the finished poster, the product stands out, its lighting and texture are enhanced, and the background carries visual elements that match the product's tone. For e-commerce operations, brand marketing, and content teams, such capabilities can reduce the workload of shooting, retouching, and typesetting.
Infographics are a more demanding test for an image model, so we designed two sets of infographic tasks in different directions.
The first set leans toward flowcharts and educational explainers.
The result not only builds a complete process structure but also adds a large number of icons, illustrations, and visual guides. The steps are connected through arrows, color blocks, and hierarchy. Even with a large amount of text, the image remains easy to read.
The second set raised the difficulty further. We asked the model to generate a set of architectural concept infographics based on the characteristics of marine life, showing not only the source of inspiration but also the design derivation process and the final architectural scheme.
The model covered the full path from analysis of biological form, through extraction of a design language, to the realization of the architectural concept, with sections for reference materials, structural breakdown, color analysis, spatial derivation, and final renderings.
The two sets of tests show one clear trait: as information density increases, Agnes-Image-2.0-Flash not only generates the corresponding content but also organizes the page structure on its own, so that the image both "displays" and "explains".
For users who create popular science content, business reports, design proposals, and long social media images, this makes the model considerably more practical.
- Video model: Supports simultaneous audio and video output, with a cinematic feel and good character acting
Agnes-Video-V2.0 supports simultaneous audio and video generation, video generation from a first frame, video generation from first and last frames, and multi-frame generation. Output resolution can be set to 720P or 1080P. The model can be used for short video production, advertising material, storyboard creation, and automated video workflows.
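In an automated video workflow, the options listed above would typically be checked before a job is submitted. The sketch below is a hypothetical local data structure, not the Agnes AI API: the class, field, and mode names are invented for illustration, and the frame counts required per mode (one, two, and at least two) are assumptions. Only the three frame-based modes, the audio option, and the two resolutions come from the article.

```python
from dataclasses import dataclass, field
from enum import Enum


class Mode(Enum):
    FIRST_FRAME = "first_frame"
    FIRST_LAST_FRAME = "first_last_frame"
    MULTI_FRAME = "multi_frame"


ALLOWED_RESOLUTIONS = ("720P", "1080P")  # the two options named in the article


@dataclass
class VideoRequest:
    """Hypothetical request description; not an official schema."""
    prompt: str
    mode: Mode
    resolution: str = "720P"
    frames: list[str] = field(default_factory=list)  # paths of input images
    with_audio: bool = True  # simultaneous audio and video generation

    def validate(self) -> None:
        """Raise ValueError if the options are inconsistent."""
        if self.resolution not in ALLOWED_RESOLUTIONS:
            raise ValueError(f"unsupported resolution: {self.resolution}")
        count = len(self.frames)
        # Assumed frame counts per mode; the article does not specify them.
        if self.mode is Mode.FIRST_FRAME and count != 1:
            raise ValueError("first-frame mode expects exactly one frame")
        if self.mode is Mode.FIRST_LAST_FRAME and count != 2:
            raise ValueError("first/last-frame mode expects exactly two frames")
        if self.mode is Mode.MULTI_FRAME and count < 2:
            raise ValueError("multi-frame mode expects at least two frames")


# Usage: a 1080P clip generated from a single starting image.
request = VideoRequest("A boy plays a drum set", Mode.FIRST_FRAME, "1080P", ["drummer.png"])
request.validate()
```

Validating locally keeps malformed jobs out of a batch pipeline before any model call is made.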
First, we tested the simultaneous audio and video generation ability of Agnes-Video-V2.0.
The first case is a drum performance video.
In the video, a boy sits at a drum set and plays through a performance. He first presses the bass drum pedal with his foot and then starts playing with the drumsticks. Throughout, the timing of the drum beats stays synchronized with his movements. For music performance content, details like this are often harder to get right than picture quality alone.
The second case further increased the complexity. In the band video, there are three people: the lead singer, the