Amazon launches the new-generation basic model Nova, emphasizing cost performance, and previewing image and video models.
Author | Wan Chen Editor | Zheng Xuan
Amazon's new model has arrived.
At the re:invent conference on Tuesday local time, during the morning Keynote session, Andy Jassy, the former AWS CEO and current CEO of Amazon, made a limited-time comeback. In about a 10-minute speech, Jassy introduced Amazon's application progress in the field of generative AI and released Amazon's new generation basic model - Amazon Nova.
In April last year, Amazon launched the first-generation large model Titan, which only had a single language modality. If Titan was just an experiment, then today's Amazon Nova series models are Amazon's real strength and significant move. Whether it's text generation from text, text generation from image, or image generation from video... For Amazon, this choice does not exist. Because the Nova series focuses on Any to Any, with arbitrary modality input and arbitrary modality output. And in the Benchmark evaluation, it is also a SOTA large model, which can almost defeat all basic models of the same magnitude and market positioning.
You might ask, just after adding a $4 billion investment in Anthropic and its Claude, it released its own self-developed blockbuster Nova. What is Amazon thinking? Especially how does it view its relationship with its model ecosystem partners?
Andy Jassy, the former AWS CEO and current CEO of Amazon, releases the Nova series basic model. | Image source: Amazon Web Services
Andy Jassy answered this question himself. He said that in the AI applications built within Amazon, the diversity of models used is astonishing. Developers also hope to have lower latency, lower costs, the ability to fine-tune, better coordination of different knowledge bases and fixed data, and also want to achieve many automated coordination operations (that is, the so-called intelligent behavior), or want to obtain better image and video effects, etc. In order to meet the diverse needs of developers, Amazon Web Services' model strategy is to give developers as many rights to make their own choices as possible.
"We have always learned the same lesson - there will never be a tool that can dominate a certain field. Just like in the database field, in the past 10 years, people will use various relational databases or non-relational databases. The same is true in the analysis field. Once people thought that TensorFlow would become the only AI framework, but we have always emphasized that many different frameworks will appear, and finally PyTorch has become the most popular one. The same situation is presented in the model aspect."
Allowing developers to experiment and combine the use of models as they expect is Amazon's answer in the era of large models.
01
Amazon Nova:
Lower cost, stronger capabilities
At the conference, Andy Jassy announced six large models of the Nova series, including four basic models for generating text, and two visual content generation models for generating images and videos.
First is the Micro model with the lightest volume, which belongs to the "text-only model" and only supports inputting text and outputting text. This is also the model in the Nova series with the fastest response speed and the highest cost performance. Jassy said that Amazon's internal developers like to use it in many simple tasks.
Jassy said that in 11 Benchmark tests, the performance of Nova Mirco is comparable to or better than Meta LLaMa 3.1 8B, and in 12 Benchmark tests, it performs better than Google Gemini 1.5 Flash-8B. The response speed of this model reaches 210 Tokens per second, which is very suitable for applications that require a fast response.
The next three are multimodal models that support multimodal input and output text content.
Among them, the Lite model is also a low-cost multimodal model, which can quickly process image, video and text input and output text content.
Jassy said that in 19 Benchmark tests, Nova Lite performed better than or equal to OpenAI's GPT-4o Mini in 17 items; in 21 benchmarks, it performed better than or equal to Google's Gemini 1.5 Flash-8B in 17 items; in 12 benchmarks, it performed better than or equal to Anthropic's Claude Haiku 3.5 in 10 items. This model also has a good performance in video, chart and document understanding tasks.
The Pro model is a high-performance multimodal model that can provide the best combination of accuracy, speed and cost for a variety of tasks.
In 20 Benchmark tests, Nova Pro performed better than or equal to OpenAI's GPT-4o in 17 items; in 21 Benchmark tests, it performed better than or equal to Google's Gemini 1.5 Pro in 16 items.
Finally, the strongest is Nova Premier, which can be used for complex reasoning tasks and can also be the best "teacher" for custom model distillation.
Jassy did not give the跑分 comparison of Premier, but it is not difficult to infer from the introduction: this model is targeted at the Orion series model released by OpenAI in September.
According to Jassy, Amazon Nova Micro, Lite and Pro are currently fully available on the market, while Amazon Nova Premier is planned to be launched in the first quarter of 2025.
In addition to performance, Jassy said that these models have other highlights. First, they are very cost-effective, which is about 75% cheaper than other excellent model products in Amazon Bedrock. In addition, they are very fast and have excellent performance in terms of latency, and are among the faster models that can be seen.
The models that have been launched are not only integrated into Amazon Bedrock, but also deeply integrated with all the functions in Amazon Bedrock. This means that developers can fine-tune the models, or use Bedrock's knowledge base, RAG, etc. to enhance the models, or use Bedrock's distillation function to "transfer" the intelligence of the large model to a smaller model, thereby improving efficiency and reducing latency.
In addition to the four text-generation models, Jassy also previewed two new models for generating visual content.
First is Amazon Nova Canvas, which is the most advanced image generation model that can generate professional-level images based on text or image prompts. It also provides some convenient features, such as using text input to edit images, and control options for adjusting the color scheme and layout. The model also has built-in functions to support the safe and responsible use of AI, including watermarking functions (to trace the source of the image) and content review functions (to limit the generation of potentially harmful content), etc.
In a third-party human comparative evaluation, the performance of Amazon Nova Canvas is better than OpenAI DALL-E 3 and Stable Diffusion. Here are a series of images generated by Amazon Nova Canvas:
Generated by Amazon Nova Canvas
Generated by Amazon Nova Canvas
Then there is Amazon Nova Reel, which is the most advanced video generation model that can easily create high-quality videos through text and image, and is very suitable for advertising, marketing or training content creation. Users can control the visual style and rhythm through natural language prompts, including camera movement, rotation and zoom. In a third-party human comparative evaluation, the video quality and consistency generated by Amazon Nova Reel is better than Runway's Gen-3 Alpha.
Video generated by Amazon Nova Reel | Video source: Amazon Web Services
Similar to Canvas, Nova Reel also has built-in safety and responsibility AI functions, including watermarking and content review. Currently, it supports generating 6-second videos, and in the next few months, it will be extended to generate videos up to 2 minutes in length.
Jassy also shared the next plans for Nova. First, the second-generation version of the above models will be developed next year. In addition, a voice-to-voice model will be launched in the first quarter, and an any-to-any model will be launched in the middle of next year. That is, a multimodal input to multimodal output model, which means that users can input various forms of content such as text, voice, image or video, and correspondingly output text, voice, image or video.
From Titan to Nova, Amazon Web Services, which has continuously launched two large models, inevitably makes some people worry that Amazon Web Services, which cooperates with many large model developers, is changing its model strategy.
Jassy obviously realized this. At the conference, he answered his own questions and explained the position of Amazon Web Services:
"Perhaps everyone will ask, how to view the model strategy of Amazon Web Services? After all, we have in-depth cooperative relationships with many model providers, and at the same time, we have also developed some models. What I want to say is that everyone can view it this way: Our goal has always been to provide choices for everyone, aiming to present the widest and highest-quality functions, which necessarily means there will be diverse choices."
Matt Garman, the CEO of Amazon Web Services, introduced that on Amazon Bedrock, developers can choose the models of Amazon or any ecological partner according to their own needs. | Image source: Amazon Web Services
02 What does the world's largest e-commerce platform do with generative AI?
In addition to releasing the new large model, at the conference, Andy Jassy also detailed the AI application cases within Amazon.
As the world's largest e-commerce platform and the "first customer" of Amazon Web Services, Amazon has tried to introduce AI to improve efficiency for many businesses in the past year to solve the problems faced by users. The typical scenarios are as follows:
Obtain better and personalized recommendations in the retail business;
Plan the best path for the pickers in the fulfillment center to deliver the goods to customers more quickly;
Apply it to our Prime Air drones, expecting to deliver goods to you within less than an hour in the next few years;
The Just Walk Out technology of Amazon Go stores and provide technical support for Alexa;
Provide more than 25 Amazon Web Services AI services to facilitate developers to build AI applications.
From the AI use cases observed by Amazon, Andy believes that the AI applications that solve problems ("practical AI") have two practical values: reducing costs and increasing efficiency, or bringing new experiences.
"Globally, those companies that are the most successful in applying AI are mainly reflected in cost avoidance and productivity improvement, and many companies have made progress in both aspects. At the same time, you are also beginning to see some completely re-conceived and reshaped new customer experiences."
In these two types of AI applications, Andy gave typical use cases within Amazon:
AI for cost reduction and efficiency increase
1) Intelligent customer service
Take customer service as an example. Amazon's retail business has hundreds of millions of customers. In the past, when they needed to contact customer service, they could contact the chatbot. In the past, this chatbot used the machine learning technology of the static decision tree, and customers had to input a lot of text to get the answer.
But after the generative AI reconstructed this system, now customers have a chatbot that understands them.
For example, if you ordered a product a few days ago and entered the new chatbot interface, it knows who you are, what you ordered a few days ago, and where you live. And it can predict through the model that if you contact customer service after a few days, it is likely to consult about the return-related issues. When you start to explain the situation to it, it can quickly tell you the location of the nearest Whole Foods or other physical store where you can return the goods. And this model is very intelligent. When it detects that the user is frustrated with the response it gives, it can also determine that the user may need to contact a human customer service to solve the problem.
Before the redesign, the customer satisfaction of this chatbot was already quite high, but since the addition of the "intelligent brain" of generative AI, the customer satisfaction has increased by 500 basis points.
2) Seller work order filling
Amazon has approximately 2 million sellers in global retail stores, and more than 60% of the goods sold are provided by these sellers. But in the past, when they listed products on the website, they needed to fill out a very long form with many fields, so that end customers could browse and understand the product information of the sellers more conveniently. This is indeed a heavy task for sellers.
Now, Amazon has used generative AI to create a new tool. Sellers only need to input a few words, or take a photo, or provide a URL, and this tool can help fill in a lot of product attribute information. This is much easier for sellers. Currently, more than 500,000 sellers are using this generative AI tool.
3) Inventory management
Inventory management in Amazon's retail business is also a big scenario. There are more than 1,000 different buildings or nodes to optimize the allocation of the right products to the fulfillment center or building closest to the end customer, so as to save transportation time and deliver the goods to you more quickly and at a lower cost. But this also means that it is necessary to know the inventory situation of a certain fulfillment center, such as how much the inventory level of each product is, which products are being ordered, how fast the ordering is, whether this fulfillment center has more storage capacity, and whether the inventory needs to be transferred to other fulfillment centers to balance the entire storage network and other issues.
To this end, Amazon uses the Transformer model to solve these problems and make predictions. Currently, a Transformer model for long-term demand prediction has increased the prediction accuracy by 10%, and the regional prediction accuracy has also increased by more than 20%. Under Amazon's retail business scale of tens of billions of dollars, a double-digit efficiency improvement means cost savings of billions of dollars.
4) Robots
In the robot scenario, Amazon's fulfillment center has deployed more than 750,000 robots. A series of AI technologies have helped the robot scenario optimize the site capacity and transmission capacity, shorten the processing time and the cost of serving customers.
Take Sparrow as an example. It is a robotic arm used for reclassification. It needs to continuously collect items from many scattered areas and gather them into containers. With the brain of generative AI, it can tell Sparrow what items are in the first box, which item it should take, and Sparrow has to identify what each item is specifically, and also know how to grasp it according to the size, material and flexibility of the material, and know where to place the item in the receiving box.
Currently, Amazon has launched about five new robot inventions in the fulfillment center in Shreveport, Louisiana, and has seen the processing time increase by 25%. In the future, the service cost is expected to be reduced by 25%.
AI for innovative customer experience
The above are all examples of Amazon's internal efforts in cost avoidance and productivity improvement. Amazon has also seen the role of generative AI in creating a new shopping experience, and Jassy also listed several typical examples.
1) Rufus shopping agent
The first application is the Rufus shopping agent.
When customers are not sure what they want and are struggling to make a choice, they may browse product categories