A Technical Perspective on SenseTime's 2024 Annual Report: Remarkable Gains in Training and Inference Efficiency, and a Closed Loop of Commercial Value in the "Large-scale Computing Facilities - Large Models - Applications" Ecosystem
On March 26, SenseTime released its full-year performance report for 2024. SenseTime Group's total revenue for the year rose 10.8% year-on-year to 3.77 billion yuan, of which revenue from generative AI exceeded 2.4 billion yuan, up 103.1% year-on-year. This is the second consecutive year that generative AI has maintained triple-digit growth, and it has become the group's largest business.
In the more than two years since large models took off, the industry has moved from blindly chasing scaling laws to successive debates over computing power efficiency, the shift from training to inference, and application deployment. Large models have fully entered a new stage of development.
This is a sign of the industry's gradual maturation. Beyond simply stacking computing power, it also places higher demands on engineering capability and scenario-level collaboration.
Among large-model vendors, SenseTime, a first-generation "AI-native" enterprise, began investing in AI infrastructure years ago. It later upgraded its strategy to the trinity of "large-scale computing facilities - large models - applications", a strategy now showing its foresight in the AI industry.
The "large-scale computing facilities" are SenseTime's AI infrastructure, providing the computing power that supports its large models; the large models drive technological innovation, and the application layer drives AI commercialization. Over the past three years, SenseTime has formed a virtuous closed loop of "large-scale computing facilities - large models - applications" that empowers a wide range of industries.
What is visible now is that, after several ups and downs of the technology cycle, SenseTime's accumulated technology is poised to pay off.
01 Having and Understanding Computing Power
In May 2024, SuperCLUE, an authoritative domestic large-model evaluation institution, released its latest rankings. SenseTime's "Rixin 5.0" (SenseChat V5) ranked first in the Chinese benchmark evaluation with a new domestic record of 80.03 points, and its overall Chinese score surpassed GPT-4 Turbo. This was the first time a domestic large model had surpassed GPT-4 Turbo and topped the SuperCLUE Chinese benchmark test.
These results are closely tied to SenseTime's early investment in AI infrastructure.
Since 2024, the construction of national intelligent computing centers has been advancing rapidly, and from training to inference, computing power is increasingly treated as a strategic resource. Yet the market still suffers from fragmented computing resources, inconsistent standards, and low utilization efficiency.
SenseTime targeted this pain point by building out computing-power operations: interconnecting accelerator cards of different specifications, adapting them to different workloads, and meeting the requirements of different types of customers.
Xu Li, chairman and CEO of SenseTime, believes that some technology giants focus on their own ecosystems, including self-developed chips and cloud platforms. To gain an early advantage in today's AI field, however, one should use whatever resources are faster and better, without being locked into the products and platforms of a single company. "The basic services SenseTime provides are better suited to the current state of AI development."
Over the past three years, SenseTime has invested continuously in AIDC infrastructure. Its self-owned Shanghai Lingang AIDC, the first 5A-level intelligent computing center in China, is reported to have expanded its computing power to 23,000 petaFLOPS through this operating model.
Jointly optimized with each iteration of its large models, SenseTime's large-scale computing facilities aim to become the "AI infrastructure that best understands large models". They serve not only the training and inference of the Rixin large models but also mature industries such as the Internet, finance, and energy, as well as customers in high-potential fields such as embodied intelligence, AIGC, and AI4S (AI For Science).
Xu Li once said that SenseTime is the computing power service provider that best understands models and the model service provider that best understands computing power.
02 The Trinity Strategy
Computing power is only one part of the ecosystem. For the large-model industry to run efficiently, coordination across the upstream and downstream of the value chain is also required.
Xu Li said, "Today, whether training models or serving them externally, the business models of artificial intelligence essentially consume resources and pay for resource costs. All business models ultimately reduce to the consumption of computing resources. Through the 'trinity' strategy, resources can be integrated and used in the most effective way."
SenseTime established the "trinity" strategy in October 2024: with large-scale computing facilities as the AI-infrastructure foundation, it pursues the unified, joint optimization of "large-scale computing facilities - large models - applications".
The two-way optimization of computing power and models has further improved large-model training and inference efficiency. On the training side, SenseTime has significantly raised efficiency by adopting an automated multi-dimensional parallelism strategy and achieving FP8 mixed-precision training. For strong third-party open-source models such as DeepSeek in particular, the large-scale computing facilities have reached training efficiency above the officially reported figures, setting an industry benchmark.
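SenseTime has not published the details of its FP8 recipe, but the rounding behind FP8 mixed-precision training is easy to illustrate. The minimal simulation below (an illustrative sketch, not SenseTime's implementation) rounds a value to the nearest FP8 E4M3 number, the format typically used for weights and activations in FP8 training, while master weights are kept in higher precision:

```python
import math

def quantize_fp8_e4m3(x: float) -> float:
    """Round x to the nearest representable FP8 E4M3 value
    (4 exponent bits, 3 mantissa bits, bias 7; max normal 448).
    Simplified sketch: NaN encodings and exact subnormal rules ignored."""
    if x == 0.0:
        return 0.0
    sign = -1.0 if x < 0 else 1.0
    mag = min(abs(x), 448.0)        # saturate at the E4M3 maximum
    exp = max(math.floor(math.log2(mag)), -6)  # -6 = smallest normal exponent
    step = 2.0 ** (exp - 3)         # 3 mantissa bits => spacing 2**(exp-3)
    return sign * round(mag / step) * step

# 8-bit floats are coarse: 0.3 lands on the nearest grid point, 0.3125,
# which is why FP8 training pairs these values with higher-precision
# master weights and per-tensor scaling factors.
print(quantize_fp8_e4m3(0.3))     # 0.3125
print(quantize_fp8_e4m3(1000.0))  # 448.0 (saturated)
```

The quantization error visible here is exactly what loss-scaling and mixed-precision bookkeeping exist to keep under control during training.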
On the inference side, SenseTime's serving stack performs low-bit quantized inference and supports both the open-source vLLM and the self-developed lightLLM inference engines. Taking DeepSeek R1 as an example, SenseTime's inference throughput is more than 15% higher than that of leading industry vendors. Through techniques such as model distillation, key-value (KV) caching, prefill-decode (PD) separation, and multi-modal information compression, SenseTime has cut inference cost by an order of magnitude while largely preserving model quality.
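Of the techniques listed, key-value caching is the most self-contained to sketch. The toy single-head decoder below (an illustrative sketch, not the lightLLM implementation) shows the idea: during autoregressive decoding, the keys and values of past tokens are stored once and reused, so each new token adds one row of work instead of re-encoding the whole prefix:

```python
import numpy as np

def attend(q, K, V):
    """Scaled dot-product attention for a single query vector."""
    scores = K @ q / np.sqrt(q.shape[-1])
    w = np.exp(scores - scores.max())   # numerically stable softmax
    w /= w.sum()
    return w @ V

class KVCache:
    """Append-only key/value cache for autoregressive decoding.
    Each step appends one (k, v) row and attends over the stored history
    rather than recomputing K and V for the entire prefix."""
    def __init__(self, d):
        self.K = np.empty((0, d))
        self.V = np.empty((0, d))

    def step(self, k, v, q):
        self.K = np.vstack([self.K, k])
        self.V = np.vstack([self.V, v])
        return attend(q, self.K, self.V)

# Decoding three tokens with the cache matches attention computed
# from scratch over the full key/value history.
rng = np.random.default_rng(0)
d = 4
ks, vs, qs = (rng.normal(size=(3, d)) for _ in range(3))
cache = KVCache(d)
outs = [cache.step(ks[i], vs[i], qs[i]) for i in range(3)]
print(np.allclose(outs[2], attend(qs[2], ks, vs)))  # True
```

PD separation builds on the same structure: the prefill phase fills the cache for the whole prompt in one batch, and the decode phase, often on separate hardware, then runs cheap cached steps like `step` above.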
This is why, even for the same model, training and inference efficiency can differ dramatically across different computing power bases.
As the shift toward generative AI progresses steadily, the "trinity" strategy has further consolidated SenseTime's resource advantages, enabling it to stand out in the fiercely competitive large-model era.
SenseTime's Rixin large models focus on two product directions: productivity tools and interaction tools. Productivity tools directly raise production efficiency in scenarios such as enterprise office work, finance, and government affairs; customers' willingness to pay, measured by order value, grew sixfold from 2023. Interaction tools empower business partners through a 2B2C model, improving user experience across scenarios such as intelligent companionship, smart-hardware interaction, and intelligent marketing; average monthly usage per user grew eightfold from 2023.
While maintaining a leading application market share and customer stickiness, the Rixin large models have also stayed at the front of model technology. SenseTime first launched the SenseNova (Rixin) large-model family in April 2023 and had completed five major version iterations by July 2024. The Rixin 5.5 release of July 2024 significantly improved multi-modal capability: it was the first domestic multi-modal real-time interactive large model benchmarked against GPT-4o, natively integrating speech, video, and language models. SenseTime's Rixin 6.0 is scheduled for release on April 10, 2025, with performance expected to be benchmarked against Gemini 2.0 Pro.
In addition, with the infrastructure in place, SenseTime also moved early on the application side.
03 Why Native-Integrated Multi-Modal Technology
Since the explosion of generative AI, multi-modal large models have long been a goal of the field. Yet many of the multi-modal models encountered in real applications cannot be considered "complete forms".
As Google has argued, only multi-modal models built from scratch can become advanced models that surpass their predecessors. Such a model can natively read and produce content in different modalities and possesses strong multi-modal reasoning and cross-modal transfer capabilities.
Technically, this is called "native-integrated multi-modal technology". It is considered the inevitable path for the future development of AI and is the research area where SenseTime is currently concentrating its investment.
Unlike traditional multi-modal models, SenseTime's approach does not simply convert content from other modalities into language tokens for input. Instead, it integrates across the full pipeline, from the data layer to the model architecture layer, covering perception, reasoning, and output.
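SenseTime has not disclosed its architecture, but the contrast with token-conversion pipelines can be sketched. In the toy example below (all names and sizes are illustrative assumptions), image patches and text tokens are each projected into one shared embedding space and placed in a single sequence, so a transformer can attend across modalities natively rather than receiving an image only as generated caption text:

```python
import numpy as np

rng = np.random.default_rng(0)
d_model = 8                                 # shared embedding width (toy size)

W_text  = rng.normal(size=(16, d_model))    # toy 16-word vocabulary table
W_patch = rng.normal(size=(12, d_model))    # projects a 12-dim patch vector

def embed_text(token_ids):
    """Look up text tokens in the shared embedding space."""
    return W_text[token_ids]                # (n_tokens, d_model)

def embed_image(patches):
    """Linearly project image patches into the same space."""
    return patches @ W_patch                # (n_patches, d_model)

# Early (native) fusion: both modalities enter ONE transformer input
# sequence, instead of the image being translated to text first.
text_ids = np.array([3, 7, 1])
patches  = rng.normal(size=(4, 12))
sequence = np.concatenate([embed_image(patches), embed_text(text_ids)])
print(sequence.shape)  # (7, 8): 4 patch tokens + 3 text tokens, one space
```

Because every downstream attention layer sees patch and text tokens in the same space, cross-modal reasoning needs no lossy intermediate caption, which is the practical meaning of "integration from the data and architecture layers".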
At CVPR 2024, 50 papers from SenseTime were accepted. The research centers on vision-language foundation models and covers cutting-edge fields such as autonomous driving and robotics.
SenseTime's diversified AI products have already produced application results. Its "Little Raccoon Family" is reported to have delivered billions of intelligent-assistance interactions to hundreds of thousands of users, and SenseTime's Jueying has led the industry in deploying native multi-modal large models on in-vehicle terminals.
In this new stage of AI, SenseTime has proactively built out both the hardware infrastructure and the application side, erecting a technological moat through coordinated optimization of the lower and upper layers.
On the infrastructure side, SenseTime has built its own AI data centers (AIDC) and large-scale R&D services, allowing the company to stand out among both traditional infrastructure vendors and AI-native companies. On the application side, SenseTime runs a full-stack AI application system covering a broad range of industries, with its large models focused on native-integrated multi-modal technology.
It is foreseeable that these "reserves" will open up huge development space for SenseTime once AI applications take off.