
SenseTime's SenseFoundry platform undergoes a comprehensive upgrade to build the next-generation urban intelligent foundation | Frontline Report

Huang Nan | 2025-12-11 18:06
SenseTime Ark services have covered nearly 200 cities at home and abroad.

Author | Huang Nan

Editor | Yuan Silai

On December 9th, at the 2025 SenseTime AI Forum held in the Hong Kong Science Park, SenseTime officially announced the comprehensive upgrade path of its flagship platform, "SenseTime Ark".

Currently, visual AI has become the core driving force behind the intelligent upgrade of smart cities and industries. However, traditional visual-AI algorithm production depends heavily on professional algorithm experts and generally faces three major pain points: long R&D cycles, high costs, and a high entry threshold. As long-tail scenarios continue to emerge, traditional algorithm models struggle to meet the needs of large-scale application.

Dr. Xu Li, Chairman and CEO of SenseTime, said, "The past decade has been the decade with the fastest cognitive change in artificial intelligence. We are experiencing perhaps the biggest technological wave in history. AI is reshaping the working methods of every industry, from perception to generation, from the cloud to the edge, and now to embodied intelligence and world models."

Dr. Xu Li, Chairman and CEO of SenseTime

Piao Yuankui, Senior Director of the Smart City and Business Group at SenseTime, also pointed out that the arrival of the large-model era is accelerating the reconstruction of industry paradigms. Algorithm design no longer depends solely on experts but is now open to on-site engineers, and model application has shifted from "customized development" to "intelligent production". The industry urgently needs a new visual-AI production model so that model capabilities adapt to business faster and deploy more efficiently.

To this end, SenseTime launched "SenseTime Ark", the upgraded visual AI 2.0, and built a new-generation visual-algorithm production model around two major systems: "general-specialized integration" and an "intelligent training closed loop".

At the level of "general-specialized" model orchestration, Ark achieves progressive inference on long-tail visual tasks through multi-level collaboration between lightweight small models and general large models, ensuring recognition accuracy while significantly reducing compute consumption. In the "intelligent training" system, Ark takes Agentic Training as its core and forms a full-process closed loop spanning intelligent data construction, model training, evaluation, and deployment. This enables end-to-end automation of visual-model production, from image collection to business decision-making, so that front-line engineers can quickly build usable visual models.

SenseTime's new-generation visual algorithm production model
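The "general-specialized" orchestration described above, where a lightweight specialist handles common cases and only hard, long-tail inputs are escalated to a general large model, is a form of confidence-gated cascading. The following is a minimal illustrative sketch of that pattern; all names and the threshold value are hypothetical and are not SenseTime Ark APIs.

```python
# Illustrative sketch of a confidence-gated model cascade (not Ark's actual API):
# a cheap specialist model answers first; low-confidence (long-tail) inputs
# fall back to a more expensive general model.

from dataclasses import dataclass
from typing import Callable, Any

@dataclass
class Prediction:
    label: str
    confidence: float  # in [0, 1]

def cascade_infer(
    image: Any,
    specialist: Callable[[Any], Prediction],
    generalist: Callable[[Any], Prediction],
    threshold: float = 0.85,  # escalation threshold (assumed value)
) -> Prediction:
    """Return the specialist's answer when it is confident enough;
    otherwise escalate to the general large model."""
    fast = specialist(image)
    if fast.confidence >= threshold:
        return fast               # cheap path: small model suffices
    return generalist(image)      # long-tail path: escalate

# Toy demo with stub models standing in for real networks:
spec = lambda img: Prediction("vehicle", 0.92 if img == "car" else 0.40)
gen = lambda img: Prediction("road_debris", 0.88)

print(cascade_infer("car", spec, gen).label)   # handled by the specialist
print(cascade_infer("odd", spec, gen).label)   # escalated to the generalist
```

The design choice is the usual cascade trade-off: the threshold trades accuracy on rare cases against the compute saved by keeping most traffic on the small model.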

Meanwhile, the Ark platform is building a visual intelligent-agent platform that integrates "perception-decision-action". It connects visual understanding of the digital world with embodied actions in the physical world, supports collaborative perception and decision-making across heterogeneous terminals such as drones, robotic dogs, and unmanned vehicles, and pushes inspection and patrol scenarios toward air-ground integrated intelligence.

Built on multi-modal large-model capabilities, SenseTime Ark SenseFoundry can efficiently schedule various types of visual models and build a closed-loop workflow running through scenario perception, data processing, and intelligent decision-making. Its core advantage lies in breaking through the capability boundaries of traditional visual AI: it can not only "see" but also "understand, think, and decide", moving urban governance from "passive response" to "active prediction".

This technological breakthrough has also accelerated industrial intelligence. In the government-affairs field, for example, SenseTime combined AIGC and traditional computer-vision technology to build the "Kunming Artificial Intelligence Empowerment Center Construction and Operation Integration Project", which aims to create a comprehensive, multi-level, city-wide AI service system covering AI infrastructure services, basic common AI application support services, and typical intelligent application scenarios.

In addition to the core market in the Chinese mainland, the technological capabilities and platform system of SenseTime Ark are also continuously expanding in the Hong Kong, Macao, and overseas markets.

In Hong Kong and Macao, as Hong Kong's smart-city construction accelerates, urban governance scenarios are placing more systematic demands on visual AI. Feng Yu, General Manager of SenseTime's Hong Kong and Macao business, said that Ark's new platform-based, model-based, and intelligent-agent-based system meets these markets' needs from "analysis to insight" and from "insight to decision-making".

Currently, SenseTime Ark has formed large - scale applications in multiple key scenarios such as urban safety, transportation, manufacturing, drone patrol, and embodied intelligence, serving nearly 200 cities at home and abroad.

At the event, in a keynote speech titled "From Emergence of Capabilities to Closed Loop of Value: The Value and Innovation Path of Multi-Modal Large Models", Professor Lin Dahua, co-founder and Chief Scientist of SenseTime, pointed out that after three years of "rapid development", "we have reached a critical crossroads again". He sees two important paths for the industry's future: first, AI needs to be truly applied in practice, driving technology and applications forward through value; second, the field must return to the laboratory to explore original innovation in the next technological paradigm.

Professor Lin Dahua sharing the value and innovation path of multi-modal large models in his keynote speech

To this end, through underlying innovations including the native multi-modal fusion architecture NEO, a cross-perspective prediction training paradigm, and the high-efficiency inference system SekoTalk, SenseTime can effectively improve models' spatial cognition and real-time interaction capabilities, deepen large models from "AI for X" to "AI in X", and achieve closed-loop integration of intelligent agents and scenarios.

As artificial intelligence enters the "large-model era", embodied intelligence and world models are becoming key technological directions driving industrial transformation. Dr. Wang Xiaogang announced at the forum that the Daxiaobot will officially debut on December 18th, releasing a number of globally leading technologies and product portfolios, and will launch "Kaiwu" World Model 3.0, the first domestically open-sourced and commercially applied world model. SenseTime will also build an integrated "model-hardware-scenario" industrial ecosystem with partners to advance the embodied-intelligence industry.

Round-table forum: From the "Digital World" to the "Physical World": How Embodied World Models Reshape Human-Machine Interaction

Embodied intelligence is gradually bridging the gap between the "digital world" and the "physical world". The Daxiaobot will equip robots with a smart "brain", enabling them to shift from "passive execution" to "independent exploration", and will turn cutting-edge intelligence into reliable products integrated into concrete, everyday scenarios.