
Shenzhen Embodied Intelligence Company Secures 100 Million Yuan in Financing from Inovance and China Telecom, Ranks No. 1 in Industry-Wide "Visual-Tactile" Sensor Shipments | Hard Kr Exclusive

Huang Nan, 2026-06-04 09:30
Building world models with "visual-tactile" technology.

Author | Huang Nan

Editor | Yuan Silai

Hard Kr has learned that Daimon Robotics recently completed a Series A financing round of 100 million yuan, jointly invested by Huichuan Industry Investment, the industrial fund of Huichuan Technology (Inovance), and China Telecom. The funds will be used to build a very large-scale dataset of physical interaction information, accelerate research and development of the physical world model, and drive the data flywheel and a closed business loop in real physical scenarios.

Daimon Robotics officially started operations in 2023. Its core team has long focused on robotic dexterous manipulation and physical interaction intelligence. Professor Wang Yu, the co-founder and chief scientist, was the founding dean of the Robotics Research Institute at the Hong Kong University of Science and Technology. The concepts he proposed, such as "embodied skills" and "skill cloning," are important components of Daimon Robotics' core technology roadmap. Dr. Duan Jianghua, the founder and CEO, and the main technical leaders all come from the core team of the same institute and have 10 years of know-how in manipulation intelligence. Yuan Weihao, the chief AI scientist, was previously a multimodal research expert at Alibaba's Tongyi Lab and has cutting-edge experience in transferring world models to robotic physical manipulation.

As the popularity of embodied intelligence continues to rise, the logic of the industry is changing profoundly. The sector has evolved along a clear path: from early competition over robots' walking and motion control capabilities to the exploration of differentiated algorithm architectures and the "embodied brain." Each wave of attention has laid key foundations for the industry's breakthroughs.

As humanoid robots move from stage demonstrations to real-world operations, the bar for fine manipulation by complete machines keeps rising. Whether high-quality physical interaction data can be collected has become a key dividing line for real-world deployment.

In mainstream pure-vision perception solutions, sensors can only capture the appearance of objects. They cannot identify physical characteristics such as hardness, softness, friction coefficient, and deformation under force, which makes it difficult for robots to predict how objects will change. In contrast, physical interaction data that integrates touch can fully record key parameters such as instantaneous force and material properties. Used in large-scale model training, it distills physical common sense, accelerates convergence, helps robots establish physical causal cognition, and enables a range of fine operations.

Daimon starts from the collection and annotation of physical interaction data, gradually builds a complete technical pipeline covering perception, operation, and learning, and then constructs a world model that can provide robots with physical common sense.

At the cognitive level, its model aligns the vision and touch modalities, enabling robots to infer the physical properties of objects from images and to infer an object's shape from touch. At the execution stage, high-frequency tactile feedback lets the device complete perception, judgment, and action correction within milliseconds of contact, forming closed-loop control.

Performing fine operations such as threading grapes and placing eggs with physical intuition (Source/Enterprise)

"For robots to be able to work, an understanding of causality in the physical world and feedback based on real contact are essential," Duan Jianghua, CEO of Daimon Robotics, told Hard Kr. "A robot that can do parkour and somersaults has greatly reduced application value if it can't pick up a sponge with just the right amount of force to wipe an object. Vision is a non-contact, remote signal. It can tell you where an object is, but it can't tell you why a sponge deforms when you touch it. Touch, on the other hand, is the 'feel' at the moment of contact, and it is the key to judging physical causality and achieving fine manipulation."

However, technology and models alone are not enough. How to drive continuous iteration of the physical world model with a closed data loop and professional evaluation standards is another major challenge facing the industry. Duan Jianghua told Hard Kr, "The root of the tactile data shortage is that the data representation for vision is relatively unified, while touch has no standard, and there is no large-scale, multimodal system for collecting real data."

To solve this problem, Daimon has built an "outward-distributed" embodied data collection network. Unlike the traditional approach, which relies on fixed laboratories and teleoperation, this network decentralizes the laboratory and collects data in a distributed way across society, which yields authentic scenarios, a step change in collection efficiency, and lower marginal costs.

In April 2026, Daimon Robotics, together with dozens of leading domestic and international institutions including Google DeepMind, released Daimon-Infinity, the world's largest full-modal physical-world dataset that includes touch, covering contact information such as texture, hardness, and mechanics. It also open-sourced 10,000 hours of data for the industry to use free of charge. Building on the dataset, the company established a systematic evaluation standard and in June launched RobOmni, a full-modal benchmark system for physical interaction capabilities that supports both modes of "real data training + simulator training."

Human infants learn about the world and develop their intelligence by touching. For robots that are about to move from factories into households, this lesson cannot be skipped either. With "seeing clearly" and "walking steadily" solved, "touching accurately" is becoming the last and most crucial "kilometer" for embodied intelligence to enter the physical world. Daimon told Hard Kr that shipments of its visual-tactile sensors currently rank first in the world. It is trying to define its own standards in this technology of "feel."

The following is an excerpt from an interview between Hard Kr and Duan Jianghua, the CEO of Daimon Robotics (slightly edited):

Hard Kr: From perception to execution, embodied intelligence needs to cross the gap from "understanding" to "working." How does Daimon's physical world model handle the fusion of visual and tactile modalities and low-level control? What tasks that robots couldn't do before can this architecture help them complete when facing complex operation tasks?

Duan Jianghua: Our model infers physical causality. In terms of the model structure, we split physical contact into two layers: the cognitive layer and the execution layer.

The cognitive layer enables two-way mapping between vision and touch in the same semantic space. This is similar to human synesthesia. When you see a strawberry, you know it will have a granular texture without squeezing it. When you insert a key into a lock to open a door, your hand blocks your view at the moment the key enters the lock. Without seeing the contact between the key and the keyhole, humans rely on intuition and feel to complete the operation: whether the key is inserted, whether it's stuck, and whether to turn it. We hope robots can also do this.

Daimon Robotics uses a gripper to pick up an egg (Source/Enterprise)

Two mechanisms run simultaneously in the execution layer. One is a high-frequency tactile servo running at hundreds of hertz, similar to a spinal reflex. As soon as an object starts to show a tendency to slip, a compensatory action is issued without going through upper-level reasoning, even before the next visual frame arrives. This is like washing dishes when a plate covered with dish soap starts to slip a little. You don't need to look at it; your fingers instinctively tighten to hold the plate.

The other is physical world reasoning. The model continuously predicts the operation states over the next few steps and gives correction strategies before a mistake actually occurs. This is like pouring water from a kettle into a cup with one hand. As the water flows out, the kettle's center of gravity keeps changing. Your brain continuously predicts the kettle's weight distribution for the next second based on the flow rate and smoothly adjusts the tilt of your wrist in advance to keep the flow steady.

These two mechanisms correspond to millisecond-level reactions and multi-step lookahead respectively. Although they operate on different time scales, they work together in the same task. This is the most important structural difference from pure-vision manipulation models.
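
As a rough illustration of how a fast reflex loop and a slower predictive loop could share one grip-force command, here is a minimal Python sketch. It is not Daimon's implementation. The Coulomb friction model, the friction coefficient `mu`, the thresholds and gains, and every name in it (`TactileSample`, `reflex_step`, `predictive_step`, `simulate`) are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class TactileSample:
    normal_force: float      # newtons, pressing into the object
    tangential_force: float  # newtons, along the contact surface


def slip_ratio(sample: TactileSample, mu: float) -> float:
    """Share of the Coulomb friction limit in use; 1.0 or more means slipping."""
    if sample.normal_force <= 0.0:
        return float("inf")
    return abs(sample.tangential_force) / (mu * sample.normal_force)


def reflex_step(grip, sample, mu=0.5, threshold=0.8, gain=1.25, max_grip=20.0):
    """Fast loop: tighten immediately once slip tendency crosses the threshold."""
    if slip_ratio(sample, mu) > threshold:
        return min(grip * gain, max_grip)
    return grip


def predictive_step(grip, predicted_loads, mu=0.5, threshold=0.8, max_grip=20.0):
    """Slow loop: pre-tighten so the worst predicted load stays under the threshold."""
    if not predicted_loads:
        return grip
    worst = max(abs(load) for load in predicted_loads)
    required = worst / (mu * threshold)
    return min(max(grip, required), max_grip)


def simulate(loads, horizon=20, fast_per_slow=10, initial_grip=1.0):
    """Run the fast loop every tick and the slow loop every `fast_per_slow` ticks."""
    grip, history = initial_grip, []
    for tick, load in enumerate(loads):
        if tick % fast_per_slow == 0:
            # Assumption: the true future loads stand in for a world model's prediction.
            grip = predictive_step(grip, loads[tick:tick + horizon])
        # Assumption: the sensed normal force equals the commanded grip force.
        grip = reflex_step(grip, TactileSample(grip, load))
        history.append(grip)
    return history


# A light load for 50 ticks, then a tenfold heavier one.
history = simulate([0.2] * 50 + [2.0] * 50)
print(history[35], history[45])
```

In this toy run the slow loop raises the grip before the heavier load arrives, while the fast loop remains available as a fallback if the prediction falls short.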

Hard Kr: Daimon has recently released a dataset and a Benchmark for robotic physical interaction capabilities. What is the connection between these and the physical world model you're working on?

Duan Jianghua: The dataset is the fuel, the physical world model is the engine, and the Benchmark is the tachometer.

Traditional datasets, whether visual or simulated, record "pixel changes" or "trajectories." But to enable robots to understand the physical world, these are far from enough. For example, is an object soft or hard? Is its surface smooth or rough? How much normal pressure, tangential force, and slipping tendency are there when grasping? These all belong to physical property information. The Daimon-Infinity dataset collects more than a dozen modalities, including pressure, deformation, texture, stiffness, and slipping tendency.

The biggest difficulty is not collecting any single modality, but strictly aligning these dozen-plus tactile modalities with visual images and action instructions in space and time, down to the millisecond.

Daimon Robotics threads grapes autonomously (Source/Enterprise)

For example, when a robot's finger touches an object, the tactile sensor should record the pressure distribution and texture information at the contact point. At the same time, the camera should record the picture at that moment, and the control system should record the joint angle and torque at that moment. These three must be synchronized to the millisecond level in time; otherwise, the model will have difficulty learning the correct causal logic.
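
The synchronization requirement can be sketched as nearest-timestamp matching across the three streams. The Python below is an illustrative sketch only, not the Daimon-Infinity pipeline. The stream layout (sorted lists of `(timestamp_ms, payload)` pairs), the 1 ms tolerance, and the function names `nearest_index` and `align` are all assumptions.

```python
from bisect import bisect_left


def nearest_index(timestamps, t):
    """Index of the entry closest to t in a non-empty sorted list of timestamps."""
    i = bisect_left(timestamps, t)
    if i == 0:
        return 0
    if i == len(timestamps):
        return len(timestamps) - 1
    # Pick whichever neighbor is closer to t.
    return i if timestamps[i] - t < t - timestamps[i - 1] else i - 1


def align(tactile, camera, joints, tolerance_ms=1.0):
    """Match each camera frame with the nearest tactile and joint samples.

    Each stream is a list of (timestamp_ms, payload) sorted by timestamp.
    Frames without both partners inside the tolerance are dropped.
    """
    if not tactile or not joints:
        return []
    tactile_ts = [t for t, _ in tactile]
    joint_ts = [t for t, _ in joints]
    aligned = []
    for t_cam, frame in camera:
        i = nearest_index(tactile_ts, t_cam)
        j = nearest_index(joint_ts, t_cam)
        if (abs(tactile_ts[i] - t_cam) <= tolerance_ms
                and abs(joint_ts[j] - t_cam) <= tolerance_ms):
            aligned.append((t_cam, tactile[i][1], frame, joints[j][1]))
    return aligned


# Hypothetical streams: payloads are placeholder strings.
tactile = [(0.0, "t0"), (2.0, "t1"), (4.0, "t2"), (33.4, "t3")]
camera = [(0.0, "f0"), (33.3, "f1"), (66.6, "f2")]
joints = [(0.1, "j0"), (33.0, "j1"), (60.0, "j2")]

aligned = align(tactile, camera, joints)
for row in aligned:
    print(row)
```

Dropping frames that lack a close partner, rather than pairing them with stale samples, is one way to avoid feeding the model mismatched cause and effect; interpolation between samples would be another option.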

With the data and the model in place, the next question arises - how to judge whether the model has really learned physical causality? This is the significance of Daimon's launch of RobOmni.

Existing benchmarks in embodied intelligence often focus on the visual perception modality, emphasizing generalizable grasping and long-horizon planning tasks. Evaluation standards for tactile perception and contact-rich fine manipulation are not yet mature.

The industry still lacks a standardized evaluation benchmark for tactile perception and dexterous manipulation. There is no unified standard between different models and data, making it difficult to quantify tactile capabilities and systematically verify the generalization ability of models.

We've noticed that some teams working on simulation and Sim2Real have recently started to introduce visual-tactile fusion evaluations. This shows that the frontier of the industry is reaching a consensus: pure vision is not enough for robots to truly understand and interact with the world, and touch is unavoidable. RobOmni fills this gap and provides a standardized, comparable, reproducible, and scalable entry point for verifying physical interaction capabilities.

Without a ruler, we can't measure progress. Without standards, the industry can't form a joint force. So we need to make a ruler first and then measure the world.

Comments from investors:

A representative of Huichuan Industry Investment said that for embodied intelligence to achieve a generational leap in real-world operations, using tactile perception to fill in physical causal logic is a necessary path. Daimon Robotics is one of the few companies in the industry that starts from physical causal logic, is driven by massive visual-tactile data, and pushes the physical world model into fine-manipulation scenarios. Huichuan Technology has long been deeply involved in industrial automation and intelligent robots and is well aware of the strategic value of multimodal perception in such scenarios. Going forward, drawing on Huichuan's scenarios and industry knowledge, we look forward to working with Daimon to build a tactile neural network for the era of embodied intelligence.

A representative of China Telecom Investment Company said that large-scale commercial deployment of embodied intelligence depends not only on the continuous upgrading of cloud-based large-model computing power but also, to a high degree, on high-precision physical perception and a multimodal data system. Daimon Robotics has deep experience in visual-tactile perception and has built a solid core technology barrier. As a backbone force in building a digital China, China Telecom is fully implementing the "Cloud Transformation, Digital Transformation, and Intelligence Benefits" strategy. Going forward, we look forward to working closely with Daimon Robotics to create deployable and replicable industry solutions for embodied intelligence, build new digital infrastructure that supports the development of new productive forces, and help accelerate the high-quality development of the embodied intelligence industry for the benefit of the whole ecosystem.