36Kr Exclusive: Tsinghua Team Builds World's First Foundation Model for Real-Time Physiological and Emotional Sensing and Expands Its Hardware Business

Making humans understandable to AI: the foundation of the next generation of human-machine interaction.

Author | Qiao Yujie

Editor | Yuan Silai

Hard Kr has learned that Beijing Weimian Technology Co., Ltd. (hereinafter "Weimian Technology") recently closed a funding round of several million US dollars from Shunwei Capital.

Weimian Technology develops its own foundation models for human perception and understanding, with the aim of creating a new paradigm for human-machine interaction. The founding team comes mainly from Tsinghua University and combines backgrounds in large models, human-machine interaction, software engineering, and medicine.

AI interaction currently faces an invisible ceiling: it depends heavily on explicit user input and cannot perceive implicit, non-verbal information. Research suggests that non-verbal information accounts for up to 55% of human expression, yet existing machine vision sees only actions; it cannot look beneath the skin to detect emotional fluctuations or physical fatigue.

Whether it is an embodied robot in the physical world or a large language model in the digital one, a system without high-precision, multi-modal data on human state can only respond passively and mechanically. It lacks the closed loop needed for active empathy and cannot take on more general tasks.

To address these pain points, Weimian Technology has developed FacePhys, a facial foundation model built on rPPG (remote photoplethysmography). It outputs more than 120 indicators in real time, including heart rate, heart rate variability (HRV), respiration rate, facial action units, eye movement features, emotional dimensions, and voice features. Using HRV as an emotional and physiological barometer that ties heart rate to acute emotions, the model can recognize fake smiles and suppressed emotions, obtain objective physiological ground truth that cannot be faked, and give large models an entry point for physiological and emotional data.
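To make the rPPG idea concrete, here is a minimal sketch of the textbook approach: the pulse causes a tiny periodic change in skin color, so the dominant frequency of the face region's mean green value approximates the heart rate. This is not FacePhys's method, which the article does not describe. The function names, the 42 to 180 BPM search band, and the synthetic trace are all assumptions made for illustration.

```python
import math


def estimate_heart_rate_bpm(green_means, fps, lo_bpm=42.0, hi_bpm=180.0, step_bpm=0.5):
    """Estimate pulse rate from a per-frame mean green-channel trace.

    Hypothetical helper: scans candidate rates in a plausible heart-rate band
    and returns the one whose sinusoid carries the most spectral power.
    """
    n = len(green_means)
    mean = sum(green_means) / n
    centered = [g - mean for g in green_means]  # remove the constant skin/lighting level

    best_bpm, best_power = lo_bpm, -1.0
    steps = int(round((hi_bpm - lo_bpm) / step_bpm))
    for k in range(steps + 1):
        bpm = lo_bpm + k * step_bpm
        omega = 2.0 * math.pi * (bpm / 60.0) / fps  # radians per frame
        re = sum(x * math.cos(omega * i) for i, x in enumerate(centered))
        im = sum(x * math.sin(omega * i) for i, x in enumerate(centered))
        power = re * re + im * im  # squared magnitude of the DFT at this rate
        if power > best_power:
            best_bpm, best_power = bpm, power
    return best_bpm


def synthetic_trace(bpm, fps, seconds):
    """Toy stand-in for a real trace: a small pulse ripple on a large offset."""
    n = int(fps * seconds)
    return [120.0 + 0.5 * math.sin(2.0 * math.pi * (bpm / 60.0) * i / fps) for i in range(n)]
```

On a clean synthetic trace such as `synthetic_trace(72.0, 30.0, 10.0)`, the estimate should land close to 72 BPM. Real footage adds lighting changes and head motion, and a plain spectral peak like this one does not handle them.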

Noise from lighting changes and head movement has long been a key obstacle to deploying rPPG-based physiological sensing. To address it, Weimian Technology built a clinically annotated dataset covering tens of thousands of people and tens of millions of measurement samples across different skin tones and complex scenarios. It validated the approach in a clinical trial at Anzhen Hospital and built the ability to model complex physiological states into the foundation model.

Weimian Technology has also introduced the "state space model" into physiological signal modeling.

Founder Tang Jiankai explained that the logic resembles next-token prediction in large language models: where a large language model predicts the next word, the state space model predicts the body's physiological state at the next moment, and so continuously tracks the dynamics of vital signs such as heartbeat and respiration. "In essence, it models the heartbeat as a continuous physical process rather than a splicing of discrete video frames."
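The continuity idea can be illustrated with the simplest classical state space model, a scalar Kalman filter with a random-walk state. This is only a sketch of the general principle. The article does not say which state space architecture FacePhys uses, and the function name and both variance parameters below are assumptions chosen for illustration.

```python
def track_heart_rate(noisy_bpm, process_var=0.05, measurement_var=25.0):
    """Smooth per-window heart-rate readings with a scalar Kalman filter.

    State model (assumed): the true heart rate drifts slowly between steps.
    Observation model (assumed): each reading is the true rate plus noise.
    """
    estimate = noisy_bpm[0]
    variance = measurement_var
    smoothed = [estimate]
    for z in noisy_bpm[1:]:
        # Predict: the state carries over; uncertainty grows slightly.
        variance += process_var
        # Update: weight the new reading by how much we trust it
        # relative to the prediction.
        gain = variance / (variance + measurement_var)
        estimate += gain * (z - estimate)
        variance *= 1.0 - gain
        smoothed.append(estimate)
    return smoothed
```

Fed a sequence such as `[60, 61, 59, 100, 60, 61, 60]`, the filter treats the lone 100 BPM reading as unlikely under the slow-drift assumption and moves only part of the way toward it, instead of jumping with the raw measurement.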

This breakthrough lets the system capture the temporal dynamics of the cardiac pulse more accurately and achieve non-contact diagnosis. On core metrics, its heart rate error is ≤2 BPM, meeting the medical-grade standard; on-device inference latency is ≤10 ms, enabling real-time response; and the on-device small model has only 0.2M parameters, so it can run directly on ordinary mobile phones and cameras without relying on cloud computing power.
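A quick back-of-the-envelope check shows why these figures suit on-device use. The parameter count and latency bound come from the article. The storage precisions (4 bytes for float32, 1 byte for int8) and the 30 fps camera rate are assumptions, since the article specifies neither.

```python
PARAMS = 200_000      # 0.2M parameters, as reported in the article
LATENCY_MS = 10.0     # reported upper bound on on-device inference latency


def weight_megabytes(n_params, bytes_per_param):
    """Size of the raw weights in megabytes (1 MB = 1,000,000 bytes)."""
    return n_params * bytes_per_param / 1_000_000


def frame_budget_ms(fps):
    """Time available per frame at a given camera frame rate."""
    return 1000.0 / fps


def fits_in_frame(latency_ms, fps):
    """True if one inference finishes before the next frame arrives."""
    return latency_ms <= frame_budget_ms(fps)
```

Under these assumptions, the weights take about 0.8 MB in float32 or 0.2 MB in int8, and a 10 ms inference fits inside the roughly 33 ms frame interval of a 30 fps camera.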

Building on this physiological understanding, Weimian Technology has gone on to build a multi-modal "human understanding system."

The system integrates spatial features such as movements, postures, and eye movements, and uses HRV as an emotional and physiological barometer that ties heart rate to acute emotions. With these inputs, the model can recognize the user's emotions, understand the needs and motivations behind their behavior, and even predict their interaction intentions and movement trajectories. In effect, it can read a person's expression and anticipate their next move, giving large models an entry point for physiological and emotional data.
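For readers unfamiliar with HRV, the sketch below computes two standard time-domain HRV statistics, RMSSD and SDNN, from a list of inter-beat intervals in milliseconds. These are general textbook definitions. The article does not say which HRV measures FacePhys reports or how it maps them to emotional states, and the function names are hypothetical.

```python
import math


def mean_heart_rate_bpm(ibi_ms):
    """Average heart rate implied by inter-beat intervals (milliseconds)."""
    return 60_000.0 / (sum(ibi_ms) / len(ibi_ms))


def rmssd_ms(ibi_ms):
    """Root mean square of successive differences between beats."""
    diffs = [b - a for a, b in zip(ibi_ms, ibi_ms[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


def sdnn_ms(ibi_ms):
    """Sample standard deviation of the inter-beat intervals."""
    mean = sum(ibi_ms) / len(ibi_ms)
    return math.sqrt(sum((x - mean) ** 2 for x in ibi_ms) / (len(ibi_ms) - 1))
```

For the intervals `[800, 810, 790, 805, 795]`, the mean heart rate is 75 BPM while RMSSD is nonzero. A perfectly regular sequence such as `[800] * 5` has the same heart rate but zero RMSSD, which is why HRV carries information that heart rate alone does not.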

Based on this physiological perception foundation model, Weimian Technology is also advancing its software and hardware businesses in parallel.

Image source: the company

On the software side, the company provides its algorithms to robot, smart cockpit, and health device manufacturers through an SDK/API. It has been deployed at scale in three major scenarios: in household robots, it has reached mass-production partnerships with customers such as Haier Robotics; in health care robots, it provides rapid health screening for nursing homes and communities; in bionic robots, it enables natural interaction with millisecond-level latency. In automotive, the company is working with a leading Tier 1 supplier on technical validation and mass-production preparation for a driver fatigue monitoring solution.

On the hardware side, the company has launched an embedded camera module running the FacePhys model. Its core product, the Findings scientific research data acquisition system, provides high-precision data acquisition terminals mainly for research institutions and hospitals, and has entered the bulk procurement stage.

The following is an excerpt from Hard Kr's conversation with Tang Jiankai (lightly edited):

Hard Kr: Are other companies, in China or abroad, using the rPPG route for physiological and emotional recognition?

Tang Jiankai: Some foreign companies are already working in related directions. FaceHeart, for example, focuses on cardiac health monitoring, has obtained FDA certification, and currently serves mainly telemedicine. Our direction, however, goes beyond heart rate monitoring to cover more dimensions, such as emotions, stress, and eye movement behavior. In terms of capability, we are extending from physiological perception to "understanding of human states."

Domestic teams are also working on rPPG, but most solutions still follow a "record video + cloud analysis" model. The user typically has to record more than 30 seconds of video and then upload it to the cloud for batch computation. The whole analysis can take dozens of seconds, which makes real-time response difficult. If the user moves, the lighting changes, or the posture shifts during recording, robustness drops significantly.

Hard Kr: Why can Weimian Technology make rPPG-based physiological perception more accurate?

Tang Jiankai: We have made many optimizations at the model level. The core idea is to use the "state space model" to predict the body's physiological state at the next moment. A person's heart rate does not suddenly jump from 60 to 100; physiological state is continuous and periodic. Our state space model captures this pattern of smooth change and combines it with the periodic fluctuations known from medicine to continuously predict the body's current state.

Data quality is also crucial. Our training data does not come from "virtual labeling" by large models; it comes from cooperation with hospitals and is collected with medical-grade devices. We have now built a clinical database of tens of thousands of people, so the data is more objective and more accurate.

We also have a complete logic for emotional understanding. For example, psychological research has shown that high HRV often corresponds to a more positive, relaxed, or interested state, while an elevated heart rate during strenuous exercise does not necessarily indicate emotional fluctuation. So we look not only at the physiological indicators themselves but also at spatial features such as movements, postures, and eye movements to understand a person's real state.

Simply put, we are integrating "physiological continuity in the time dimension" and "visual perception in the spatial dimension" into a unified model, so that AI can understand a person's physiology, emotions, and behavior at the same time.

Hard Kr: Why did you further develop hardware modules?

Tang Jiankai: Video is different from language; it contains a huge amount of information. Uploading all of that data to the cloud for processing means high latency, which hurts the real-time interaction experience. We therefore prefer on-device processing, so that perception and inference happen directly on the local device, responses are faster, and interaction is smoother.

Another important reason is privacy. The data we process relates to physiology and emotions, which is relatively sensitive information. Especially in medical care and health management, users would rather keep the data on the local device than upload it all to a cloud API.

Investor's view

Shunwei Capital: The company's foundation model for real-time physiological and emotional understanding is globally unique in its technical route and underlying architecture. The technology can be deployed quickly in diverse scenarios such as smart cockpits, robots, and smart hardware, and has broad room for application. Shunwei thinks highly of the team's technical and productization capabilities. It is willing to cooperate closely with Weimian Technology across the full range of people, vehicle, and home scenarios, to support the company over the long term, and to jointly explore the business prospects of next-generation human-machine interaction and embodied intelligence.

This article was originally produced by 乔钰杰 (Qiao Yujie). For reprints or content cooperation, please refer to the reprint guidelines; unauthorized reprinting will be prosecuted.