36Kr Exclusive | Tsinghua Team Develops the World's First Foundation Model for Real-Time Physiological and Emotional Understanding, Further Expands into Hardware
Author | Qiao Yujie
Editor | Yuan Silai
Hard Krypton has learned that Beijing Weimian Technology Co., Ltd. (hereinafter "Weimian Technology") recently completed a financing round worth millions of US dollars, with Shunwei Capital as the investor.
Weimian Technology develops its own foundation models for human perception and understanding, aiming to create a new generation of human-machine interaction paradigms. The founding team comes mainly from Tsinghua University and combines backgrounds in large models, human-machine interaction, software engineering, and medicine.
AI interaction currently faces an invisible ceiling: it depends heavily on explicit user input and cannot perceive implicit non-verbal information. Research shows that non-verbal information accounts for up to 55% of human expression, yet existing machine vision can only see actions; it cannot look beneath the skin to detect emotional fluctuations or physical fatigue.
Whether it is an embodied robot in the physical world or a large language model in the digital world, a system without high-precision multi-modal data on human state can only respond mechanically and passively. It lacks the closed-loop ability to empathize proactively and cannot perform more generalized tasks.
To address these pain points, Weimian Technology has developed FacePhys, a facial foundation model built on rPPG (remote photoplethysmography) as its core technology. The model outputs more than 120 indicators in real time, including heart rate, heart rate variability (HRV), respiratory rate, facial action units, eye movement features, emotional dimensions, and voice features. Using HRV as a combined emotional and physiological barometer, it links heart rate to acute emotions, so it can identify fake smiles and suppressed emotions, obtain objective physiological signals that cannot be faked, and give large models an entry point for physiological and emotional data.
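For readers unfamiliar with rPPG, the basic idea can be illustrated with a minimal sketch: the average skin color in a face video changes very slightly with each heartbeat, so the dominant frequency of that color trace within a plausible pulse band gives a heart rate estimate. The code below is a textbook-style illustration and not FacePhys itself; the function name, the 42 to 240 BPM search band, and the synthetic trace are all assumptions made for this example.

```python
import math

def estimate_heart_rate_bpm(trace, fps, lo_bpm=42, hi_bpm=240):
    """Return the BPM in [lo_bpm, hi_bpm] whose frequency carries the most power.

    `trace` is a hypothetical per-frame signal, e.g. the mean green-channel
    value over the face region; `fps` is the camera frame rate.
    """
    n = len(trace)
    mean = sum(trace) / n
    centered = [x - mean for x in trace]  # remove the constant brightness level
    best_bpm, best_power = lo_bpm, -1.0
    for bpm in range(lo_bpm, hi_bpm + 1):
        freq = bpm / 60.0  # candidate pulse frequency in Hz
        re = sum(x * math.cos(2 * math.pi * freq * i / fps) for i, x in enumerate(centered))
        im = sum(x * math.sin(2 * math.pi * freq * i / fps) for i, x in enumerate(centered))
        power = re * re + im * im  # spectral power at this candidate frequency
        if power > best_power:
            best_bpm, best_power = bpm, power
    return best_bpm

# Synthetic 10-second trace at 30 fps: a weak 72 BPM pulse on top of slow lighting drift.
fps = 30
trace = [
    100.0
    + 0.5 * math.sin(2 * math.pi * (72 / 60) * i / fps)  # pulse component
    + 2.0 * math.sin(2 * math.pi * 0.1 * i / fps)        # slow illumination drift
    for i in range(fps * 10)
]
print(estimate_heart_rate_bpm(trace, fps))  # prints 72
```

Restricting the search to the pulse band is what lets this toy estimator ignore the much larger lighting drift. Real footage adds head motion and abrupt illumination changes, which simple spectral estimates like this one handle poorly.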
Noise from lighting changes and head movement has long been a key obstacle to deploying rPPG-based physiological sensing. To address it, Weimian Technology has built a clinically annotated dataset of tens of thousands of people, containing tens of millions of measurement samples and covering different skin tones and complex scenarios. The approach has been validated in a clinical trial at Anzhen Hospital, and the ability to model complex physiological states has been built into the foundation model.
Weimian Technology has also introduced the "state-space model" into physiological signal modeling.
Founder Tang Jiankai explained that the logic resembles "next-token prediction" in large language models: where a large language model predicts the next word, the state-space model predicts the body's physiological state at the next moment, continuously tracking dynamic changes in vital signs such as heartbeat and breathing. "In essence, it models the heartbeat as a continuous physical process rather than a splicing of discrete video frames."
This breakthrough lets the system capture the temporal dynamics of the heartbeat more accurately and achieve non-contact diagnosis. On core metrics, the heart rate detection error is ≤2 BPM, reaching the medical-grade standard; on-device inference latency is ≤10 ms, enabling real-time response; and the on-device small model has only 0.2M parameters, so it can run directly on ordinary mobile phones and camera devices without relying on cloud computing power.
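The intuition behind state-space tracking can be sketched with a classical linear Kalman filter, used here as a simple stand-in. The article does not describe the architecture of Weimian Technology's model, so this is not its method; the state layout (heart rate plus its trend), the function name, and the noise parameters are all assumptions chosen for illustration.

```python
def track_heart_rate(measurements, dt=1.0, process_var=0.05, meas_var=25.0):
    """Kalman filter over the hypothetical state [heart_rate, heart_rate_trend].

    The transition model assumes the heart rate changes smoothly:
    hr_next = hr + trend * dt. Noisy per-window readings only nudge the state.
    """
    hr, trend = float(measurements[0]), 0.0   # initial state
    p00, p01, p11 = meas_var, 0.0, 1.0        # symmetric 2x2 state covariance
    estimates = []
    for z in measurements:
        # Predict: propagate the state and its uncertainty one step forward.
        hr = hr + trend * dt
        p00 = p00 + 2 * dt * p01 + dt * dt * p11 + process_var
        p01 = p01 + dt * p11
        p11 = p11 + process_var
        # Update: blend the prediction with the new measurement.
        innovation = z - hr
        s = p00 + meas_var                    # innovation variance
        k0, k1 = p00 / s, p01 / s             # Kalman gain
        hr += k0 * innovation
        trend += k1 * innovation
        p00, p01, p11 = (1 - k0) * p00, (1 - k0) * p01, p11 - k1 * p01
        estimates.append(hr)
    return estimates

raw = [60, 61, 59, 100, 60, 61, 60]  # one motion-induced outlier at index 3
smoothed = track_heart_rate(raw)
print([round(x, 1) for x in smoothed])
```

Because the model expects continuity, the single reading of 100 moves the estimate only part of the way instead of being taken at face value. A learned state-space model would replace the hand-written transition and noise terms with parameters fitted to data.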
Building on physiological understanding, Weimian Technology has gone on to build a multi-modal "human understanding system".
By integrating spatial features such as actions, postures, and eye movements, and by linking heart rate to acute emotions with HRV as a combined emotional and physiological barometer, the model can not only identify a user's emotions but also understand the needs and motivations behind their behavior, and even predict the user's interaction intent and movement trajectory. In effect, it can read people's expressions and anticipate their actions, giving large models an entry point for physiological and emotional data.
Based on this physiological perception foundation model, Weimian Technology is also advancing software and hardware integration in parallel.
Image source: the company
At the software level, the company provides its algorithms to robot, intelligent cockpit, and health device manufacturers through an SDK/API. It has been deployed at scale in three major scenarios: in household robots, it has reached mass-production cooperation with customers such as Haier Robotics; in health-care robots, it provides rapid health screening for nursing homes and communities; and in bionic robots, it delivers natural interaction with millisecond-level latency. In automotive, the company is working with a leading Tier 1 supplier on technical validation and mass-production preparation of a driver fatigue monitoring solution.
At the hardware level, the company has launched an embedded camera module equipped with the FacePhys model. Its core product, the Findings scientific research data collection system, provides high-precision data collection terminals mainly for research institutions and hospitals and has entered the bulk procurement stage.
The following is an excerpt from the exchange between Hard Krypton and Tang Jiankai (slightly edited):
Hard Krypton: Are there other companies in China or abroad taking the rPPG route to physiological and emotional recognition?
Tang Jiankai: Some foreign companies are already working in related directions. FaceHeart, for example, focuses on cardiac health monitoring and has obtained FDA certification; it currently serves mainly the telemedicine scenario. Our direction, however, goes beyond heart rate monitoring to cover more dimensions such as emotions, stress, and eye movement behavior. In terms of capability, we are extending from physiological perception to "understanding of human states".
There are also domestic teams working on rPPG, but most of their solutions follow a "record video + cloud analysis" model. Typically, a video of more than 30 seconds has to be recorded and then uploaded to the cloud for processing. The whole analysis may take dozens of seconds, which makes real-time response difficult. Once the user moves, the light changes, or the posture shifts, overall robustness declines significantly.
Hard Krypton: Why can Weimian Technology make rPPG-based physiological perception more accurate?
Tang Jiankai: At the model level, we have made many optimizations. The core idea is to use the "state-space model" to predict the physiological state of the human body at the next moment. A person's heart rate does not suddenly jump from 60 to 100; physiological state is continuous and periodic. Our state-space model captures this stable pattern of change and combines it with the periodic fluctuation characteristics known from medicine to continuously predict the current state of the human body.
In addition, data quality is crucial. Our training data does not come from "virtual labeling" by large models but from cooperation with hospitals and collection with medical-grade devices. We have now established a clinical database of tens of thousands of people, so the data is more objective and accurate.
In terms of emotional understanding, we also have a complete line of reasoning. For example, psychological research has shown that high HRV often corresponds to a more positive, relaxed, or more interested state, while an increased heart rate during strenuous exercise does not necessarily mean emotional fluctuations. Therefore, we not only look at the physiological indicators themselves but also combine spatial features such as actions, postures, and eye movements to understand a person's real state.
To put it simply, we are integrating "physiological continuity in the time dimension" and "visual perception in the spatial dimension" into a unified model, enabling AI to understand a person's physiology, emotions, and behavior simultaneously.
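As background on the HRV mentioned in this answer, HRV is commonly summarized from the intervals between successive heartbeats with time-domain statistics such as SDNN and RMSSD. The sketch below uses the standard definitions of those two metrics; the interview does not say which HRV measures FacePhys computes, and the function names and sample intervals are illustrative assumptions.

```python
import math

def mean_heart_rate_bpm(ibi_ms):
    """Mean heart rate implied by inter-beat intervals given in milliseconds."""
    return 60000.0 / (sum(ibi_ms) / len(ibi_ms))

def sdnn(ibi_ms):
    """Sample standard deviation of the intervals (overall variability)."""
    mean = sum(ibi_ms) / len(ibi_ms)
    return math.sqrt(sum((x - mean) ** 2 for x in ibi_ms) / (len(ibi_ms) - 1))

def rmssd(ibi_ms):
    """Root mean square of successive differences (beat-to-beat variability)."""
    diffs = [b - a for a, b in zip(ibi_ms, ibi_ms[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))

ibi = [800, 810, 790, 805, 795]  # hypothetical inter-beat intervals in ms
print(mean_heart_rate_bpm(ibi))  # 75.0
print(round(sdnn(ibi), 2))       # 7.91
print(round(rmssd(ibi), 2))      # 14.36
```

Two people can share the same mean heart rate of 75 BPM and still differ widely in SDNN or RMSSD, which is why HRV carries information that heart rate alone does not.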
Hard Krypton: Why did you further develop hardware modules?
Tang Jiankai: Video is different from language. It contains a huge amount of information. Uploading all of the data to the cloud for processing causes high latency and degrades the real-time interaction experience. We therefore prefer on-device processing, where perception and inference happen directly on the local device, so that responses are more timely and interaction is smoother.
Another important reason is privacy. The data we process relates to physiology and emotions, which is relatively sensitive information. Especially in scenarios such as medical care and health management, users would rather keep the data on local devices than upload all of it to a cloud API.
Investor's view
Shunwei Capital: The real-time physiological and emotional understanding foundation model developed by the company is globally unique in its technical route and underlying architecture. The technology can be deployed quickly in multiple scenarios such as intelligent cockpits, robots, and intelligent hardware, with broad application prospects. Shunwei highly recognizes the team's technical and productization capabilities and is willing to cooperate deeply with Weimian Technology across all people, vehicle, and home scenarios, support it over the long term, and jointly explore the business prospects of next-generation human-machine interaction and embodied intelligence.