
Exclusive Interview with IntBot: Going Global from Silicon Valley, Empowering Robots with a "Soul" Through World Model-Driven Social Interaction

Embodied Intelligence Research Society, 2026-06-23 15:32
"See, he remembers me"

Social intelligence has completely rewritten the value calculation method of embodied intelligence. In the past, the commercial value of service robots has always been anchored in the "cost reduction" logic, and its conversion formula has always revolved around labor costs. However, social intelligence creates space for "revenue increase" and "premium".

There is only one main line for the evolution of embodied intelligence: deconstruction and then reconstruction.

Use AI to deconstruct human behavioral intentions and operational abilities, and then reconstruct the closed loop from perception to decision-making in embodied intelligence. Along this main line, mainstream players lean toward shaping the working ability of embodied intelligence. They use more precise hardware, more advanced algorithms, and more differentiated scenarios to replicate human physical strength and skills and to create new productivity in the physical world.

However, what is worth deconstructing about humans is far more than operational abilities.

This is a less-traveled path, but one that is starting to show commercial promise: deconstructing human social abilities. It is not about attaching an emotional-interaction module to an existing model, nor is it a plug-in within an MoE (mixture-of-experts) architecture. Instead, it is about building a complete social foundation from the bottom up to give a productive worker a "soul".

This is a new question about the definition boundary of embodied intelligence.

The answer goes well beyond making robots "eloquent" or able to hold a heartfelt conversation with you in a particular scenario. It is about moving robots from passively responding to and executing instructions to proactively approaching people and acting before being asked. Moreover, the social foundation model can carry across entities and scenarios. Like human social intuition, this ability runs through every interaction and every task, whoever the person is.

Only by clarifying the real value of this main line can we understand the real weight of social intelligence. It is not a value-added feature of embodied intelligence, but an independent track with a very high commercial ceiling.

In conversation with the Embodied Intelligence Research Society, IntBot CEO Yang Lei and CTO Sharon Yang clearly outlined the commercial potential of social intelligence. Its scope includes, but is not limited to, entity manufacturers, terminal scenario providers, and solution providers. From the very beginning, it has not been tied to a particular piece of hardware or a particular scenario; it is a layer of underlying infrastructure that is independent of operational abilities and can be reused horizontally. This is the logic behind IntBot's insistence on being "fully open and non-binding". Robot hardware forms are diverse, and application scenarios will eventually fragment. A social foundation that can be reused across entities and industries can reach every link of the industrial chain and become a new value anchor.

The deconstruction of social, emotional, and interpersonal logic is essentially injecting "perception and propriety" into embodied intelligence. This is not a departure from the main line of productivity, but an inevitable extension of AI's replication of human abilities. The reason why humans can complete complex collaborations, establish business trust, and form emotional connections has never relied solely on precise actions and clear instructions.

Today, most resources and attention in embodied intelligence are still focused on "making robots more capable". These hard indicators define the industry's entry threshold and form its current commercial foundation. However, once the marginal benefit of operational ability begins to decline, the core variable that can truly widen product price differences, build user stickiness, and open up the household and mass-consumer markets is precisely this hard-to-quantify ability to "understand people".

This may be the central industrial value of the less-traveled path of social intelligence: it does not rewrite the main line of embodied intelligence's evolution, but it does broaden the boundary of that main line. Until now, discussions of embodied intelligence have always asked "how much labor value can it generate"; from here on, we may need to start answering a more fundamental question: to what extent can it understand, connect with, and approach people?

World Model Foundation: Know-how Is the Deepest Moat

"If the ability of the general foundation model makes a leap, will it overtake the social intelligence base model?"

"No."

Sharon Yang's answer was straightforward. Behind this certainty is a clear understanding of the technical barriers of social intelligence. It is not a simple extension of general large-model ability into social scenarios, but a complete technical system built on a world model.

IntBot frames social intelligence as a social world model for real human environments, and uses three layers of capability to close the loop from understanding people to performing actions that meet human expectations.

The first layer is the social perception layer: the system takes in human language, micro-expressions, body movements, and contextual information such as the environment, scenario, and interpersonal relationships, and outputs a prediction of a person's current state and potential needs. It mainly answers the question "what does the person in front of me need at this moment".

The second layer is the social reasoning layer: based on the judgment of the perception layer, combined with scenario rules and social logic, it derives a decision. For example, "customers need ice water on a hot summer day" turns a vague perception into a clear action goal.

The third layer is the behavior specification layer: it breaks the reasoning result down into an executable multimodal action sequence, coordinates the robot's motion, voice, and expression systems, and finally outputs a complete interactive behavior. The robot does not just hand over the required item; it accompanies the handover with an appropriate greeting and body movements, forming a complete response that conforms to social propriety.

Put simply, this is an end-to-end social closed loop: multimodal perception of the environment and the people in it to understand the social setting, then step-by-step reasoning over social logic, and finally a whole-body interactive output that coordinates language, actions, and expressions rather than a single text or voice response, so that the robot acts in line with human expectations.
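The three layers described above can be sketched as a simple pipeline. This is only an illustrative toy, not IntBot's implementation: every class, function, field, and rule here (`Observation`, `perceive`, `reason`, `plan_behavior`, the scene name `hotel_lobby`) is a hypothetical stand-in, and hand-written rules take the place of what the article describes as learned models.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Observation:
    """Multimodal input to the perception layer (all fields are hypothetical)."""
    speech: str = ""
    posture: str = "standing"
    context: dict = field(default_factory=dict)


@dataclass
class PersonState:
    state: str
    likely_need: str


@dataclass
class Action:
    channel: str  # "motion", "voice", or "expression"
    command: str


def perceive(obs: Observation) -> PersonState:
    """Layer 1: predict the person's state and unspoken need from signals."""
    weather = obs.context.get("weather")
    if weather == "rain" and obs.posture == "hurrying":
        return PersonState("wet and cold", "dry off and warm up")
    if weather == "hot":
        return PersonState("overheated", "cool down")
    return PersonState("unknown", "none")


def reason(person: PersonState, scene: str) -> str:
    """Layer 2: turn the predicted need into a concrete goal via scene rules."""
    rules = {
        ("hotel_lobby", "dry off and warm up"): "bring towel and warm water",
        ("hotel_lobby", "cool down"): "bring ice water",
    }
    return rules.get((scene, person.likely_need), "stay available")


def plan_behavior(goal: str) -> List[Action]:
    """Layer 3: expand the goal into a coordinated multimodal action sequence."""
    if goal == "stay available":
        return [Action("expression", "friendly")]
    return [
        Action("motion", goal),
        Action("voice", "Welcome, this is for you."),
        Action("expression", "smile"),
    ]


def social_loop(obs: Observation, scene: str) -> List[Action]:
    """Run perception, reasoning, and behavior planning end to end."""
    return plan_behavior(reason(perceive(obs), scene))
```

Calling `social_loop(Observation(posture="hurrying", context={"weather": "rain"}), "hotel_lobby")` yields a motion, a voice, and an expression action without any spoken instruction in the input, which is the point of the layered design: the output is a coordinated behavior, not a single text reply.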

The least visible and most solid technical barrier of this architecture lies in the anticipatory prediction ability of the perception layer. It does not need to wait for a person to state an instruction in words. From non-verbal signals such as micro-expressions and body posture, combined with the scenario context, it can predict needs that people have not yet voiced. The most typical scenario is a hotel lobby. On a rainy day, a customer walks in quickly, soaking wet. With this model capability, the robot does not need to step forward and ask "May I help you?" Instead, it directly fetches a dry towel and warm water and hands them to the customer. No instruction triggers any step of the interaction, yet it is closer to a real service experience than any standard Q&A, much like a close friend or family member who senses your needs before you speak.

This is exactly the kind of ability that is unlikely to emerge naturally in a general foundation model.

In real social interactions, most signals are non-verbal and implicit. There is no standard correct answer, only a distinction of propriety between "appropriate" and "inappropriate". This ability cannot be obtained simply by stacking up general corpora at scale. It requires a dedicated interaction dataset, dedicated training objectives, a dedicated social evaluation system, and a world model architecture optimized specifically for embodied interaction in the physical world.

This is the moat IntBot has built with know-how.

Yang Lei said that the IntBot team currently includes many members with a cognitive science background, and that a large amount of human behavioral psychology has been incorporated into model training to strengthen the robot's perception and its interpretation of behavioral logic. Previously, a well-known American university carried out joint research with IntBot and won a Best Paper award for its results on IntBot's social robot Nylo. Interestingly, IntBot's deconstruction of human emotional logic is not limited to the external question of what feedback the robot should give. It also extends to the design of the robot's expressions, appearance, and so on, endowing the robot with self-cognitive ability in response to common human psychological and cognitive problems.

Sharon Yang further added that the data required for social intelligence is fundamentally different from traditional robot training data. Rather than a highly standardized data collection environment, IntBot pays more attention to multi-source data that reflects real human behavior patterns, including Internet videos, simulation environments, and real-world human-robot interaction data. This data helps the model learn human expressions, postures, sense of distance, attention, and the rules of social interaction in real environments. As robots are deployed in real scenarios such as hotels, airports, and exhibitions, the system keeps accumulating real human-robot interaction data and feeding it back into model training, gradually forming a positive cycle of "more deployments, more interactions, a model that understands people better, more deployments". This accumulation of real-world human-robot interaction data is also becoming one of the most important long-term barriers for IntBot's social intelligence platform.

Consider IntBot's social intelligence path in the context of embodied intelligence deployed at larger scale. Once all robots can understand instructions and complete basic tasks, the dividing line in product experience will shift from "can it do it" to "does it do it comfortably, does it understand propriety, and can people trust it". The former is the industry floor that general abilities can cover, while the latter is the ceiling of experience defined by social intelligence.

Cross-entity, Cross-scenario, and Unrestricted Business

In the embodied intelligence industry today, social ability in robots is easily regarded as a non-core "icing on the cake" feature, and the value of emotional interaction is hard to fit into a rigorous business model.

IntBot felt this cognitive gap particularly clearly during fundraising. Yang Lei admitted that many investors would narrow social intelligence down to emotional companionship, pet-style interaction, or even just more expressive voice synthesis. In the mainstream view, it is a value-added module attached to the hardware entity and cannot support an independent track.

However, this is exactly the central misunderstanding about social intelligence. Before the exclusive interview, at the Beyond Expo forum in Macao, Yang Lei said during a round-table discussion with the Embodied Intelligence Research Society that social intelligence is not a single-dimensional interactive function, nor is it limited to a certain type of scenario or a certain hardware form. Instead, it is a set of underlying foundation capabilities that can be reused horizontally.

IntBot's social foundation can be deployed in parallel with various VLA (vision-language-action) models, and together they form the robot's complete brain. If the operational model gives the robot a body that can "work" and builds a productivity base in the physical world, the social foundation injects a "people-understanding and empathetic" soul into that body, giving a cold industrial product a human-like sense of interactive propriety so that it can truly integrate into human life and work as a collaborative partner.

Take a real-life example. A robot equipped with the IntBot model will actively greet and talk with strangers while walking through the bustling Times Square. Even when surrounded by a crowd, the robot can still accurately identify the person it is talking to through visual and voice information, holding a natural one-on-one conversation without being disturbed by the surrounding crowd and noise. After a short conversation and a farewell, when the robot strolls around for a while and meets the same stranger again, it still remembers who the person is and what they talked about. The stranger's words, "See, he remembers me", are the best annotation of social intelligence.

And this ability, combined with its positioning as an underlying foundation, determines IntBot's business ceiling from the very beginning.

On the hardware side, they adhere to an open and horizontally compatible route. In IntBot's view, the robot industry is unlikely to converge into a few standardized forms like the automotive industry. The demands of the household, industrial, and service sectors are naturally very different, which will inevitably give rise to a variety of hardware products. This means that the optimal strategy at the brain level is not to bet on a single entity manufacturer but to widely connect with leading players in various industries to cover as many product forms and application scenarios as possible. Currently, IntBot has reached cooperation with several entity manufacturers with leading shipment volumes and different technical routes.

On the scenario side, IntBot also follows the idea of "having a focus but not being bound". In its view, social intelligence is a general underlying ability whose reach can extend to all industries, and binding to a single scenario too early would limit its own business ceiling. At present, implementation is focused on the service industry, with priority given to multiple high-potential niche scenarios such as hotels, airports, retail, medical care, and elderly care companionship.

Currently, partners on the scenario side fall into two main categories, each with its own complementary logic.

The first category is vertical industry solution providers and scenario operators. Take two hotel customers that IntBot worked with early on as an example. They not only operate offline hotel properties but also sell complete intelligent hotel solutions to others. As the core component of the overall solution, the social intelligence brain expands into a broader market through the partner's channel system.

The second category is distributors and dealers in the robot industry. These players already had mature customer reach and offline service capabilities, but their past customers were mostly concentrated in education, scientific research, and cultural and entertainment performances. They lacked the continuous AI iteration ability and the remote operation and maintenance system required to support 24-hour unmanned services. Social intelligence fills exactly this gap, helping channel partners break out of the red ocean of hardware price wars and enter the more valuable commercial service market.

This "fully open and non-binding" strategy is essentially a bet on the end state of the embodied intelligence industry: when hardware forms are destined to be fragmented and no ultimate product can cover all scenarios, a set of core capabilities that can be reused across forms and scenarios is likely to become an increasingly valuable node in the industrial chain.

This is very similar to the operating system logic of the PC era. Hardware manufacturers are diverse and constantly iterating, but the value of the underlying system increases exponentially as the ecosystem grows. The business flywheel of the social intelligence foundation works the same way: the more entities are connected and the wider the scenarios covered, the richer the accumulated real human-robot interaction data, and the stronger the model's social understanding and prediction ability. That in turn attracts more partners to join, forming a positive scale effect.

More importantly, social intelligence has completely rewritten how the value of embodied intelligence is calculated. In the past, the commercial value of service robots was always anchored in the logic of "cost reduction", and its conversion formula always revolved around labor costs. Social intelligence, however, creates room for "revenue increase" and "premium". Currently, with the support of the IntBot model, many people will stop in front of the robot and talk for up to fifteen minutes, a change from the fragmented question-and-answer exchanges that used to characterize communication between humans and robots. This ability to communicate, perceive, and recognize allows a more considerate service robot in a hotel to improve customer reviews and repurchase rates, a more understanding companion robot in elderly care to increase willingness to pay, and a more observant shopping-guide robot in retail to improve conversion efficiency. These values have no clear labor equivalent, and their commercial upside is much higher than that of simple labor substitution.

From this perspective, IntBot wants to be an interaction infrastructure provider for the era of embodied intelligence, and it has quietly built its own business moat. This is not only about the scale of a single company but also about the reconstruction of the entire human-robot interaction paradigm.

From Silicon Valley to the World

Founded in Sunnyvale, California, USA in 2024, IntBot has carried a distinct technical background from the very first day of its birth.

The career path of Yang Lei, co-founder and CEO, itself reflects IntBot's technical DNA. He graduated from Tsinghua University and holds a Ph.D. in Computer Science from the University of California, Santa Barbara. He once served as the General Manager of the AIoT Division at Ant Group under Alibaba. Earlier, he led multiple cutting-edge research and product implementation projects at Intel Labs. He has published more than 30 academic papers and holds more than 30 US patents. This varied experience has given him a more concrete judgment on "how robots can truly integrate into the physical world and interact with humans" than entrepreneurs with a pure large-model background. This is also the fundamental reason why he is certain that social intelligence is the next core variable in embodied intelligence.

Sharon Yang, the co-founder and CTO, holds a Ph.D. in Computer Engineering from the University of Illinois at Urbana-Champaign (UIUC). She was a senior chief engineer at Intel, leading global interdisciplinary engineering projects and enterprise-level innovation businesses, and promoting the adoption of multiple technologies across Intel's entire product line and partner ecosystem. Her research fields cover edge AI, AI systems, robots, drones, and advanced wireless communications. She holds 161 granted patents globally, has published 37 peer-reviewed academic papers, and is also deeply involved in industry standard setting.