Robot enlightenment requires a kindergarten where mistakes are allowed.
Richard Sutton, the founding father of reinforcement learning, and his mentor Andrew Barto jointly won the 2024 Turing Award.
The award was a long time coming. Over the past three decades, Sutton's theories have underpinned the evolution of systems such as AlphaGo and ChatGPT. Yet the ideas he wrote down thirty years ago are only now being truly understood by the embodied intelligence industry:
Agents should learn from trial and error and evolve from real experiences.
In 2023, Sutton co-founded the non-profit research institution Openmind. In April 2025, in the co-authored article "Welcome to the Era of Experience", Sutton once again made the point sharply:
"The new generation of agents must have an experience stream that progresses continuously over a long time scale, similar to that of humans, and achieve self-evolution through real physical feedback."
This time, besides theory, Sutton set his sights on a more distant future.
In May this year, Sutton officially signed an agreement in Canada with Tashan Technology to jointly advance, as a long-term collaboration, a project called "Robot Kindergarten".
A Turing Award winner and a Chinese haptics company hit it off and jointly made a prediction for the next decade of embodied intelligence: the new path for training robots may lie in real touch and trial and error.
Embodied intelligence lacks "first-person experience"
Ma Yang, the CEO of Tashan Technology, gave a straightforward judgment. For a robot to work, it needs to solve two problems. One is the robot's movement in the physical world, through bipedal, quadrupedal, or wheeled means, which many companies are working on.
The other is the manipulation of target objects, such as grasping, placing, and twisting with hands, carried out smoothly and without the sequence breaking down because the previous action deviated. Together, these two capabilities can cover about 90% to 95% of the work that humans currently need robots to do.
From the very beginning, Tashan Technology aimed to start from haptics to handle the latter task.
When Tashan Technology was founded in 2017, most robot manufacturers were working on mobile platforms, demonstrating their running, jumping, and tumbling abilities. However, more than 90% of human physical interactions are actually completed through the fingers. Fingers are different from legs. They need to stay in continuous contact with different target objects, which involves perception, decision-making, and adjustment in a difficult, continuous process.
To solve the "finger position" problem in embodied intelligence, haptic perception is the core variable and the underlying methodology for "making robots work". Tashan Technology has been working on this for nearly a decade.
The mainstream training approach in embodied intelligence relies on end-to-end imitation from static datasets, much like drilling from a question bank. The data demonstrated by humans is essentially second-person experience. Robots learn from human actions but cannot "touch" things themselves, so they cannot understand the operating laws of the physical world.
Tashan Technology recognized the limits of this route early on: just as humans grow through both imitation and practice in infancy, the "enlightenment" training of robots requires not only imitation but also their own first-person experience.
A training method in which the robot perceives consequences through action and adjusts its behavior through feedback may be the methodology closest to enabling embodied intelligence to "self-train".
This judgment coincides with Sutton's idea.
The concept of the "experience stream" proposed by Sutton requires the agent's learning process and behavior process to be fully integrated. Every action is data collection, and every piece of feedback is a training signal. A real environment that can provide first-person experience is therefore the key to implementing this concept.
However, the concept has long remained theoretical because the real physical environment could not provide low-cost, high-frequency, and standardized interactive feedback. For a long time, the embodied intelligence industry has concentrated on the problems of the brain and the eyes, and has lacked a channel that can accurately perceive physical contact.
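The idea that every action is data collection and every piece of feedback is a training signal can be illustrated with a minimal sketch. It assumes a toy five-position world and tabular Q-learning, neither of which comes from the article or from Tashan Technology's system. The names `step`, `run_stream`, and `greedy_policy` are hypothetical. The point is only that the agent has no separate dataset: it acts, observes, and updates within the same step.

```python
import random

N_STATES = 5          # hypothetical toy world: positions 0..4, goal at position 4
ACTIONS = (-1, +1)    # move left or move right


def step(state, action):
    """Toy environment (an assumption): returns (next_state, reward, done)."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def run_stream(n_steps=2000, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Act and learn in one continuous stream; no stored dataset is used."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    state = 0
    for _ in range(n_steps):
        # Behave: epsilon-greedy choice, ties broken at random.
        if rng.random() < epsilon:
            action = rng.choice(ACTIONS)
        else:
            best = max(q[(state, a)] for a in ACTIONS)
            action = rng.choice([a for a in ACTIONS if q[(state, a)] == best])
        nxt, reward, done = step(state, action)
        # Learn: the feedback from this single action is the training signal.
        future = 0.0 if done else gamma * max(q[(nxt, a)] for a in ACTIONS)
        q[(state, action)] += alpha * (reward + future - q[(state, action)])
        # The stream continues; reaching the goal just puts the agent back at 0.
        state = 0 if done else nxt
    return q


def greedy_policy(q):
    """Preferred action in each non-goal position after learning."""
    return [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)]
```

Calling `greedy_policy(run_stream())` shows what the agent prefers in each position after a single uninterrupted stream of interaction. A physical robot would replace `step` with real contact and real sensor feedback, which is where the cost and frequency of feedback become the bottleneck.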
Haptics is the core perception channel in physical interaction. When a robot touches an object, the haptic sensor can provide real-time feedback on the three-dimensional force distribution at the contact point, the local deformation of the object, and the tendency to slip. With this information, the robot can quickly adjust force and angle and decide whether to tighten or relax its grip.
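How a three-dimensional force reading might turn into a tighten-or-relax decision can be sketched as follows. This is not Tashan Technology's algorithm. It assumes a simple Coulomb friction model, in which slipping begins when the tangential force exceeds the friction coefficient times the normal force. The friction coefficient `mu`, the thresholds, and the function names are all illustrative assumptions.

```python
import math


def slip_ratio(fx, fy, fz, mu):
    """Share of the friction limit in use: |tangential force| / (mu * normal force).

    fx, fy are tangential components and fz is the normal component, in newtons.
    A value near 1.0 means the object is about to slip (Coulomb model, assumed).
    """
    if fz <= 0.0:
        return math.inf  # no normal force means there is no grip at all
    return math.hypot(fx, fy) / (mu * fz)


def adjust_grip(target_fz, fx, fy, fz, mu=0.5,
                tighten_above=0.8, relax_below=0.4,
                step=0.1, fz_min=0.1, fz_max=10.0):
    """Return the next normal-force target from one haptic reading.

    All default values are hypothetical and chosen only for illustration.
    """
    ratio = slip_ratio(fx, fy, fz, mu)
    if ratio > tighten_above:
        target_fz += step      # close to slipping: squeeze harder
    elif ratio < relax_below:
        target_fz -= step      # large safety margin: ease off to avoid damage
    # Otherwise hold the current target.
    return min(max(target_fz, fz_min), fz_max)
```

In a control loop, `adjust_grip` would be called on every sensor reading, so the grip follows the measured contact state rather than a preset trajectory.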
With the continuous emergence of high-precision haptic perception technologies, which have filled in the missing "afferent nerves" of robots, theoretical pioneers represented by Sutton have also begun to focus on this field. In November 2025, during his visit to China, Sutton actively contacted Tashan Technology, one of the two embodied intelligence companies he visited.
Sutton visits Tashan Technology
Tashan Technology is the company with the most comprehensive technical reserves in the haptic perception field.
The haptic sensor independently developed by Tashan Technology has a force resolution of 0.01 N, which is "similar to the force of a hair falling on a finger". Through years of research and development in AI haptic perception and full-stack haptic solutions, the company has overcome the global technical problem of analyzing multi-dimensional haptic signals simultaneously and has established a complete technical system of "chip, sensor, algorithm model, scenario application".
While most haptic sensor manufacturers are still limited to single-dimensional force measurement or simple capacitance changes, Tashan Technology has achieved simultaneous analysis of three-dimensional force, material recognition, proximity sensing, and collaborative perception.
More importantly, Tashan Technology has brought haptic perception into mass production. In the past two years, its products have entered the commercialization stage and have begun shipping in volume to mainstream dexterous hand manufacturers. In 2025, Tashan Technology held more than 80% of the market for haptic sensors for humanoid robots.
TS-VT Visual-Haptic Fusion Training Platform
After Sutton visited Tashan Technology, the two sides quickly moved the cooperation forward. The match in methodology was one reason. The other was that, inside Tashan Technology's building, he saw a team that had taken haptic perception from the laboratory to industrial deployment.
Thus, thirty years after the theory of reinforcement learning was published, theory and technology connected in both directions in the field of embodied intelligence: the academic master found an ally who could engineer the theory, and Tashan Technology filled in the theoretical piece of the puzzle on how haptics can accelerate robot training.
Robot Kindergarten: "Enlightenment" in a real environment
The specific form of the cooperation between the two sides is the "Robot Kindergarten".
At Tashan Technology, Sutton saw Chinese primary school students taking robot classes and was amazed at the open environment for embodied enlightenment in China, where humans and robots can get along more naturally. The idea of the Robot Kindergarten was thus born.
The Robot Kindergarten is a haptic and multi-modal experience training platform for the continuous learning of robots. It integrates the real physical environment, a simulation environment, multiple robot bodies, haptic and multi-modal perception devices, task courses, data collection, and evaluation mechanisms, allowing robots to form trainable experiences through repeated contact, attempts, failures, and corrections.
Why is it called a kindergarten? Ma Yang said that current embodied intelligence is like a baby between zero and three years old. We see robots doing various things in videos and think they are amazing, but in fact, the success rate is not high, and the robots themselves don't know whether they succeed or fail. "It just does things, and people will applaud."
It is actually difficult for robots to understand what they did right from correct human demonstrations, because the concept of "correct" is very vague and covers a wide range. Only errors have boundaries. Enough failed attempts can teach a robot where the boundaries of a task lie and how to adjust in the next operation.
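The claim that errors mark out the boundaries of a task can be made concrete with a small sketch. It assumes a hypothetical grasping task in which too little grip force drops the object and too much crushes it. The trial format, the outcome labels, and the function names are all assumptions made for illustration, not anything described by Ma Yang. Each failure tightens one side of the safe range, and the next attempt is chosen inside it.

```python
def failure_bounds(trials):
    """Return (lower, upper) boundaries learned from failed trials.

    trials is a list of (force, outcome) pairs, where outcome is one of
    "dropped", "ok", or "crushed" (hypothetical labels).
    lower is the largest force that still dropped the object,
    upper is the smallest force that crushed it; either may be None.
    """
    lower = max((f for f, o in trials if o == "dropped"), default=None)
    upper = min((f for f, o in trials if o == "crushed"), default=None)
    return lower, upper


def next_force(trials, default=1.0):
    """Pick the next grip force to try, staying between the known failures."""
    lower, upper = failure_bounds(trials)
    if lower is None and upper is None:
        return default            # no failures yet: nothing is known
    if lower is None:
        return upper / 2          # only crushing seen: try much gentler
    if upper is None:
        return lower * 2          # only dropping seen: try much firmer
    return (lower + upper) / 2    # both sides known: aim for the middle


trials = [(0.5, "dropped"), (4.0, "crushed"), (1.0, "dropped"),
          (3.0, "crushed"), (2.0, "ok")]
```

With these example trials, the two closest failures bracket the safe range between 1.0 and 3.0. The single success at 2.0 confirms that the range is not empty, but it is the failures that say where the range ends.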
"The sense of security in embodied intelligence is not defined by everyone drawing a line together but is gradually explored through objective interactions."
Ma Yang firmly believes that human safety instincts do not come only from reading manuals but grow through repeated contact, falls, and adjustments, and that the same holds for robots. Only through enough real trial and error can they understand what is unsafe. If a robot can define its own safe operating boundaries, it can protect not only itself but also the people around it.
The two sides completed the signing on May 11, 2026.
At the signing ceremony, Sutton talked about the significance of the cooperation: "As early as when we were graduate students, someone proposed to build a robot like a baby, let it interact with the world and grow through experience. This idea was almost impossible to achieve at that time. Now we have enough computing power and a lot of robot experience, but I think the key factor that has been missing is a clear understanding of the value of this ideal. It requires not only funds but more importantly, time and perseverance."
Sutton said that during his visit to Tashan Technology, he was pleasantly surprised to find that this Chinese company understood this point. The entire cooperation plan runs on a five-year cycle, with the goal of finding the most suitable learning methodology for embodied intelligence.
The signing ceremony site
Next, the "Robot Kindergarten" will build a real environment and place robot bodies in it for training. Although the initial training will use homogeneous bodies, Ma Yang believes that, as continuous learning is explored further, heterogeneous robots will not pose a major learning obstacle at later stages: if an agent understands the underlying logic of a task, different body forms will not hinder learning or the transfer of experience.
In comparison, it is more important to face real environmental variables now.
Ma Yang said bluntly that hardware in the embodied intelligence industry has reached a level of about 60 points, but the industry still lacks reasoning ability and continuous learning ability. Without these two abilities, better generalization and deduction are impossible, and the entire industry will be dragged into competing on parameters, unable to find broader application spaces.
Therefore, early learning must involve continuous interaction with the real environment. The training environment cannot deliberately avoid the variables and unfavorable factors found in real scenarios. Otherwise, the ceiling on what robots can learn from experience is very low, and further progress is difficult.
The cooperation between Tashan Technology and Sutton is also an attempt to find a new path. "In this matter, there is no high technology, only the choice of methodology."
The prerequisite for commercialization is the ability to "learn while working"
Ultimately, the methodology needs to be tested in application scenarios. Ma Yang also has a very practical view of commercial deployment: in the next three to five years, the scenarios that embodied intelligence is most likely to enter first will not be those that demand complex logic and fast, time-critical responses.
It is better suited to replacing a specific type of work: work that people don't want to do and that is relatively tolerant of slow execution.
This type of work has three characteristics. The tasks are repetitive but not completely fixed, as they would be on an assembly line. The required success rate is very high, because a single failure may interrupt the process and require heavy manual intervention. The timing requirement for a single task is relatively loose, with no need for second-level response.
Ma Yang gave several examples. One is the service industry, such as dishwashing staff in North American restaurants. Their job is to rinse the dishes and load them into the dishwasher. The action is simple but boring and onerous. Currently, there are millions of people in this position in the United States. If robots can achieve a high enough success rate for this action, it can release huge commercial value. The dishwashing task is not time-critical and can be completed overnight. However, the required success rate is high: if a bowl is broken, the entire process has to stop.
There is a more specific case in agricultural processing. In the crayfish processing factories in Qianjiang, the step of "removing the heads of crayfish" has always been done manually. Because crayfish vary in size and the hardness of their shells changes with the seasons, the equipment needs high-level haptic perception. A factory spends tens of millions of yuan on labor for this process every year. During peak periods, one or two thousand people work on the production line.
Tashan Technology spent half a year first on imitation learning and simulation training, and then let the robot practice repeatedly and autonomously on the real production line using reinforcement learning. In the end, the success rate of shelling crayfish rose to over 95%. While the heads are removed efficiently, the roe is fully retained, which improves the product value structure. Tashan Technology has signed cooperation agreements with leading crayfish processing enterprises for its intelligent crayfish-shelling equipment, with an initial order of 100 units.
Tashan Technology's intelligent crayfish-shelling equipment
The logic of choosing these scenarios is very clear. Robots cannot yet compete with humans in reasoning speed, but they are well suited to filling the gaps that automation cannot handle and humans are reluctant to fill. Haptic perception is the key to unlocking these scenarios. Because it provides real-time feedback, robots can flexibly adjust force and angle during execution without a perfectly preset trajectory.
If most of the industry's efforts are focused on training robots to imitate humans, then the "ceiling of embodied intelligence is humans themselves". To break through this ceiling, the entire industry needs to explore together.
Ma Yang has always emphasized that, rather than building barriers around Tashan Technology itself, he hopes to see more peers join in and jointly push development in the right direction. Tashan Technology and Sutton hope to establish an open, shared R&D infrastructure that attracts academia and industry worldwide to jointly explore the methodology of continuous learning in embodied intelligence.
At this stage, Tashan Technology and Sutton, as initiators, will focus on building the platform. In the future, the entire system will gradually be opened to the industry. The upstream and downstream of Tashan Technology's industrial chain, global universities, and research institutions may all become ecological partners in this cooperation project.
The combination of haptic perception and continuous learning is paving the way for the next decade of embodied intelligence.
Sutton's answer is already written in the concept of a real experience stream. And Tashan Technology is about to turn this answer into an executable engineering plan with a Robot Kindergarten, allowing embodied intelligence to learn to grow from "mistakes" in the real physical world.