
The creator of Apple's Face ID is building an end-to-end perception system for "physical AI" and has raised $107 million in financing.

Alpha Commune · 2026-01-14 18:37
"Physical AI" is still in its early stages, and there are still many opportunities.

In the past year, "Physical AI" has gradually become the industry consensus as the next major direction for AI. Previously, we argued that "Physical AI" faced challenges such as a lack of embodied-intelligence data and immature world models, both of which concern the intelligence side of "Physical AI." In fact, even perception, which the industry regards as relatively mature, is still far from truly mature.

Until now, perception has usually been treated as a component problem rather than a systems problem. Hardware teams have spent years building perception stacks from scratch: purchasing sensors from multiple suppliers, calibrating systems, and debugging synchronization issues, constantly reinventing the wheel.

In fact, systematic perception has already been solved and deployed at scale in consumer electronics, in products such as Microsoft's Kinect and Apple's Face ID. Now the creators of those two technologies have teamed up to found the startup Lyte.

Their system integrates advanced 4D sensing, RGB imaging, and motion perception into a single platform. Through a single connection, it provides unified spatial and visual data, enabling direct communication between the "eyes" and the "brain" and supplying the perception infrastructure the industry has lacked.

Recently, Lyte raised $107 million in early-stage financing, with participation from Avigdor Willenz, Fidelity, Atreides Management, Exor Ventures, Key1 Capital, and Venture Tech Alliance.

The Creators of Microsoft Kinect and Apple Face ID Build an End-to-End Perception System for AI

Lyte was co-founded by Alexander Shpunt (CEO), the key architect of Apple's depth-sensing and perception technology, together with Arman Hajati and Yuval Gerson.

The founding team of Lyte

Alexander Shpunt co-founded the 3D sensing company PrimeSense and served as its chief technology officer. Since 2005, he has worked on one problem: how do you teach machines to perceive depth? He wanted machines to perceive space the way humans do: seeing not flat pixels, but dimensions, distances, and the relationships between objects in three-dimensional space.

To this end, he and his team created "light coding" technology: an infrared projector casts an invisible dot pattern across the entire scene; a camera reads how those dots are displaced at different distances and converts the result into a real-time depth map through triangulation.
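
For intuition, the triangulation step reduces to one geometric relation: depth equals the camera's focal length times the projector-to-camera baseline, divided by the observed dot displacement (disparity). A minimal sketch, with illustrative numbers rather than any real device's parameters:

```python
# A minimal sketch of the triangulation behind structured-light depth
# sensing (illustrative; not PrimeSense's or Lyte's actual code).
# Assumed setup: an IR projector and a camera separated by a known
# baseline, with each dot's shift (disparity) measured in pixels.

def depth_from_disparity(focal_length_px: float,
                         baseline_m: float,
                         disparity_px: float) -> float:
    """Recover the depth (meters) of a projected dot by triangulation.

    focal_length_px: camera focal length, in pixels
    baseline_m:      projector-to-camera distance, in meters
    disparity_px:    how far the dot appears shifted versus its
                     position at a reference depth, in pixels
    """
    if disparity_px <= 0:
        raise ValueError("dot not displaced; depth is unresolvable")
    return focal_length_px * baseline_m / disparity_px

# Example: f = 580 px, baseline = 7.5 cm, observed disparity = 20 px
print(depth_from_disparity(580.0, 0.075, 20.0))  # ~2.18 m
```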

Five years later, this technology gave birth to a revolutionary motion-sensing device: Microsoft Kinect, which sold eight million units in its first sixty days.

In 2013, his company was acquired by Apple. He and his team joined Apple and continued to evolve the core technology. In 2017, Apple launched Face ID, which is now used in billions of devices.

In 2021, Alexander Shpunt saw the early signs of "Physical AI." AI would no longer just read text and recognize images; it would navigate warehouses, operate machinery, and share roads with pedestrians and vehicles. AI was becoming embodied and integrating into the real world.

But the risks grow with the stakes. A perception glitch on a smartphone is an occasional annoyance; in a warehouse or on an open road, the consequences of an error can be catastrophic.

Alexander Shpunt believes that a prerequisite for the healthy development of "Physical AI" is a reliable understanding of the physical world: robots must be able to operate safely in complex, dynamic environments, not just controlled ones.

He built a team around former Apple colleagues, one that spans sensing, chips, and physical AI. Besides Shpunt himself, Arman Hajati (CTO) led the architecture of multiple generations of the iPhone and Apple Watch Taptic Engine; Yuval Gerson (Vice President of Engineering) specializes in complex mechanical and micro-electro-mechanical systems (MEMS); and Reza Nasiri Mahalati (Hardware Lead) has deep experience integrating advanced sensing modules across hardware, software, and algorithms.

Adding the Fourth Dimension Missing in Structured Light: Speed

According to Grand View Research's prediction, the market size of AI robots will reach $125 billion by 2030. However, McKinsey's data shows that more than 60% of industrial enterprises lack the internal ability to implement robot automation independently, including sensor integration capabilities.

The traditional solution for enterprises is to piece together perception systems from multiple suppliers and then spend months calibrating sensors, writing fusion software, and debugging integration failures.

Lyte aims to solve this structural problem. Its approach is to vertically integrate the stack, combining sensing hardware, custom chips, and perception software into a single platform that provides a clear, reliable perception layer for autonomous machines (including but not limited to embodied robots) operating in the real world.

Structured light (the more general term for light coding) is an important perception technology that has proven itself indoors and in face recognition. But it has limits: it works only at close range, and it captures only where an object is, not where it is going.

For machines moving through the world, that is not enough. A robot navigating a warehouse needs to know not only where a forklift is, but that the forklift is approaching at four meters per second. A delivery robot on the sidewalk needs to see not only a child, but that the child is running.

Traditional sensors capture position. To recover motion, software must compare successive frames: the current position against the previous one. That comparison introduces latency, and when operating in a dynamic world, latency is the root of risk.
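
To make that latency concrete, here is a minimal sketch of the conventional frame-differencing approach (illustrative only; the class name and interface are invented for this example, not any vendor's pipeline). A velocity estimate exists only after a second frame arrives, and it describes the interval just past, not the present instant:

```python
# Velocity inferred from a position-only sensor by finite differences
# across frames. The estimate is one frame late by construction.

from collections import deque

class FrameDifferencingTracker:
    def __init__(self, frame_rate_hz: float = 30.0):
        self.dt = 1.0 / frame_rate_hz
        self.history = deque(maxlen=2)  # (x, y, z) of the last two frames

    def update(self, position_m: tuple[float, float, float]):
        """Feed the latest measured position; returns an estimated
        velocity vector (m/s), or None until two frames have arrived."""
        self.history.append(position_m)
        if len(self.history) < 2:
            return None  # no motion information yet: the built-in latency
        prev, curr = self.history
        return tuple((c - p) / self.dt for c, p in zip(curr, prev))

tracker = FrameDifferencingTracker(frame_rate_hz=30.0)
tracker.update((10.0, 0.0, 0.0))          # frame 1: position only, no velocity
print(tracker.update((9.87, 0.0, 0.0)))   # frame 2: ~(-3.9, 0.0, 0.0) m/s
```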

The motion-sensing devices the Lyte team built earlier could not only see where a person's body was but also track how it was moving. Their face recognition technology not only maps a face but confirms that the face is alive, present, and real. Both technologies understand dynamically evolving scenes, not just static moments.

Now, the Lyte team applies the same capabilities to longer distances, higher speeds, and machines operating in open spaces.

They developed a new core technology: "coherent vision." It uses light differently. Instead of projecting a pattern and reading its distortion, it emits a continuous signal and measures the signal that returns. Position and velocity are captured simultaneously, in a single instant, with no after-the-fact computation: the capability is built into the measurement itself. This adds a fourth dimension to perception, velocity: an object's position and direction of motion are known at the same moment.

Put simply: in the past, algorithms had to reconstruct speed after the fact, which meant delay; "coherent vision" obtains speed directly at the physical layer, so there is no delay.
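
The article does not spell out the underlying physics, but coherent optical sensing (as in FMCW lidar) typically reads radial velocity straight off the Doppler shift of the returned light, which is one standard way the "fourth dimension" can live inside a single measurement. A back-of-the-envelope sketch, assuming a 1550 nm laser purely for illustration:

```python
# Illustrative only: how coherent detection can read radial velocity
# directly from one measurement via the Doppler effect, as in FMCW
# lidar. The article does not specify Lyte's actual modulation scheme.

C = 299_792_458.0  # speed of light, m/s

def radial_velocity(doppler_shift_hz: float, laser_freq_hz: float) -> float:
    """Radial velocity (m/s) from the Doppler shift of a reflected beam.
    Positive = target approaching. The factor of 2 is because the light
    travels to the target and back, doubling the shift."""
    return doppler_shift_hz * C / (2.0 * laser_freq_hz)

# A 1550 nm laser has a frequency of ~193.4 THz; a forklift approaching
# at 4 m/s shifts the return by about 5.2 MHz.
laser_freq = C / 1550e-9
print(radial_velocity(5.16e6, laser_freq))  # ~4.0 m/s
```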

LyteVision: Enabling Direct Communication between the Machine's "Eyes" and "Brain"

Building on this core technology, the Lyte team created LyteGalaxy, a unified spatial-intelligence platform that integrates sensors, compute, software, and algorithms into a complete, unified perception stack for robots.

LyteVision

On the perception side, their core hardware product is LyteVision, an end-to-end perception system that won the Best Innovation Award in robotics at CES 2026. The product integrates advanced 4D sensing, RGB imaging, and motion perception into a single platform, delivering unified spatial and visual data through a single connection.

It unifies not just the sensors but the entire path from perception to intelligence: sensors are integrated with chips, chips are customized for the software, and the software is designed for AI compute. From the moment photons hit the sensor to the moment a decision returns to the machine, the whole stack is one seamlessly connected architecture.

Specifically, it goes through three integration stages to transform raw sensor data into executable intelligent information.

Step 1: Perception. It combines 4D coherent vision, RGB (visible light), and an IMU (inertial measurement unit) in a plug-and-play, task-ready module. The module is a complete, compact sensing unit deployed with a single cable: an out-of-the-box perception system.

Step 2: Fusion and processing. The system's custom chip performs multi-sensor fusion instantaneously in hardware, delivering unified, time-synchronized perception data. Developers can focus on building robot behaviors instead of debugging sensor synchronization.

Step 3: Understanding. Just as the eyes connect to the brain through the nervous system, a robot that can see the world still needs to understand what it sees. That means linking sensors, chips, software, and AI compute, with data flowing from the edge to the cloud and back: the model processes what the machine perceives, makes decisions, and issues instructions. The entire closed loop completes in milliseconds, preserving immediacy.
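
To make the three steps concrete, here is a hypothetical sketch of the kind of unified, time-synchronized frame such a platform might emit over its single connection. Every field name and type below is invented for illustration; Lyte has not published its interface:

```python
# Hypothetical sketch of a unified perception frame. Field names and
# types are invented for illustration; this is not Lyte's actual API.

from dataclasses import dataclass

@dataclass
class FusedFrame:
    timestamp_ns: int            # one shared clock for every modality
    points_xyz: list[tuple[float, float, float]]  # 4D sensing: position...
    radial_velocity_mps: list[float]              # ...plus per-point velocity
    rgb_image: bytes             # visible-light image, registered to the points
    imu_accel_mps2: tuple[float, float, float]    # ego-motion from the IMU
    imu_gyro_radps: tuple[float, float, float]

def on_frame(frame: FusedFrame) -> None:
    # Downstream code reads one coherent snapshot instead of stitching
    # together separately clocked sensor streams.
    approaching = [v for v in frame.radial_velocity_mps if v > 3.0]
    if approaching:
        print(f"{len(approaching)} points closing at >3 m/s at t={frame.timestamp_ns}")
```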

The end result is LyteVision: a self-contained module with a single connector that unifies the output of multiple sensors, captures position and velocity in real time, and can be deployed immediately. And because the module is uniform, every machine that uses it shares a consistent "view" of the world.

Perceiving the physical world is complex, but Lyte internalizes these complexities and provides a perception layer connected to the intelligent layer. It enables direct communication between the "eyes" and the "brain."

This allows "Physical AI" to no longer face development limitations in perception.

In terms of adaptability, LyteVision can empower a wide range of physical AI platforms, including autonomous mobile robots, robotic arms, quadruped robots, autonomous taxis, and humanoid robots.

"Physical AI" is Still in Its Early Stages, with Many Opportunities Ahead

As the technology advances, AI is entering more and more scenarios. With the rise of Physical AI, the demands on perception are evolving from static and single-dimensional to general-purpose and real-time.

In the past, AI only needed to recognize a static human face. Now, it needs to move freely in an open and complex physical environment, where "accidents" outside the training data may occur at any time.

The Lyte team has worked on machine "perception" since 2005. Continuing along that main line, they have built new technical solutions for the new trends and demands of "Physical AI."

End-to-end is the next trend for perception systems, and Tesla is a good example. Instead of a complex hardware suite combining radar and cameras, it uses a camera-only approach; the massive data collected by the front-end cameras feeds a back-end deep-learning model, forming a data flywheel that lets the system grow stronger without adding hardware complexity.

Lyte's product is also end-to-end. Through vertical integration of hardware and software, it internalizes the complexity of the perception system and hands customers a simple product, one versatile enough to adapt to customers' varied hardware forms and application scenarios.

Currently, "Physical AI" is still in its early stages. As we summarized in a previous article, it has issues such as an AI operating system for empowering intelligent hardware, a lack of embodied intelligence data hindering the development of world models, and imperfect "world models." However, looking at the industry more closely, there are actually more than these three issues. For example, the perception aspect, which we thought was already mature, is being revolutionized by Lyte.

Therefore, whether in intelligence, perception, or motion control, and whether at the hardware or software level, "Physical AI" still holds many opportunities for breakthroughs, and they are worth exploring for entrepreneurs.

This article is from the WeChat official account "Alpha Commune" (ID: alphastartups), author: "One who discovers extraordinary entrepreneurs." It is published by 36Kr with authorization.