
Google's open-source full-stack platform, Coral NPU, enables large models to run on smartwatches 24/7.

Machine Intelligence (机器之心) | 2025-10-16 15:41
Aiming to address the three core challenges of edge AI, namely performance, fragmentation, and privacy, Synaptics has taken the lead in adopting...

Today, Google is quite busy.

On one hand, in collaboration with Yale University, they used Cell2Sentence-Scale 27B (C2S-Scale), developed on top of Gemma, to predict a potential new cancer treatment for the first time, drawing wide attention worldwide. On the other hand, they released an updated Veo 3.1, bringing users significantly improved video generation capabilities. See the report "Just now, Google Veo 3.1 has undergone a major update, going head-to-head with Sora 2".

Moreover, they have introduced Coral NPU, which can be used to build AI that runs continuously on low-power devices. Specifically, it can run small Transformer models and LLMs on wearable devices, and it supports TensorFlow, JAX, and PyTorch through the IREE and TFLM compilers.

Like the previous two news items, this has also sparked extensive discussions among developers.

Coral NPU: A Full-Stack Open-Source AI Platform for Edge Devices

Google positions Coral NPU as "a full-stack, open-source platform designed to address the three core challenges of performance, fragmentation, and privacy, which limit the application of powerful, always-on AI technologies on low-power edge devices and wearable devices."

In other words, Coral NPU should let us build useful AI that runs continuously and locally on devices such as smartwatches, embedding intelligence directly into the user's personal environment.

However, achieving this is not easy. Google has summarized three major challenges:

Performance Gap: Complex, state-of-the-art machine learning models demand computing resources that far exceed the limited power, thermal, and memory budgets of edge devices.

Fragmentation Costs: Compiling and optimizing machine learning models for diverse proprietary processors is both difficult and expensive, which hinders the achievement of consistent performance across devices.

Lack of User Trust: For personal AI to truly be effective, it must prioritize the privacy and security of personal data and context.

The Coral NPU launched by Google today is based on its original Coral project. "It provides hardware designers and machine learning developers with the tools needed to build the next-generation private and efficient edge AI devices."

Specifically, Coral NPU is the result of a collaborative design between Google Research and Google DeepMind. It is an AI-first hardware architecture that can support the next-generation ultra-low-power, always-on edge AI.

It offers a unified developer experience, making it easier to deploy applications such as context awareness. It is designed for all-day AI on wearable devices, minimizing battery consumption while being configurable for higher-performance application scenarios.

Google has released relevant documentation and tools so that developers and designers can start building immediately.

Project homepage: https://developers.google.com/coral

Code repository: https://github.com/google-coral/coralnpu

Technical Details

As the name suggests, Coral NPU adopts the NPU (neural processing unit) architecture, which provides building blocks for next-generation, energy-efficient, machine-learning-optimized systems-on-a-chip (SoCs).

This architecture is based on a set of IP blocks compliant with the RISC-V instruction set architecture (RISC-V ISA), designed for minimal power consumption, making it ideal for always-on context awareness.

Its base design delivers performance on the order of 512 GOPS (billions of operations per second) while consuming only a few milliwatts, bringing powerful on-device AI to edge devices, ear-worn devices, AR glasses, and smartwatches.
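To put those numbers in perspective, here is a quick back-of-the-envelope efficiency calculation. Note that the 5 mW figure below is an assumed example within the "few milliwatts" range stated above, not a published specification:

```python
# Rough efficiency estimate for an always-on NPU.
# Assumption: 512 GOPS at an illustrative 5 mW (the post only says
# "a few milliwatts"; 5 mW is a placeholder, not a published figure).
ops_per_second = 512e9   # 512 GOPS
power_watts = 5e-3       # 5 mW (assumed)

ops_per_joule = ops_per_second / power_watts
tops_per_watt = ops_per_joule / 1e12
print(f"{tops_per_watt:.1f} TOPS/W")  # prints "102.4 TOPS/W" under these assumptions
```

Even with the assumed power draw off by a factor of a few, the efficiency class is far beyond what a general-purpose CPU core achieves, which is the point of a dedicated always-on NPU.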

A unified view of the Coral NPU ecosystem, showing the end-to-end technology stack provided for SoC designers and machine learning developers.

This open and scalable RISC-V-based architecture provides flexibility for SoC designers, allowing them to modify the base design or use it as a pre-configured NPU.

The Coral NPU architecture consists of the following components:

A scalar core: A lightweight, C-programmable RISC-V front end responsible for managing the data flow to the back-end cores. It uses a simple "run-to-completion" model for ultra-low power consumption and traditional CPU functions.

A vector execution unit: A powerful single-instruction, multiple-data (SIMD) coprocessor compliant with the RISC-V Vector extension (RVV) v1.0 specification, capable of performing synchronous operations on large data sets.

A matrix execution unit: An efficient quantized outer-product multiply-accumulate (MAC) engine built specifically to accelerate the basic operations of neural networks. Note that this matrix execution unit is still under development and will be released on GitHub later this year.

A schematic diagram of the architectural transition from traditional design to Coral NPU.
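To make the matrix unit's role concrete, the sketch below shows the class of operation a quantized outer-product MAC engine accelerates: accumulating a matrix product as a sum of rank-1 outer products over int8-range operands with a wider accumulator. This is a conceptual illustration in plain Python, not Coral NPU code:

```python
# Conceptual sketch of an outer-product multiply-accumulate (MAC):
# one column of A and one row of B update an accumulator tile, as in
# quantized int8 x int8 -> int32 matrix engines. Illustrative only.

def outer_product_mac(acc, a_col, b_row):
    """acc[i][j] += a_col[i] * b_row[j] (wide-accumulator update)."""
    for i, a in enumerate(a_col):
        for j, b in enumerate(b_row):
            acc[i][j] += a * b
    return acc

# Compute A (2x2) @ B (2x2) as a sum of K=2 rank-1 outer products.
A = [[1, -2], [3, 4]]   # int8-range activations
B = [[5, 6], [-7, 8]]   # int8-range weights
acc = [[0, 0], [0, 0]]  # int32-style accumulator tile
for k in range(2):
    a_col = [A[i][k] for i in range(2)]
    outer_product_mac(acc, a_col, B[k])

print(acc)  # [[19, -10], [-13, 50]] — equal to the matrix product A @ B
```

A hardware engine performs each rank-1 update in parallel across the whole tile, which is why this formulation maps so well to the matrix multiplications that dominate neural-network inference.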

Unified Developer Experience

The Coral NPU architecture is a simple, C-programmable target platform that can be seamlessly integrated with modern compilers such as IREE and TFLM. This enables it to easily support machine learning frameworks such as TensorFlow, JAX, and PyTorch.

Coral NPU ships with a comprehensive software toolchain: dedicated solutions such as the TFLM compiler for TensorFlow, plus a general-purpose MLIR compiler, a C compiler, custom kernels, and a simulator. This gives developers flexible paths to the hardware.

For example, a model from a framework such as JAX is first imported into MLIR using the StableHLO dialect. This intermediate file is then fed to the IREE compiler, which applies a hardware-specific plugin to recognize the Coral NPU architecture. The compiler then performs progressive lowering, a key optimization step in which the code is systematically translated through a series of dialects, each closer to the machine's native instructions. After optimization, the toolchain emits a final, compact binary file for efficient execution on the edge device.
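Sketched as a command line, that pipeline might look like the following. The `--iree-input-type` and `--iree-hal-target-backends` flags are real IREE compiler flags; `llvm-cpu` is used as a stand-in backend because the post does not name a public Coral NPU target, so treat the exact invocation as an assumption:

```python
# Illustrative IREE invocation for the pipeline described above.
# The flags are real IREE flags; "llvm-cpu" is a stand-in, since the
# post does not name a public Coral NPU backend.
import shlex

cmd = [
    "iree-compile",
    "--iree-input-type=stablehlo",          # model imported via the StableHLO dialect
    "--iree-hal-target-backends=llvm-cpu",  # a hardware-specific Coral NPU plugin would slot in here
    "model.mlir",                           # MLIR exported from JAX/TF/PyTorch
    "-o", "model.vmfb",                     # compact binary for on-device execution
]
print(shlex.join(cmd))
```

The key design point is that the frontend framework only needs to reach StableHLO; everything hardware-specific lives behind the backend plugin, which is what keeps the developer experience uniform across targets.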

The original post includes a table summarizing the software development advantages of Coral NPU:

This set of industry-standard developer tools helps simplify the programming of machine learning models and provides a consistent experience across various hardware targets.

The Coral NPU compiler toolchain, showing the complete flow from machine learning model creation and optimization through compilation to on-device deployment.

The co - design process of Coral NPU focuses on two key areas.

  • First, the architecture can efficiently accelerate the leading encoder-based architectures in today's on-device vision and audio applications.
  • Second, Google is closely collaborating with the Gemma team to optimize Coral NPU for small Transformer models, ensuring that this accelerator architecture can support the next-generation edge generative AI.

This dual focus means that Coral NPU is expected to be the first open, standards-based, low-power NPU designed to bring large language models (LLMs) to wearable devices.

For developers, this provides a single and proven path to deploy current and future models with the lowest power consumption and the highest performance.

Target Applications

Coral NPU is designed to support ultra-low-power, always-on edge AI applications, with a particular focus on context-awareness systems. Its main goal is to provide an all-day AI experience on wearable devices, mobile phones, and Internet of Things (IoT) devices while minimizing battery consumption.

Potential use cases include:

  • Context Awareness: Detect user activities (such as walking or running), distance, or environment (such as indoor/outdoor, on the move) to enable "do not disturb" mode or other context-aware functions.
  • Audio Processing: Voice and sound detection, keyword recognition, real-time translation, transcription, and audio-based accessibility features.
  • Image Processing: Person and object detection, face recognition, gesture recognition, and low-power visual search.
  • User Interaction: Device control through gestures, audio prompts, or other sensor-driven inputs.
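As a toy illustration of the context-awareness case, the sketch below classifies "still" versus "moving" from the spread of accelerometer magnitudes over a window. The threshold and data are invented for illustration; a real always-on detector would run a small learned model on the NPU rather than a hand-tuned rule:

```python
# Toy context-awareness sketch: classify "still" vs "moving" from the
# standard deviation of accelerometer magnitudes over a short window.
# Threshold and sample data are illustrative assumptions, not a real
# detector; an NPU would run a small ML model for this instead.
import math
import statistics

def magnitude(sample):
    x, y, z = sample
    return math.sqrt(x * x + y * y + z * z)

def classify(window, threshold=0.5):
    """window: list of (x, y, z) accelerometer readings in g."""
    mags = [magnitude(s) for s in window]
    return "moving" if statistics.pstdev(mags) > threshold else "still"

still = [(0.0, 0.0, 1.0)] * 8  # device resting flat: constant 1 g
walking = [(0.0, 0.0, 1.0 + (i % 2) * 1.5) for i in range(8)]  # oscillating

print(classify(still))    # prints "still"
print(classify(walking))  # prints "moving"
```

Even this trivial rule hints at why always-on sensing wants milliwatt-class hardware: the computation per window is tiny, but it must run continuously all day.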

Hardware-Enforced Privacy Protection

A core principle of Coral NPU is to build user trust through hardware-enforced security.

Google says, "Our architecture is being designed to support emerging technologies such as CHERI, which provides fine-grained memory-level security and scalable software partitioning. We hope that through this approach, we can isolate sensitive AI models and personal data in a hardware-enforced sandbox to defend against memory-based attacks."

Building an Ecosystem

The success of an open-source hardware project depends on strong partnerships.

To this end, Google has announced a partnership with Synaptics, which is also its "first strategic chip partner" and a leader in embedded computing, wireless connectivity, and multimodal sensing in the IoT field.

Today, Synaptics announced its new Astra SL2610 series of AI-native IoT processors at its Tech Day event. The product line uses the company's Torq NPU subsystem, the industry's first production implementation of the Coral NPU architecture. The NPU is designed to support Transformers and dynamic operators, enabling developers to build future-proof edge AI systems for consumer and industrial IoT.

Astra SL2610, from X user @TekStrategist

Conclusion

Google says that Coral NPU is expected to "solve the core crisis of edge computing": "With Coral NPU, we are building a foundation layer for the future of personal AI. Our goal is to foster a vibrant ecosystem by providing a common, open-source, and secure platform for the industry to build upon."

What do you think of this? Are you interested in trying to develop based on this platform?

Reference Links

https://x.com/GoogleResearch/status/1978449643437539378

https://research.google/blog/coral-npu-a-full-stack-platform-for-edge-ai

This article is from the WeChat official account "Machine Intelligence". Editor: Panda. Republished by 36Kr with permission.