
All the major MCU players have now shown their hands.

半导体行业观察 (Semiconductor Industry Observer), 2026-01-04 12:09

For a long time, the world of embedded computing has been stable and restrained.

The core mission of MCUs has always been single-minded: reliability, real-time performance, and low power consumption. Performance doesn't need to double year after year, and the architecture doesn't chase radical innovation. Flash, SRAM, a CPU core, and a mature software toolchain have been more than enough to keep industrial control, automotive electronics, and all manner of terminal devices running stably for over a decade.

However, in the past two or three years, this order has been quietly disrupted.

The change didn't start with "computing-power anxiety." Unlike the world of servers and GPUs, MCUs don't crave higher TOPS, nor do they need to run large models with tens of billions of parameters.

In fact, the real pressure comes from edge devices, which are being handed more and more tasks that amount to "understanding the environment and making judgments": sensor data fusion, anomaly detection, image recognition, voice wake-up, and predictive maintenance. These capabilities don't demand extreme performance, but they do place unprecedented requirements on real-time behavior, power-consumption controllability, and system determinism.

In the eyes of the major MCU vendors, TI, Infineon, NXP, ST, and Renesas, the industry is undergoing a revolution. AI is no longer just software running on MCUs; it is starting to reshape MCU architecture itself. Not only are process nodes moving from the traditional 40nm to 22nm, 16nm, and beyond, but NPUs and other new modules are being integrated on-die. At the same time, new types of memory are stepping out from behind the scenes into the spotlight.

MCU development has thus turned onto a brand-new path. What the market truly needs isn't simply a "faster" MCU, but a new architecture that natively supports AI workloads while preserving the traditional strengths.

Why Integrate an NPU?

Many readers will have the same question: why does even an MCU need an integrated NPU?

In fact, the logic behind this round of NPU integration in MCUs is completely different from that in phones and servers. In the mobile and data-center domains, the goal of an NPU is higher TOPS, faster inference, and support for more complex models. In the embedded field, by contrast, the NPU exists first and foremost to safeguard the stable operation of the system as a whole.

Current industrial and automotive scenarios are essentially real-time control systems. In applications such as motor control, power management, and ADAS decision-making, the system must respond within a fixed time window of a few microseconds to a few milliseconds. Under the traditional architecture, if the CPU handles both control and AI inference, a fatal problem arises: inference consumes CPU resources, delaying control interrupts and destroying the system's timing determinism.

The value of an NPU lies in achieving "computing-power isolation." It separates AI inference from the main control path, allowing the CPU to focus on deterministic tasks while inference runs on an independent hardware unit. This solves a key contradiction in embedded AI: the need for intelligence without sacrificing real-time performance.
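The scheduling argument above can be put in numbers with a toy single-core model (a sketch, not any vendor's firmware): if a non-preemptible inference burst can be in flight when a control interrupt fires, the worst-case response time includes the whole burst; offloading inference to an NPU removes that term.

```python
def worst_case_response_us(isr_exec_us: float, inference_burst_us: float,
                           npu_offload: bool) -> float:
    """Worst-case control-interrupt response time on one core.

    Toy model: the ISR itself takes isr_exec_us. Without an NPU, a
    non-preemptible inference burst of inference_burst_us may already
    be running when the interrupt fires, so the ISR can be delayed by
    the full burst. With an NPU, inference runs on separate hardware
    and the CPU's worst case is just the ISR itself.
    """
    blocking = 0.0 if npu_offload else inference_burst_us
    return blocking + isr_exec_us


if __name__ == "__main__":
    # Illustrative numbers (assumed): 20 us ISR, 500 us inference
    # burst, 100 us deadline for a motor-control loop.
    deadline_us = 100.0
    shared = worst_case_response_us(20.0, 500.0, npu_offload=False)
    isolated = worst_case_response_us(20.0, 500.0, npu_offload=True)
    print(f"shared CPU : {shared} us, deadline met: {shared <= deadline_us}")
    print(f"NPU offload: {isolated} us, deadline met: {isolated <= deadline_us}")
```

With these assumed numbers, the shared-CPU worst case (520 us) blows a 100 us deadline even though average-case latency might look fine; the offloaded worst case (20 us) is deterministic.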

Another key constraint in embedded systems is the power budget. Industrial IoT devices often need to run on battery for several years, and automotive chips must work from -40°C to 150°C; any power fluctuation could mean system overheating or premature battery depletion. A dedicated NPU, with its fixed MAC array and systolic-array architecture, makes power consumption predictable. In edge-side scenarios such as face recognition and image processing, the NPU's advantages of easy development, high efficiency, and low power consumption are gradually coming to the fore.
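Because the MAC count of a fixed accelerator is known in advance, the per-inference energy term of the power budget can be estimated up front. A back-of-envelope sketch (all figures below are assumptions for illustration, not vendor data):

```python
def inference_energy_mj(macs: int, pj_per_mac: float) -> float:
    """Energy per inference in millijoules (1 pJ = 1e-9 mJ)."""
    return macs * pj_per_mac * 1e-9


def battery_days(battery_mwh: float, inferences_per_day: int,
                 macs: int, pj_per_mac: float) -> float:
    """Days of operation if inference were the only load (idle power
    ignored, purely to size the inference term of the budget)."""
    budget_mj = battery_mwh * 3600.0  # mWh -> mJ
    daily_mj = inferences_per_day * inference_energy_mj(macs, pj_per_mac)
    return budget_mj / daily_mj


if __name__ == "__main__":
    # Assumed: 10M-MAC model, 1 pJ/MAC, 1000 inferences/day, 1000 mWh cell.
    print(f"energy/inference: {inference_energy_mj(10_000_000, 1.0)} mJ")
    print(f"battery days (inference only): {battery_days(1000.0, 1000, 10_000_000, 1.0)}")
```

The point of the exercise: with a fixed MAC array, the inference term is both small and predictable, so the budget is dominated by idle power and peak behavior rather than by unpredictable compute spikes.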

Therefore, you'll notice an interesting phenomenon: the NPUs of all MCU manufacturers are quite "restrained." Their computing power ranges from dozens to hundreds of GOPS, far below the several-TOPS level of mobile NPUs, let alone the hundreds of TOPS of cloud-based GPUs.

For now, the embedded NPU is more like the MCU architecture's "shock absorber" than its "engine." Its role is to absorb the impact of AI workloads and protect the stability of real-time control, not to chase performance limits. Excess computing power means a larger die, higher power consumption, and more complex thermal management, all of which run counter to embedded design principles.

More importantly, the models used in today's edge-AI applications are themselves small. Neural networks running on MCUs are usually deeply optimized lightweight models, with parameter counts from tens of thousands to a few million and inference times from a few milliseconds to tens of milliseconds. Hundreds of GOPS is already sufficient; anything more would be wasted.
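Those magnitudes are easy to sanity-check. A sketch with assumed example numbers (not measurements from any specific part): an int8-quantized model stores one byte per parameter, and each MAC counts as two operations against the accelerator's GOPS rating.

```python
def model_size_bytes(n_params: int, bits_per_weight: int = 8) -> int:
    """Storage footprint of the weights alone (activations excluded)."""
    return n_params * bits_per_weight // 8


def inference_time_ms(macs: int, effective_gops: float) -> float:
    """Rough latency estimate: 1 MAC = 2 ops (multiply + add)."""
    return (2 * macs) / (effective_gops * 1e9) * 1e3


if __name__ == "__main__":
    # Assumed: a 250k-parameter int8 model with 10M MACs per inference,
    # running on an NPU sustaining 100 effective GOPS.
    print(f"weights: {model_size_bytes(250_000)} bytes")       # fits in MCU flash
    print(f"latency: {inference_time_ms(10_000_000, 100.0)} ms")
```

A 250k-parameter int8 model occupies 250 KB, and 10M MACs at 100 effective GOPS takes 0.2 ms: comfortably inside the "few milliseconds" envelope the text describes, which is why hundreds of GOPS is enough.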

In conclusion, the NPU in an MCU is not the product of a computing-power race but the natural choice for embedded systems restructuring their architectures for the AI era. Its core value lies not in the TOPS figure but in letting AI and real-time control coexist, striking the best balance among determinism, low power, and small size.

How Do MCU Giants View NPUs?

It's worth noting that although the giants agree on the broad direction of NPU integration, each has its own focus in implementation and application.

TI: Deep Integration of Real-Time Control and AI, Focusing on Industrial and Automotive Safety Scenarios

TI's strategic core is to embed NPU capability deeply into its stronghold of real-time control, strengthening an integrated "control + AI" solution rather than simply chasing compute. This precisely matches scenarios such as industrial motor control and automotive fault detection, which demand extremely high real-time performance and reliability; after all, in these scenarios the value of AI lies in improving detection accuracy and response speed without interfering with core control tasks.

At the product level, TI's TMS320F28P55x series is the industry's first real-time control MCU with an integrated NPU. Built around the classic 32-bit C28x DSP core clocked at 150MHz, it offers real-time signal-processing capability comparable to a 300MHz Arm Cortex-M7. The built-in NPU is optimized for convolutional neural network (CNN) models; its core function is to offload AI inference from the main CPU, achieving computing-power isolation. Compared with a pure software implementation, it cuts latency by 5-10x and lifts fault-detection accuracy above 99%. In applications such as arc-fault monitoring and motor fault diagnosis, for example, the NPU analyzes current and voltage data in real time to flag anomalies quickly, while the CPU focuses on deterministic control tasks such as motor drive and power management; together they keep the system responding within a microsecond-level window.

To lower the barrier to development, TI launched the Edge AI Studio toolchain, covering the entire flow from model training and optimization to deployment; even engineers with little AI experience can quickly build intelligent control solutions. The series also meets functional-safety standards such as ISO 26262 and IEC 61508, supporting up to ASIL D, further suiting it to safety-critical automotive and industrial applications.

Infineon: Leveraging the Arm Ecosystem to Build a General-Purpose Low-Power AI MCU Platform

Infineon has chosen a lightweight path of "Arm architecture + ecosystem collaboration." Its strategic focus is to lower the barrier to edge-AI development and quickly cover broad scenarios such as consumer IoT and industrial HMIs. The core logic: by reusing the mature pairing of Arm Cortex-M cores and the Ethos-U55 micro-NPU, it can roll out AI capability at scale while keeping power consumption low, and a complete toolchain keeps customers' migration costs down.

In terms of products, Infineon's PSOC Edge E8x series (E81, E83, E84) forms a tiered lineup. The entry-level E81 pairs a Cortex-M33 core with the in-house NNLite ultra-low-power accelerator, serving lightweight AI applications such as simple voice recognition and gesture detection. The higher-end E83 and E84 step up to a Cortex-M55 core plus an Arm Ethos-U55 NPU with the Arm Helium DSP instruction set, delivering machine-learning performance up to 480x that of traditional Cortex-M systems. The Ethos-U55, a micro-NPU designed specifically for embedded systems, accelerates AI at milliwatt-level power, matching the long-battery-life requirements of IoT devices.

Ecosystem building is Infineon's core strength. The series is fully compatible with the ModusToolbox software development platform and integrates the Imagimob Studio edge-AI development tool, providing end-to-end support from data collection through model training to deployment, along with a rich set of pretrained models and starter projects to help customers ramp up quickly. Application scenarios span smart-home security, industrial-robot HMIs, wearables, and more; the E83 and E84 can handle more complex AI tasks such as face/object recognition and visual position detection, and the E84 adds low-power graphics display, further extending high-end HMI use cases.

NXP: Self-Developed NPU + Software Ecosystem, Focusing on High-Flexibility Edge AI Deployment

NXP's strategic hallmark is "scalable hardware + full-stack software." By developing its own eIQ Neutron NPU core and combining it with the unified eIQ AI software toolkit, it builds an edge-AI solution that balances flexibility and performance. Its core goal is to support diverse neural-network models in scenarios such as industrial robots and smart cars, while keeping the system's real-time responsiveness at low power.

At the hardware level, NXP's eIQ Neutron NPU adopts a scalable architecture whose compute configuration can be flexibly tuned to the application. It supports neural-network models including CNNs, RNNs, and Transformers, covering the full range from simple voice wake-up to complex image classification. The NPU is deeply integrated into MCU and MPU products, achieving computing-power isolation through a heterogeneous "CPU + NPU + DSP" architecture so that AI inference does not disturb core control tasks. In industrial robots, for example, the NPU processes visual sensor data in real time for path planning while the CPU focuses on deterministic tasks such as motor drive and motion control, and together they raise the system's response speed.

The software ecosystem is NXP's core support. The eIQ AI software toolkit provides a unified development interface, supports mainstream machine-learning frameworks such as TensorFlow Lite and PyTorch, and enables a local "bring your own model, bring your own data" workflow, which not only reduces network latency and bandwidth dependence but also improves data privacy and security. NXP also supplies a rich library of pretrained models and application examples (object recognition, handwritten-digit recognition, LLM deployment demos, and so on), plus detailed tutorials through the GoPoint application code center, to accelerate customer development.

ST: Self-Developed NPU to Break Performance Limits, Focusing on High-Performance Edge Vision Scenarios

ST's strategic direction is "self-developed NPU + high-performance core," targeting scenarios with high AI-compute demands such as industrial vision and high-end consumer electronics. Through its self-developed Neural-ART Accelerator NPU, it pushes past the AI performance boundaries of traditional MCUs while preserving real-time behavior. The core logic: complex edge-AI tasks such as computer vision need more powerful dedicated compute, but power consumption and die area must still be strictly controlled so as not to violate embedded design principles.

In terms of products, ST's STM32N6 series is its first MCU with an integrated self-developed NPU. Built on an 800MHz Arm Cortex-M55 core, it introduces Arm Helium vector processing for the first time and adds a Neural-ART Accelerator NPU clocked at up to 1GHz, delivering up to 600 GOPS of AI compute. Although that figure is far below mobile NPUs, it covers demanding cases such as high-resolution image processing and running multiple models in parallel. For vision applications, the series also integrates a MIPI CSI-2 interface, an image-signal-processing (ISP) pipeline, and an H.264 hardware encoder, forming a complete computer-vision chain that connects directly to cameras for real-time image classification, object detection, and more.

On the hardware side, the STM32N6 carries 4.2MB of contiguous embedded RAM and supports high-speed external memory interfaces (hexa-SPI, OCTOSPI, etc.), providing ample memory for storing and running neural-network models. It also targets advanced security, aiming for SESIP Level 3 and PSA Level 3 certification to meet industrial and consumer security needs. On the ecosystem side, the series integrates seamlessly with ST's edge-AI suite and the TouchGFX graphics package, offering complete development tools and reference designs to speed high-end vision AI products to market.

Renesas: Dual-Core Heterogeneity + Security Enhancement, Deeply Engaged in High-Reliability Edge AIoT Scenarios

Renesas' strategic core is "heterogeneous architecture + security first." Through a combination of high-performance cores, a dedicated NPU, and a security engine, it targets edge-AIoT scenarios with extreme reliability and security requirements, such as smart homes and industrial predictive maintenance. The core logic: local AI processing on edge devices demands not only real-time behavior and low power but also defenses against growing network-security threats, so NPU integration must be fused with the security architecture.

At the product level, Renesas' RA8P1 MCU and RZ/G3E MPU form a low-end/high-end pairing. The RA8P1, a 32-bit AI MCU, uses a dual-core architecture of a 1GHz Cortex-M85 and a 250MHz Cortex-M33, paired with an Arm Ethos-U55 NPU for 256 GOPS of AI compute, handling tasks such as voice recognition, image classification, and anomaly detection. It also supports the Arm TrustZone secure execution environment, a hardware root of trust, and an advanced cryptography engine to protect AI models and data. The RZ/G3E, a 64-bit MPU, uses a quad-core Cortex-A55 + Cortex-M33 architecture and likewise integrates an Ethos-U55 NPU, with compute raised to 512 GOPS for more complex edge-AI tasks such as high-definition image analysis and multi-sensor fusion.

To simplify development, Renesas launched the RUHMI (Robust Unified Heterogeneous Model Integration) framework, which supports mainstream ML formats such as TensorFlow Lite and PyTorch and helps developers quickly import and optimize pretrained models, with intuitive debugging tools and example applications provided through the e² studio integrated development environment. Renesas is also pushing zero-touch security measures such as post-quantum cryptography (PQC) to resist threats in the quantum-computing era, further hardening edge-AI systems.

The Emergence of New Memory Types

If the introduction of NPUs solves the problem of computing-power isolation, then the transformation of the memory architecture is the underlying infrastructure supporting the entire AI-enabled shift. As AI plus the NPU pushes traditional Flash to its technological limits, new memory types have naturally become the giants' common choice.

First, be clear: once an NPU and AI capability enter an MCU, the problems of the traditional Flash architecture are immediately exposed. The first dilemma is model lifecycle management. Edge AI is not train-once-and-done; it requires continuous iteration. In automotive applications OTA has become standard, and AI models may be updated monthly or even weekly. Yet Flash endures only a few thousand to tens of thousands of erase/write cycles; if it is rewritten on every update, the chip may wear out before the vehicle is scrapped.
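The endurance argument is simple arithmetic. A sketch with illustrative numbers (the update cadences and endurance figures below are assumptions, and the model assumes every update rewrites the same Flash sectors with no wear leveling):

```python
def cycles_consumed(updates_per_year: int, lifetime_years: int) -> int:
    """Erase/write cycles a model region accumulates over the product
    life, assuming each update rewrites the same sectors."""
    return updates_per_year * lifetime_years


def outlives_product(endurance_cycles: int, updates_per_year: int,
                     lifetime_years: int) -> bool:
    """True if the Flash endurance budget covers the update cadence."""
    return endurance_cycles >= cycles_consumed(updates_per_year, lifetime_years)


if __name__ == "__main__":
    # Weekly OTA over a 15-year vehicle life vs. a 10k-cycle part:
    print(cycles_consumed(52, 15), outlives_product(10_000, 52, 15))
    # Daily incremental-learning checkpoints vs. a 3k-cycle part:
    print(cycles_consumed(365, 15), outlives_product(3_000, 365, 15))
```

Weekly OTA (780 cycles over 15 years) still fits a 10k-cycle part, but once updates become daily checkpoints from on-device learning, a few-thousand-cycle budget is already exhausted, which is the failure mode the paragraph warns about.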

The second dilemma is on-device learning and parameter caching. Edge AI not only runs inference but, in some scenarios, must adjust parameters online or learn incrementally. In the traditional architecture, model parameters live in Flash and are loaded into SRAM for inference, but SRAM capacity is limited (usually a few MB) and volatile, losing its contents at power-off. That architecture cannot support the emerging demand for "edge learning."
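The Flash-to-SRAM flow, and why volatility breaks edge learning, can be sketched as a toy shadow-copy scheme (the class, names, and checkpoint policy are illustrative, not any vendor's API):

```python
class ParamCache:
    """Toy model of the Flash -> SRAM parameter flow.

    'flash' stands in for non-volatile storage; 'sram' is the volatile
    working copy used for inference. Writes back to flash are batched
    into explicit checkpoints to spare erase/write cycles (an assumed
    policy for illustration).
    """

    def __init__(self, flash_image: dict):
        self.flash = dict(flash_image)   # persistent copy
        self.sram = dict(flash_image)    # working copy, lost at power-off
        self.dirty = False
        self.flash_writes = 0

    def update(self, key, value):
        """Online/incremental-learning tweak: hits SRAM only."""
        self.sram[key] = value
        self.dirty = True

    def checkpoint(self):
        """Explicit, infrequent persist of the working copy."""
        if self.dirty:
            self.flash = dict(self.sram)
            self.flash_writes += 1
            self.dirty = False

    def power_cycle(self):
        """SRAM is volatile: on restart, reload from flash."""
        self.sram = dict(self.flash)
        self.dirty = False
```

Any parameter adjusted in SRAM but not checkpointed is silently lost at the next power cycle, while checkpointing too eagerly burns the endurance budget from the previous dilemma; a non-volatile working memory would remove the trade-off entirely.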

The third dilemma is the boot path and read performance. Embedded AI devices often require instant-on operation; equipment in industrial settings may be powered off and restarted frequently, and every startup delay hurts production efficiency. Flash's read latency and warm-up time become obvious drawbacks here. Industry data shows that it takes about