Introducing Coral NPU: A Full-Stack Platform for Edge AI

Key Points: The Coral NPU is a full-stack, open-source platform designed to address the core performance, fragmentation, and privacy challenges that keep powerful, always-on AI off low-power edge devices and wearables.

Coral NPU: AI-First Architecture

Generative AI is fundamentally reshaping our expectations of technology. We have seen what large-scale cloud models can do when asked to create, reason, and assist. However, the next great technological leap is not just about making cloud models larger; it is about embedding their intelligence directly into our immediate personal environments. For AI to truly assist us (proactively helping us through our day, translating conversations in real time, or understanding our physical surroundings), it must run on the devices we wear and carry. This presents a core challenge: embedding ambient AI into battery-constrained edge devices and freeing it from the cloud to enable truly private, always-on assistance experiences.

To transition from the cloud to personal devices, we must address three key issues:

Performance Gap: Complex state-of-the-art machine learning (ML) models require more computational resources than the limited power, thermal, and memory budgets of edge devices can provide.
Fragmentation Costs: Compiling and optimizing ML models for various proprietary processors is both difficult and expensive, hindering consistent performance across devices.
User Trust Deficit: To be truly helpful, personal AI must prioritize the privacy and security of personal data and environments.

Today, we introduce the Coral NPU, a full-stack platform that builds on our original Coral work and gives hardware designers and ML developers the tools needed to build the next generation of private, efficient edge AI devices. The Coral NPU is an AI-first hardware architecture co-designed with Google Research and Google DeepMind to support ultra-low-power, always-on edge AI. It offers a unified developer experience that makes it easier to deploy applications such as ambient sensing. It is specifically designed to enable all-day AI experiences on wearable devices while minimizing battery usage, and it can be configured for higher-performance use cases. We have released documentation and tools so that developers and designers can start building immediately.

Coral NPU Architecture Explained

Developers building for low-power edge devices face a fundamental trade-off: choosing between general-purpose CPUs and dedicated accelerators. General-purpose CPUs provide critical flexibility and broad software support but lack domain-specific architectures optimized for demanding ML workloads, resulting in lower performance and power efficiency. In contrast, dedicated accelerators offer high ML efficiency but lack flexibility, are difficult to program, and are unsuitable for general tasks.

The highly fragmented software ecosystem exacerbates this hardware issue. Because the programming models for CPUs and ML blocks are vastly different, developers are often forced to use proprietary compilers and complex command buffers. This results in a steep learning curve and makes it difficult to leverage the unique advantages of different compute units. Consequently, the industry lacks mature, low-power architectures that can easily and effectively support multiple ML development frameworks.

The Coral NPU architecture addresses this issue directly by reversing traditional chip design. It prioritizes the ML matrix engine over scalar compute, optimizing the architecture for AI at the silicon level and creating a platform built specifically for more efficient on-device inference.

As a complete reference neural processing unit (NPU) architecture, the Coral NPU provides building blocks for the next generation of energy-efficient, ML-optimized systems-on-chip (SoCs). The architecture is based on a set of architectural IP blocks that comply with the RISC-V ISA and are designed for minimal power consumption, making it well suited to always-on ambient sensing. The base design delivers performance in the range of 512 giga operations per second (GOPS) while consuming only a few milliwatts, enabling powerful on-device AI for edge devices, hearables, AR glasses, and smartwatches.
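To make the 512 GOPS figure more concrete, here is a back-of-envelope sketch of how a designer might turn it into an inference rate and an energy-per-inference estimate. Only the 512 GOPS peak comes from this article. The model size, the sustained utilization, the 5 mW power draw (standing in for "a few milliwatts"), and the convention that one multiply-accumulate counts as two operations are all assumptions chosen for illustration, not published Coral NPU measurements.

```python
# Back-of-envelope budget for an always-on model on a 512 GOPS engine.
# Only PEAK_GOPS comes from the article; every other number below is a
# hypothetical placeholder to be replaced with measured values.

PEAK_GOPS = 512.0  # peak throughput quoted in the article


def inferences_per_second(macs_per_inference: float, utilization: float) -> float:
    """Upper bound on inference rate, counting one MAC as two operations."""
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    ops_per_inference = 2.0 * macs_per_inference
    return PEAK_GOPS * 1e9 * utilization / ops_per_inference


def energy_per_inference_uj(power_mw: float, rate_hz: float) -> float:
    """Energy per inference in microjoules when inferences run back to back."""
    return power_mw * 1e-3 / rate_hz * 1e6


if __name__ == "__main__":
    macs = 10e6   # hypothetical: a model needing 10 million MACs per inference
    util = 0.25   # hypothetical: 25% of peak throughput sustained
    power = 5.0   # hypothetical: 5 mW, standing in for "a few milliwatts"
    rate = inferences_per_second(macs, util)
    energy = energy_per_inference_uj(power, rate)
    print(f"{rate:.0f} inferences/s, {energy:.3f} uJ per inference")
```

Real workloads rarely sustain peak throughput, so the utilization term usually dominates the estimate and is the first value worth replacing with a measurement.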

Unified Developer Experience

The Coral NPU architecture is a simple, C-programmable target that integrates seamlessly with modern compilers such as IREE and TFLM. This makes it easier to support ML frameworks such as TensorFlow, JAX, and PyTorch.

The Coral NPU includes a comprehensive software toolchain, featuring specialized solutions like the TFLM compiler for TensorFlow, as well as general-purpose MLIR compilers, C compilers, custom kernels, and simulators. This provides developers with flexible pathways. For example, models from frameworks like JAX are first imported into MLIR format using the StableHLO dialect. This intermediate file is then input into the IREE compiler, which applies hardware-specific plugins to recognize the Coral NPU architecture. The compiler then performs progressive lowering—a critical optimization step where the code is systematically transformed through a series of dialects, getting closer to the machine’s native language. After optimization, the toolchain generates the final compact binary, ready for efficient execution on edge devices. This industry-standard developer toolset helps simplify the programming of ML models and provides a consistent experience across various hardware targets.
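The idea of progressive lowering can be sketched with a toy pipeline in plain Python. This is not IREE or MLIR code: the `graph`, `tensor`, and `npu` dialect names, the op names, and the `compile_program` helper are all invented for this illustration. It shows only the general shape of the technique, in which each pass rewrites ops from one level of abstraction into the next lower one until only target-level instructions remain.

```python
# Toy illustration of progressive lowering. The dialect and op names are
# invented for this sketch; they are NOT real StableHLO or IREE dialects.
from typing import Callable, List, Tuple

Op = Tuple[str, ...]  # (qualified op name, operand names..., result name)


def lower_graph_to_tensor(ops: List[Op]) -> List[Op]:
    """Split a fused 'graph.dense' op into matmul, bias add, and ReLU."""
    out: List[Op] = []
    for op in ops:
        if op[0] == "graph.dense":
            _, x, w, b, y = op
            out += [
                ("tensor.matmul", x, w, f"{y}_mm"),
                ("tensor.add", f"{y}_mm", b, f"{y}_acc"),
                ("tensor.relu", f"{y}_acc", y),
            ]
        else:
            out.append(op)
    return out


def lower_tensor_to_target(ops: List[Op]) -> List[Op]:
    """Map tensor ops onto hypothetical matrix/vector engine instructions."""
    table = {
        "tensor.matmul": "npu.mac_tile",
        "tensor.add": "npu.vadd",
        "tensor.relu": "npu.vmax0",
    }
    # Ops without a table entry pass through unchanged.
    return [(table.get(op[0], op[0]),) + op[1:] for op in ops]


PIPELINE: List[Callable[[List[Op]], List[Op]]] = [
    lower_graph_to_tensor,
    lower_tensor_to_target,
]


def compile_program(ops: List[Op]) -> List[Op]:
    """Apply each lowering pass in order, one abstraction level at a time."""
    for lowering in PIPELINE:
        ops = lowering(ops)
    return ops


if __name__ == "__main__":
    program = [("graph.dense", "x", "w", "b", "y")]
    for op in compile_program(program):
        print(op)
```

Keeping each pass small and single-purpose is what lets a production compiler reuse the upper levels across many hardware targets and swap in only the final, target-specific stages.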

The Coral NPU co-design process focuses on two key areas. First, the architecture efficiently accelerates the leading encoder-based architectures used in today's on-device vision and audio applications. Second, we are collaborating closely with the Gemma team to optimize the Coral NPU for small transformer models, helping ensure that the accelerator architecture supports the next generation of generative AI at the edge.

This dual focus means that the Coral NPU is on track to become the first open, standards-based, low-power NPU to bring LLMs to wearable devices. For developers, this provides a single, validated path to deploy current and future models with maximum performance and minimal power consumption.

Target Application Scenarios

The Coral NPU is designed to support ultra-low-power, always-on edge AI applications, with a particular focus on ambient sensing systems. Its primary goal is to enable all-day AI experiences on wearable devices, smartphones, and Internet of Things (IoT) devices while minimizing battery usage.

Potential application scenarios include:

Contextual Awareness: Detecting user activity (such as walking or running), proximity, or environment (indoors or outdoors, on the go) to enable "Do Not Disturb" modes or other context-aware features.
Audio Processing: Voice and speech detection, keyword recognition, real-time translation, transcription, and audio-based accessibility features.
Image Processing: Person and object detection, facial recognition, gesture recognition, and low-power visual search.
User Interaction: Control through gestures, audio cues, or other sensor-driven inputs.

Hardware-Enforced Privacy

A core principle of the Coral NPU is to build user trust through hardware-enforced security. The architecture is being designed to support emerging technologies such as CHERI, which provides fine-grained memory-level safety and scalable software compartmentalization. With this approach, we aim to isolate sensitive AI models and personal data in hardware-enforced sandboxes, mitigating memory-based attacks.
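For readers unfamiliar with capability-based protection, the following toy model in plain Python sketches the general idea behind CHERI-style capabilities: a reference to memory carries its own bounds and permissions, every access is checked against them, and a derived capability can only narrow its parent's rights. The `Capability` and `Memory` classes and their methods are hypothetical names invented for this sketch. Real CHERI enforces these checks in hardware, and nothing here describes how the Coral NPU implements them.

```python
# Toy model of a CHERI-style capability: a pointer that carries bounds and
# permissions. Real CHERI enforces this in hardware; the class and method
# names here are illustrative only.
from dataclasses import dataclass


@dataclass(frozen=True)
class Capability:
    base: int
    length: int
    perms: frozenset  # subset of {"load", "store"}

    def restrict(self, offset: int, length: int, perms) -> "Capability":
        """Derive a narrower capability; rights can shrink but never grow."""
        perms = frozenset(perms)
        if offset < 0 or length < 0 or offset + length > self.length:
            raise PermissionError("derived bounds exceed parent bounds")
        if not perms <= self.perms:
            raise PermissionError("derived permissions exceed parent permissions")
        return Capability(self.base + offset, length, perms)


class Memory:
    def __init__(self, size: int) -> None:
        self._bytes = bytearray(size)

    def _check(self, cap: Capability, offset: int, perm: str) -> int:
        """Validate permission and bounds, then return the absolute address."""
        if perm not in cap.perms:
            raise PermissionError(f"capability lacks '{perm}' permission")
        if not 0 <= offset < cap.length:
            raise PermissionError("access outside capability bounds")
        return cap.base + offset

    def load(self, cap: Capability, offset: int) -> int:
        return self._bytes[self._check(cap, offset, "load")]

    def store(self, cap: Capability, offset: int, value: int) -> None:
        self._bytes[self._check(cap, offset, "store")] = value


if __name__ == "__main__":
    mem = Memory(64)
    root = Capability(0, 64, frozenset({"load", "store"}))
    model_weights = root.restrict(0, 32, {"load"})  # read-only sandbox
    mem.store(root, 3, 7)
    print(mem.load(model_weights, 3))
    try:
        mem.load(model_weights, 40)  # outside the 32-byte sandbox
    except PermissionError as err:
        print("blocked:", err)
```

In this model, code that holds only `model_weights` cannot write to the weights or read anything beyond the first 32 bytes, which is the kind of isolation the sandboxing goal above relies on.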

Building the Ecosystem

Open-source hardware projects rely on strong partnerships for success. To this end, we are collaborating with Synaptics, our first strategic silicon partner and a leader in IoT embedded computing, wireless connectivity, and multimodal sensing. Today, at their technology day, Synaptics announced the new Astra™ SL2610 series AI-native IoT processors. This product line features the Torq™ NPU subsystem, the industry’s first production implementation of the Coral NPU architecture. This NPU design supports transformers and dynamic operators, enabling developers to build future-ready edge AI systems for consumer and industrial IoT.

This partnership supports our commitment to a unified developer experience. The Synaptics Torq™ edge AI platform is built on open-source compilers and runtimes based on IREE and MLIR. This collaboration is an important step towards establishing shared open standards for smart, context-aware devices.

Solving Core Edge Computing Problems

With the Coral NPU, we are building the foundational layer for the future of personal AI. Our goal is to foster a vibrant ecosystem by providing a universal, open-source, secure platform. This empowers developers and silicon vendors to transcend today’s fragmented landscape, allowing them to collaborate on shared standards in edge computing for faster innovation. Learn more about the Coral NPU and start building today.

Documentation Source: Introducing Coral NPU: A full-stack platform for Edge AI
Original Author: Billy Rutledge, Engineering Director at Google Research
Original Publication Date: Oct. 15, 2025
