Google Enters RISC-V with an Open Source NPU

Recently, Synaptics launched a new multimodal Gen AI processor designed for smart IoT edge applications. This new SoC, the first product in the Astra SL2600 series of multimodal Gen AI processors, targets edge applications that must process visual, audio, speech, and sensor data simultaneously.

This product is said to represent a new category of architecture within Synaptics’ Astra product line, positioned between microcontroller-level devices and high-end embedded MPUs.

According to Synaptics, the architecture of the SL2610 was born out of the industry’s demand for greater scalability, as traditional IP licensing models lag behind the AI innovation cycle. “Traditional chip design methods can no longer keep pace with the speed of AI development,” Synaptics stated.

This prompted them to integrate Google’s open-source NPU based on the RISC-V architecture into their chip. From Google’s perspective, this product, named Coral NPU, is a full-stack open-source platform aimed at addressing the core performance, fragmentation, and privacy challenges that limit powerful, always-on AI in low-power edge devices and wearables.

Thus, Google has unveiled another chip from its lineup.

Edge AI Is an Inevitable Trend

Generative artificial intelligence fundamentally reshapes our expectations of technology. We have witnessed the incredible power of large-scale cloud models to create, reason, and assist in remarkable ways. However, the next great technological leap is not just about making cloud models larger, but embedding their intelligence directly into our personal environments.

For AI to truly assist us (actively helping us plan our daily lives, translating conversations in real time, or understanding our physical environment), it must run on the devices we wear and carry. This brings a core challenge: embedding environmental AI into battery-constrained edge devices, liberating them from the cloud to achieve a truly private, always-on assistive experience.

To shift from the cloud to personal devices, we must address three key issues:

  • Performance gap: Complex, state-of-the-art machine learning (ML) models require more computation, far exceeding the limited power, heat, and memory budgets of edge devices.
  • Fragmentation tax: Compiling and optimizing ML models for diverse proprietary processors is both difficult and costly, hindering consistent performance across devices.
  • Lack of user trust: For personal AI to truly work, it must prioritize the privacy and security of personal data and context.

Specifically at the edge, developers building for low-power edge devices face a fundamental trade-off between choosing general-purpose CPUs and dedicated accelerators. General-purpose CPUs provide critical flexibility and broad software support but lack domain-specific architectures for high-demand machine learning workloads, resulting in lower performance and power efficiency. In contrast, dedicated accelerators can offer higher machine learning efficiency but lack flexibility, are difficult to program, and are not suitable for general tasks.

The highly fragmented software ecosystem further exacerbates hardware issues. Due to the stark differences in programming models between CPUs and machine learning modules, developers are often forced to use proprietary compilers and complex command buffers. This leads to steep learning curves and difficulties in integrating the unique advantages of different computing units. Consequently, the industry lacks mature low-power architectures that can easily and effectively support multiple machine learning development frameworks.

In light of this, Google has introduced Coral NPU, a full-stack platform based on Coral (a complete toolkit for building local AI products, whose on-device inference capabilities allow developers to create efficient, private, fast, and offline products). The platform is designed to provide hardware designers and machine learning developers with the tools needed to build the next generation of private, efficient edge AI devices.

According to reports, the Coral NPU was co-designed by Google, Google Research, and Google DeepMind, featuring an AI-first hardware architecture aimed at supporting the next generation of ultra-low-power, always-on edge AI. It provides a unified developer experience, making it much easier to deploy applications such as environmental awareness. It is designed for implementing always-on AI in wearable devices while minimizing battery consumption, and it can be configured for higher-performance use cases.

Coral NPU: AI First

The Coral NPU is a complete reference-grade neural processing unit (NPU) architecture that provides building blocks for the next generation of high-efficiency, machine learning (ML) optimized system-on-chip (SoC). This architecture is based on a set of architecture IP modules compliant with the RISC-V ISA standard, designed for ultra-low power consumption, making it ideal for always-on environmental awareness. Its foundational design can deliver performance levels of 512 giga operations per second (GOPS) while consuming only a few milliwatts, thus providing powerful device-side AI capabilities for edge devices, hearable devices, AR glasses, and smartwatches.

The open and scalable architecture based on RISC-V allows SoC designers to flexibly modify the foundational design or use it as a pre-configured NPU. The Coral NPU architecture includes the following components:

  • Scalar Core: A lightweight, C-programmable RISC-V front end for managing data flow to the back-end cores, achieving ultra-low power consumption and traditional CPU functionality using a simple “run-to-completion” model.
  • Vector Execution Unit: A powerful single instruction multiple data (SIMD) coprocessor compliant with the RISC-V Vector Instruction Set (RVV) v1.0, capable of operating on large datasets simultaneously.
  • Matrix Execution Unit: An efficient quantized outer product multiply-accumulate (MAC) engine designed to accelerate fundamental neural network computations. It is important to note that this matrix core is still under development and will be released on GitHub later this year.
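To make the "outer product multiply-accumulate" idea concrete, here is a minimal Python sketch of a matrix multiply computed as a sum of rank-1 outer products, with 8-bit inputs and wide accumulators. It is a software illustration of the general technique only, not a model of Google's matrix core (which the article notes is still under development); the function names and the explicit range checks are assumptions made for this example.

```python
INT8_MIN, INT8_MAX = -128, 127
INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1


def check_int8(matrix):
    """Raise ValueError if any element falls outside the signed 8-bit range."""
    for row in matrix:
        for value in row:
            if not INT8_MIN <= value <= INT8_MAX:
                raise ValueError(f"{value} is not a signed 8-bit value")


def outer_product_matmul(a, b):
    """Compute a @ b (M x K times K x N) as a sum of K rank-1 outer products."""
    check_int8(a)
    check_int8(b)
    m, k, n = len(a), len(b), len(b[0])
    if any(len(row) != k for row in a):
        raise ValueError("inner dimensions of a and b do not match")
    # One wide accumulator per output element. Python integers are unbounded,
    # so the 32-bit limit is checked explicitly (hypothetical safeguard).
    acc = [[0] * n for _ in range(m)]
    for step in range(k):
        column = [a[i][step] for i in range(m)]  # column `step` of a
        row = b[step]                            # row `step` of b
        for i in range(m):
            for j in range(n):
                acc[i][j] += column[i] * row[j]  # 8-bit x 8-bit product, wide add
                if not INT32_MIN <= acc[i][j] <= INT32_MAX:
                    raise OverflowError("accumulator exceeded 32 bits")
    return acc


print(outer_product_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```

Each pass of the outer loop updates every accumulator once, which is why this formulation maps naturally onto a grid of MAC units: an M x N array of accumulators can absorb one column of the left matrix and one row of the right matrix per step.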

The Coral NPU architecture is a simple target that can be programmed in C and integrates seamlessly with modern compilers such as IREE and TFLM. This enables it to easily support machine learning frameworks like TensorFlow, JAX, and PyTorch.

The Coral NPU integrates a comprehensive software toolchain, including dedicated solutions like the TFLM compiler for TensorFlow, as well as general-purpose MLIR compilers, C compilers, custom kernels, and simulators. This gives developers flexible development paths. For example, a model from a framework like JAX is first imported into MLIR format using the StableHLO dialect. This intermediate file is then fed into the IREE compiler, which applies a hardware-specific plugin to recognize the Coral NPU’s architecture. The compiler then performs progressive lowering, an essential optimization step in which the code is systematically translated through a series of dialects, each closer to the machine’s native language. After optimization, the toolchain generates a final compact binary that can be executed efficiently on edge devices. This industry-standard developer toolset helps simplify the programming of machine learning models and provides a consistent experience across various hardware targets.

The co-design process of the Coral NPU focuses on two key areas. First, the architecture efficiently accelerates the leading encoder-based architectures used in today’s on-device vision and audio applications. Second, Google is collaborating closely with the Gemma team to optimize the Coral NPU for small transformer models, ensuring that this accelerator architecture can support the next generation of edge generative AI.

This dual focus means that the Coral NPU is expected to be the first open, standards-based low-power NPU aimed at bringing LLMs to wearable devices. For developers, this provides a validated single pathway to deploy current and future models with the lowest power consumption while achieving optimal performance.

In summary, the design of the Coral NPU follows several key principles:

1. ML-First Architecture: The Coral NPU reverses traditional processor design. Rather than starting with basic scalar computation, then adding vector (SIMD) capabilities, and finally adding matrix functionality, it builds the matrix (ML) capabilities first and then integrates vector and scalar functions. This tight integration of scalar, vector, and matrix units within a single ISA optimizes the entire architecture for AI workloads.

2. Dedicated Machine Learning Engine: At the core of this design is a quantized outer product multiply-accumulate (MAC) engine specifically designed for the foundational computations of neural networks. This dedicated core takes 8-bit operands and accumulates them into 32-bit results with extreme efficiency.

3. Integrated Vector (SIMD) Core: The vector coprocessor implements the RISC-V Vector Instruction Set (RVV) v1.0, using a 64 x 256-bit vector register file and a “strip mining” mechanism, where a single instruction triggers multiple operations, significantly enhancing efficiency.

4. Streamlined C-Programmable Scalar Core: The lightweight RISC-V (RV32IM) front end acts as a simple controller, managing and driving the powerful matrix and vector back ends. This core is designed for a “run-to-completion” model, eliminating the need for complex operating systems or frequent interrupts, thus achieving ultra-low power consumption.

5. Efficient Memory Management: The Coral NPU uses a single level of small, fast caches (8 KB for instructions, 16 KB for data) to keep data close to the processing units, minimizing power consumption and latency.

6. Unified Developer Experience: The platform is programmable in C and is designed to integrate easily with modern ML compilers like TensorFlow Lite Micro (TFLM) and IREE. This allows a unified, MLIR-based toolchain to support models from mainstream frameworks like TensorFlow, JAX, and PyTorch.
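The "strip mining" mentioned for the vector core can be illustrated with the generic RVV-style loop, in which a long array is processed in strips no longer than one vector register. The sketch below is a simplified Python model of that general pattern, not a description of the Coral NPU's internal mechanism. The 256-bit register width comes from the article; the helper name `set_vector_length` and the rule that each strip takes `min(remaining, VLMAX)` elements are assumptions chosen for illustration.

```python
VLEN_BITS = 256  # vector register width quoted in the article


def set_vector_length(remaining, elem_bits):
    """Hypothetical stand-in for a vector-length request: how many elements
    the next strip may process, given the register width."""
    vlmax = VLEN_BITS // elem_bits  # elements that fit in one register
    return min(remaining, vlmax)


def strip_mined_add(x, y, elem_bits=8):
    """Add two equal-length lists strip by strip; return (result, strip count)."""
    if len(x) != len(y):
        raise ValueError("inputs must have the same length")
    out = []
    strips = 0
    i = 0
    while i < len(x):
        vl = set_vector_length(len(x) - i, elem_bits)
        # One "vector instruction": vl element-wise additions issued together.
        out.extend(p + q for p, q in zip(x[i:i + vl], y[i:i + vl]))
        i += vl
        strips += 1
    return out, strips


result, strips = strip_mined_add(list(range(70)), [1] * 70)
print(strips)  # 70 eight-bit elements at 32 per strip
```

With 8-bit elements a 256-bit register holds 32 values, so 70 elements take three strips (32, 32, and 6); the same loop handles any length without a separate scalar tail.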

In terms of performance, according to Google’s plans, the design of the Coral NPU achieves an efficient balance between power, performance, and size, making it an ideal choice for environmental applications and scalable to multi-core setups.

  • Performance: The target is 512 GOPS (giga operations per second), with 256 MACs per cycle.
  • Power Target: Ultra-low power, targeting around 6 mW at 1 GHz on a 22 nm process.
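The two targets above are consistent with each other under the common convention that one multiply-accumulate counts as two operations (a multiply and an add). The short Python check below works through the arithmetic. The two-operations-per-MAC convention is an assumption, as is the idea that the throughput and power targets apply at the same operating point, which the article does not state; the resulting efficiency figure is therefore only indicative.

```python
MACS_PER_CYCLE = 256        # from the article
CLOCK_HZ = 1_000_000_000    # 1 GHz, from the article
POWER_WATTS = 6e-3          # about 6 mW, from the article
OPS_PER_MAC = 2             # assumption: multiply + add counted separately

ops_per_second = MACS_PER_CYCLE * OPS_PER_MAC * CLOCK_HZ
gops = ops_per_second / 1e9
# Indicative only: assumes both targets hold at the same operating point.
tops_per_watt = ops_per_second / POWER_WATTS / 1e12

print(f"{gops:.0f} GOPS, roughly {tops_per_watt:.0f} TOPS/W")
```

So 256 MACs per cycle at 1 GHz gives exactly the quoted 512 GOPS, and if the 6 mW figure applied at that throughput, it would correspond to roughly 85 TOPS/W.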

Target Applications

The Coral NPU is designed to support ultra-low-power, always-on edge AI applications, with a particular focus on environmental awareness systems. Its primary goal is to enable always-on AI experiences in wearable devices, smartphones, and IoT devices while minimizing battery consumption.

Potential use cases include:

  • Context Awareness: Detecting user activities (e.g., walking, running), proximity, or environments (e.g., indoors/outdoors, on the go) to enable “Do Not Disturb” modes or other context-aware features.
  • Audio Processing: Voice and speech detection, keyword recognition, real-time translation, transcription, and audio-based assistive features.
  • Image Processing: Human and object detection, facial recognition, gesture recognition, and low-power visual search.
  • User Interaction: Control through gestures, audio prompts, or other sensor-driven inputs.

Hardware-Enforced Privacy

A core principle of the Coral NPU is to establish user trust through hardware-enforced security. The architecture is designed to support emerging technologies like CHERI, which provide fine-grained memory-level security and scalable software isolation. Through this approach, Google aims to isolate sensitive AI models and personal data within hardware-enforced sandboxes, mitigating memory-based attacks.

Building an Ecosystem

The success of open hardware projects relies on strong partnerships. To this end, Google has partnered with Synaptics, its first strategic chip partner and a leader in IoT embedded computing, wireless connectivity, and multimodal sensing. Today, at Synaptics’ technology day, Synaptics announced the launch of the new Astra™ SL2610 series AI-native IoT processors. This product line features its Torq™ NPU subsystem, which is the industry’s first production implementation of the Coral NPU architecture. The NPU is designed to support transformers and dynamic operators, enabling developers to build future-proof edge AI systems for consumer and industrial IoT.

This collaboration highlights Google’s commitment to creating a unified developer experience. The Synaptics Torq™ Edge AI platform is built on open-source compilers and runtimes based on IREE and MLIR. This partnership is an important step towards building intelligent context-aware devices with shared open standards.

With the Coral NPU, Google is building the foundational layer for personal AI’s future. Google’s goal is to create a vibrant ecosystem by providing the industry with a universal, open-source, and secure platform. This will empower developers and chip suppliers to break through the current fragmentation and collaborate based on shared standards of edge computing, accelerating innovation.
