The Arm Unlocked 2025 AI Technology Summit was held on September 10. At the summit, Arm officially launched the Arm Lumex Compute Subsystem (CSS) platform and announced the complete lineup of C1 CPU IP and Mali G1 GPU IP.

In the afternoon, James McNiven, Vice President of Product Management for Arm’s Client Division, and Ronan Naughton, Director of Product Management for Arm’s Client Division, held a more in-depth discussion with the media. They laid out Arm’s vision for on-device AI: centered on “heterogeneous computing,” guided by the principle of “efficiency first,” and aimed at “broad empowerment.” It was a candid technical exchange.

The following article is a summary and refinement of the content from the Technology Sharing Day by Wang Yiran, Editor-in-Chief of iMobile Technology.

Author’s Preface—

Arm’s strategy in the AI era is not to pursue extreme peak computing power, but to build an efficient, inclusive, and open heterogeneous computing ecosystem. It does so by enhancing the real-time AI capabilities of CPUs through SME2, advancing the integration of graphics and AI on GPUs, and delivering all of these capabilities to the entire developer ecosystem through the KleidiAI software library.

The vision is that any device based on the Arm architecture, whether or not it has a top-tier dedicated NPU, will have basic and efficient AI capabilities. That would move AI from the cloud onto the device itself, into every product and every small detail of the user experience.

SME2 Technology Transforms the Role of the CPU

Arm has introduced the second generation of its Scalable Matrix Extension (SME2) for its CPU cores, which can provide an additional 2 to 6 TOPS of computing power. This seemingly modest figure reflects Arm’s understanding of what on-device AI workloads actually need.

James McNiven, Vice President of Product Management for Arm’s Client Division, emphasized its importance multiple times: “The bottleneck for many AI tasks is not computing power, but memory bandwidth. The advantage of SME2 is that it executes directly in the CPU core, allowing immediate access to cache and system memory.”

1. Precise Use-Case Targeting: Arm states clearly that SME2 is not designed for running large language models (LLMs) with hundreds of billions of parameters. It targets low-latency, small-model, always-on tasks such as voice wake-up, real-time image preprocessing, and on-device context-aware suggestions. These tasks are triggered frequently, require millisecond-level responses, and are extremely sensitive to energy efficiency.

2. Breaking the “Memory” Bottleneck: Arm pointed to a key bottleneck in current AI performance: memory bandwidth. Many NPUs cannot reach their peak computing power because data cannot be fetched fast enough to keep them busy. The core advantage of SME2 is that it is integrated directly into the CPU core, giving it very low-latency access to cache and system memory, so its real-world efficiency can be far better than its theoretical computing power suggests.

3. Programmability and Versatility: Unlike fixed-function NPUs, CPUs equipped with SME2 are fully programmable. Developers can adapt to rapidly evolving AI models and algorithms without waiting for new hardware. More importantly, the CPU is the only compute unit present in every device in the Arm ecosystem, which gives developers a consistent, non-fragmented foundation for AI acceleration and lets AI features run across all devices.
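The memory-bandwidth point in item 2 can be illustrated with the standard roofline bound, where sustained throughput is the smaller of the compute peak and bandwidth multiplied by arithmetic intensity. The sketch below is a generic illustration, not a model of SME2 or of any real NPU: the 50 GB/s bandwidth, the two peak figures, and the two intensity values are all hypothetical numbers chosen for the example.

```python
def attainable_tops(peak_tops: float, bandwidth_gb_s: float, ops_per_byte: float) -> float:
    """Roofline bound on sustained throughput, in TOPS.

    The memory roof is bandwidth (GB/s) x arithmetic intensity (ops/byte),
    which gives Gops/s; dividing by 1000 converts it to Tops/s.
    """
    memory_roof_tops = bandwidth_gb_s * ops_per_byte / 1000.0
    return min(peak_tops, memory_roof_tops)


# Hypothetical engines sharing the same hypothetical 50 GB/s memory system.
BANDWIDTH_GB_S = 50.0
ENGINES = {"small engine (4 TOPS peak)": 4.0, "large engine (40 TOPS peak)": 40.0}

# Hypothetical workloads: low intensity (each byte fetched feeds about 2 ops,
# as in a matrix-vector product) versus high intensity (heavy data reuse).
WORKLOADS = {"low reuse": 2.0, "high reuse": 400.0}

for workload, intensity in WORKLOADS.items():
    for engine, peak in ENGINES.items():
        tops = attainable_tops(peak, BANDWIDTH_GB_S, intensity)
        print(f"{workload:>10} | {engine:<28} | {tops:.2f} TOPS attainable")
```

Under these assumed numbers, both engines are capped at 0.10 TOPS on the low-reuse workload, so a tenfold higher peak buys nothing there. The peak only matters once data reuse is high enough to lift the memory roof above it.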

As Arm put it, the CPU will always be the general-purpose core. AI will run heterogeneously across CPUs, GPUs, NPUs, and the cloud, but the CPU remains the central component at the system level. The C1 series is therefore more than a performance upgrade: Arm’s strategy is not to replace NPUs with CPUs, but to use SME2 to strengthen the CPU’s position in heterogeneous computing systems, where it serves the workloads that demand low latency and flexibility.

Combining GPU and AI for Next-Generation Mobile Experience Enhancement

The GPU plays another key role in Arm’s AI landscape—handling AI tasks that are highly integrated with graphics and vision.

McNiven sees this as a key step in the evolution of the GPU’s role, as AI is reshaping the graphics field. In the future, GPUs will be not only high-quality renderers but also intelligent visual platforms. This echoes Arm’s concept of Neural Graphics: using AI for super-resolution, frame generation, and denoising, making the GPU the hub that connects perception and visual experience.

By adding dedicated instructions and optimizing its microarchitecture, the Mali G1-Ultra achieves nearly double the performance on typical INT8 and FP16 AI workloads, which directly benefits applications such as AI super-resolution and in-game AI enhancements.

Additionally, Arm Neural Technology introduces new AI-driven frame optimization, supersampling, and denoising techniques. It is delivered as programmable Vulkan extensions and marks the start of Arm’s deep integration of AI into the graphics pipeline, laying the software and hardware foundation for future mobile ray tracing, AI super-resolution (similar to DLSS/FSR), and frame generation.

Although Arm does not manufacture devices, its technology is becoming the cornerstone of high-performance mobile gaming.

Focusing on Core IP to Empower Differentiation

Arm insists on only producing the most valuable core computing IP (CPU/GPU), while completely opening up the innovation space for system-level components such as NPU and ISP to partners. This strategy ensures the continuous evolution of foundational computing while allowing partners like Samsung and MediaTek to create distinctive SoCs.

KleidiAI has already been integrated into frameworks such as PyTorch ExecuTorch, Google LiteRT, Alibaba MNN, and Microsoft ONNX Runtime. Developers get SME2 acceleration directly, without modifying their code. This “out-of-the-box” approach significantly lowers the barrier to entry and gives ecosystem partners a foundation for rapid validation.
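One common way a library can offer acceleration without application changes is runtime feature detection followed by kernel dispatch. The sketch below illustrates that general pattern only; it is not KleidiAI’s API or implementation. The kernel names are hypothetical, and it assumes an arm64 Linux system whose /proc/cpuinfo lists capability flags such as `asimd`, `i8mm`, and `sme2` on its `Features` lines.

```python
def cpu_features(cpuinfo_text: str) -> set:
    """Collect the flags listed on 'Features' lines of an arm64 /proc/cpuinfo dump."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "features":
            flags.update(value.split())
    return flags


def pick_matmul_kernel(flags: set) -> str:
    """Return a hypothetical kernel name, preferring the most capable path available."""
    if "sme2" in flags:
        return "matmul_sme2"     # hypothetical SME2 kernel
    if "i8mm" in flags:
        return "matmul_i8mm"     # hypothetical Int8 matrix-multiply kernel
    if "asimd" in flags:
        return "matmul_neon"     # hypothetical Advanced SIMD (Neon) kernel
    return "matmul_generic"      # portable fallback


# Sample text standing in for the contents of /proc/cpuinfo (assumed format).
sample = "processor\t: 0\nFeatures\t: fp asimd i8mm sme sme2\n"
print(pick_matmul_kernel(cpu_features(sample)))
```

Because the choice is made inside the library at run time, the same application binary can use the SME2 path on devices that have it and fall back elsewhere, which is the sense in which no code changes are needed.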

Arm expects that by 2030, SME and SME2 technologies will cover over 3 billion devices, adding more than 10 billion TOPS of computing power.
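As a rough consistency check, the 2030 projection can be compared with the per-CPU figure quoted earlier. Both projected numbers are stated as lower bounds (“over” and “more than”), so the ratio below is only an approximate implied average, not an Arm figure.

```python
# Figures quoted in the article; both projections are stated as lower bounds.
DEVICES = 3e9                  # "over 3 billion devices"
TOTAL_TOPS = 10e9              # "more than 10 billion TOPS"
SME2_RANGE_TOPS = (2.0, 6.0)   # "an additional 2 to 6 TOPS"

average_tops = TOTAL_TOPS / DEVICES
in_range = SME2_RANGE_TOPS[0] <= average_tops <= SME2_RANGE_TOPS[1]
print(f"Implied average: {average_tops:.1f} TOPS per device (within 2 to 6: {in_range})")
```

The implied average of about 3.3 TOPS per device sits inside the 2 to 6 TOPS range given for SME2, so the two statements are consistent with each other.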
