
This article is reprinted from the Jishu Community.
Source: How to Build a Soft-Hardware Collaborative "Core" Base in the Era of Edge AI?
By 2025, with the emergence of AI applications such as DeepSeek, robots, AI agents, and multimodal generation models like GPT-4o, concepts like "deep thinking," "intelligent agents," and "multimodal" have rapidly moved from specialized fields into the public eye. This has not only spurred a new hardware and software ecosystem around large models but also accelerated the iteration and upgrade of AI application forms and terminal hardware products. Rapid improvements in user experience have let consumers tangibly feel the innovative value of AI, and industry giants are investing ever more heavily to strengthen the underlying technical support for core processes such as model training, inference computation, and application development.
While cloud-side AI scenarios are developing vigorously, edge-side innovation is also accelerating. Players across the industry chain, including chip manufacturers, operating system developers, and consumer electronics makers in segments such as mobile phones and PCs, are jointly exploring how to balance key factors such as computing-power gains, cost optimization, and the expansion of application scenarios.

Edge Devices
Important Carriers for the Universalization of AI Technology
With the continuous popularization of smart terminal devices and the upgrade of computing power, edge AI has transformed from a technical concept into actual productivity. Consumer-grade terminals such as PCs, mobile phones, robots, XR devices, and smart cockpits, with their high penetration rates and real-time interaction characteristics, have become important carriers for the implementation of edge AI. Driven by improvements in chip performance and model optimization technologies, edge devices now possess the capability to efficiently deploy AI models, accelerating the migration of AI applications from the cloud to the edge. In terms of technical implementation paths, several key directions deserve special attention:
- In the area of lightweight SLMs, compared to cloud-side large language models (LLMs) with hundreds of billions of parameters, small language models (SLMs) with 1.5B to 7B parameters are gradually becoming the mainstream choice for edge AI thanks to their excellent computational efficiency and lower memory usage. For example, the DeepSeek-R1 distilled versions substantially reduce computational resource requirements while maintaining strong performance, further broadening the application boundaries of edge AI.
- In the field of multimodal computing, with the continuous iteration of LLM and SLM technologies, various mainstream consumer electronics equipped with multimodal sensors such as cameras and microphones are accelerating the upgrade of image recognition and voice interaction technologies under the support of AI technology. This not only achieves localized secure processing of private data but also injects strong growth potential into emerging application scenarios like robotics, enabling real-time processing of massive audio and video data, significantly enhancing key capabilities such as environmental perception and interaction.
- In the area of Copilot intelligent productivity tools, generative AI has been widely applied in programming assistance, smart office, image processing, and audio-video editing across many productivity scenarios. The special requirements for low latency and privacy protection make these applications naturally fit edge computing architectures, simultaneously driving rapid growth in terminal computing power demand.
- AI Agent technology is reconstructing the human-computer interaction paradigm. Through capabilities such as natural language understanding, task decomposition, and multi-task collaboration, intelligent assistant systems will gradually replace traditional graphical interfaces, providing users with a more natural, smooth, and efficient intelligent interaction experience.
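The memory argument behind the SLM bullet above can be made concrete with a back-of-the-envelope estimate. The sketch below is illustrative only: it counts weight storage and ignores activations, the KV cache, and runtime overhead, so real deployments need headroom beyond these figures.

```python
# Rough weight-memory estimate for on-device model deployment.
# Illustrative only: ignores activations, KV cache, and runtime overhead.
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}

def weight_memory_gib(num_params: float, dtype: str) -> float:
    """Approximate memory needed to hold model weights, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / (1024 ** 3)

# A 7B-parameter model in FP16 needs roughly 13 GiB for weights alone,
# while a 1.5B-parameter model quantized to INT4 fits in under 1 GiB,
# which is why 1.5B-7B SLMs plus low-bit quantization suit edge devices.
print(f"7B   @ fp16: {weight_memory_gib(7e9, 'fp16'):.1f} GiB")
print(f"1.5B @ int4: {weight_memory_gib(1.5e9, 'int4'):.2f} GiB")
```

This is why a cloud-scale model with hundreds of billions of parameters is out of reach for phone-class memory budgets, while a quantized SLM is not.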
Heterogeneous Computing
The Key to Winning the Edge Competition
Computing power has always been a core element in the expansion of AI application scenarios and technological innovation. Compared to the high-computing CPU and GPU clusters deployed in a centralized manner on the cloud side, edge computing power exhibits significant differentiated characteristics. Due to the large and dispersed number of edge devices, the level of computing power varies widely, and the constraints of power consumption and cost are strict. These factors have given rise to a diversified edge computing power system. In the long run, heterogeneous computing is undoubtedly the optimal solution for the implementation of edge AI.
CPUs, as the basic computing units of edge devices, are widely used across devices from entry level to high end thanks to their excellent versatility. The Arm® Cortex® series IP not only meets general-purpose computing needs but is also paired with the Kleidi software libraries, which are specifically optimized to accelerate AI workloads on the CPU. In practice, CPUs often serve as the starting point for AI workloads, giving developers a convenient deployment path; and as LLMs become increasingly lightweight, CPUs are taking on more complex AI computing tasks.

NPUs, with their outstanding energy efficiency, are gradually becoming the mainstay of edge AI computing, particularly suited to compute-intensive, long-running AI tasks. Arm's self-developed next-generation "Zhouyi" NPU adopts an architecture optimized for large-model workloads, raising external bandwidth to 256 GB/s, fully supporting FP16 computation, and providing a complete INT4 hardware-software quantization acceleration solution. Through hardware-software co-optimization, the "Zhouyi" NPU scales multi-core computing power efficiently, providing core momentum for the intelligent upgrade of terminal devices.

In graphics-related AI computing, GPUs have unique advantages, excelling in particular in video processing and gaming scenarios. The Arm Mali™ and Immortalis™ series GPUs support a variety of AI workloads through their parallel computing architectures while maintaining excellent energy efficiency, achieving co-optimization of graphics rendering and AI computing for a more immersive user experience.

The collaboration of CPUs, NPUs, and GPUs, combined with an edge-cloud hybrid computing model, can meet the diverse computing-power needs of most AI application scenarios.
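The INT4 quantization mentioned above can be illustrated with a minimal sketch. This shows simple symmetric per-tensor quantization in NumPy; real toolchains such as NPU compilers typically use finer-grained per-channel or group-wise scales, but the core idea of mapping floats onto a 4-bit integer range is the same.

```python
import numpy as np

def quantize_int4(weights: np.ndarray):
    """Symmetric per-tensor INT4 quantization: map floats onto [-8, 7]."""
    scale = np.max(np.abs(weights)) / 7.0  # 7 is the largest positive INT4 value
    q = np.clip(np.round(weights / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize_int4(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float weights from INT4 codes and the scale."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal(1024).astype(np.float32)
q, s = quantize_int4(w)
w_hat = dequantize_int4(q, s)
# Each weight is now 4 bits instead of 32, at the cost of a bounded
# rounding error of at most half the quantization step (scale / 2).
print("max abs error:", np.max(np.abs(w - w_hat)))
```

Storing 4-bit codes plus one scale cuts weight memory roughly 8x versus FP32 (and 4x versus FP16), which is what makes INT4 attractive for bandwidth-limited edge NPUs.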
As heterogeneous computing technology continues to evolve, edge devices are gradually taking on a larger proportion of AI computing tasks, a trend that is reshaping the entire AI computing ecosystem.
Software Frameworks
The Key Link Connecting AI Applications and Computing Power
The AI software ecosystem is the soil in which applications grow, and it carries the responsibility of scheduling hardware computing power efficiently. On the cloud side, a mature technology system centered on "PyTorch + CUDA + GPU" already provides standardized support for AI research and development. The edge AI ecosystem, however, still faces many urgent problems: severe fragmentation, poor adaptation to large models, weak cross-platform compatibility, limited scalability, and an inability to keep pace with fast-iterating application requirements.

To address this, Arm's "Zhouyi" NPU provides a complete AI software platform, "Zhouyi" Compass, that lets developers port and deploy algorithms quickly and conveniently. The platform offers an end-to-end AI software stack covering a simulator, drivers, a runtime, an OpenCL compiler, and a network compiler, supporting development needs at different levels. It has also added support for Hugging Face models, which are of great interest to developers, and through initiatives such as open-sourcing the network compiler's Parser and OPT components, adapting to TVM, and releasing a domain-specific language (DSL), it helps developers integrate "Zhouyi" NPU-based AI technology into a wide range of development projects.

The edge AI software ecosystem is now in a critical phase of development and needs to find a balance between standardization and customization. The industry broadly expects that within the next 2 to 3 years, one or two dominant benchmark frameworks will emerge, frameworks that deeply optimize computing performance for specific hardware platforms while remaining open.
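The internals of "Zhouyi" Compass are not detailed here, but the core job of any such edge AI stack, assigning each operator in a model graph to the most suitable compute unit with a CPU fallback, can be sketched in a few lines. The operator names and the preference table below are hypothetical; a real compiler or runtime (Compass-style or TVM-style) would derive placements from cost models and hardware capability queries rather than a fixed table.

```python
# Illustrative sketch of heterogeneous operator placement.
# The preference table and operator names are hypothetical assumptions.
PREFERRED_DEVICE = {
    "conv2d": "NPU",   # dense, compute-heavy ops favor the NPU
    "matmul": "NPU",
    "resize": "GPU",   # image/graphics ops favor the GPU
    "softmax": "CPU",  # small or control-heavy ops stay on the CPU
}

def place_ops(graph: list, available: set) -> dict:
    """Map each op to its preferred device, falling back to the CPU.

    The CPU is the universal fallback because, as the base computing
    unit, it can always run an op that no accelerator supports.
    """
    placement = {}
    for op in graph:
        device = PREFERRED_DEVICE.get(op, "CPU")
        placement[op] = device if device in available else "CPU"
    return placement

# On a device without a GPU, "resize" falls back to the CPU:
print(place_ops(["conv2d", "resize", "softmax"], {"CPU", "NPU"}))
```

The fragmentation problem described above is essentially that every vendor encodes these placement decisions differently, which is why the industry is converging on shared compiler infrastructure and common model formats.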
Outlook: AI Applications Accelerate Edge Chip Innovation
Pressing the Innovation “Accelerator” Key
Looking back at the development of the PC internet and mobile internet, application demand has always been the key driver of chip technology iteration. In mobile scenarios, the need to balance performance and power consumption made the CPU big.LITTLE architecture mainstream; consumers' high expectations for camera quality drove chip-level image processing into flagship smartphones as standard; and the demand for gaming and viewing anywhere prompted chip manufacturers to keep strengthening graphics rendering and video codec capabilities.

Entering this new wave of AI technology, how to build a heterogeneous computing system, co-adapt the software ecosystem, and accelerate the deployment of AI applications in scenarios such as PCs, mobile phones, smart wearables, robots, and cockpits will be a key proposition for edge manufacturers seeking sustainable development. Arm will continue to deepen its focus on edge AI, leveraging its self-developed "Zhouyi" NPU products, the strengths of the Arm ecosystem, and an open-source software ecosystem to keep empowering technological innovation and industrial upgrades in the AI era.

Disclaimer: Arm, Cortex, Immortalis, and Mali are registered trademarks or trademarks of Arm Limited (or its subsidiaries).
About Anxin Education
Anxin Education is an innovative education platform focused on AIoT (artificial intelligence + Internet of Things), providing end-to-end AIoT education solutions spanning primary and secondary schools through higher education.
Building on Arm technology, Anxin Education has developed the ASC (Arm Intelligent Connectivity) curriculum and talent development system. It has been widely applied in industry-academia-research cooperation at universities and in STEM education at primary and secondary schools, and is committed to cultivating intelligent-connectivity talent that meets the needs of the times for schools and enterprises.