The Fourth Generation Computing Revolution: Heterogeneous Computing Based on Software-Hardware Integration

This article is reprinted from the public account Software-Hardware Integration (ID: cash-arch). Editor’s note: Recently, the National Development and Reform Commission and four other departments jointly released the “Implementation Plan for the National Integrated Big Data Center Collaborative Innovation System Computing Power Hub,” proposing to build national computing power network hub nodes in Beijing-Tianjin-Hebei, Yangtze … Read more

Understanding Mobile CPU Process Suffixes

For mobile processors, performance depends chiefly on the CPU architecture and the GPU cores. For example, the ARM Cortex-A76 is inherently stronger than the Cortex-A75, while the Cortex-A55 is certainly better than the Cortex-A53. On the GPU side, Mali-G73MP6 (where the suffix “MPx” indicates the number of compute cores) outperforms Mali-G72MP6, … Read more

In-Depth Analysis of Huawei’s Kirin 950 Processor

With the release of the Huawei Mate 8, the top spot on benchmarking software like AnTuTu has finally been occupied by a domestically produced chip, the Huawei Kirin 950 processor. So, what secrets does the Kirin 950 employ to achieve this domestic chip resurgence? History of Huawei HiSilicon development: In the domestic smartphone arena, Huawei … Read more

Performance Ranking of ARM Mali Series GPUs

Among SoCs (processors) designed for smartphones, the most common GPU brands are Qualcomm Adreno, Imagination PowerVR, and ARM Mali. Today, let us summarize the performance ranking of ARM Mali GPUs. It should be noted that ARM does not produce processors itself; it only provides … Read more

vLLM Framework Source Code Analysis: Block Allocation and Management

1. Block Overview: A significant innovation of vLLM is to divide the available GPU and CPU memory at the physical layer into a number of blocks, which effectively reduces memory fragmentation. Specifically, vLLM’s blocks exist at two levels, logical and physical, with a mapping between the two. The following diagram explains the relationship between the two … Read more
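
The logical-to-physical mapping described in this excerpt can be illustrated with a minimal sketch. This is not vLLM's actual code: the class names `BlockAllocator` and `Sequence`, the free-list policy, and the block size are all assumptions chosen for illustration. The idea shown is only that each sequence keeps a block table whose index is the logical block number and whose value is a physical block id, so physical blocks need not be contiguous.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class BlockAllocator:
    """Hypothetical pool of fixed-size physical blocks (not vLLM's API)."""
    num_blocks: int
    free: list[int] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Initially every physical block id is free.
        self.free = list(range(self.num_blocks))

    def allocate(self) -> int:
        if not self.free:
            raise MemoryError("no free physical blocks")
        return self.free.pop()

    def release(self, block_id: int) -> None:
        self.free.append(block_id)


@dataclass
class Sequence:
    """Hypothetical sequence whose block table maps logical -> physical."""
    block_size: int
    num_tokens: int = 0
    block_table: list[int] = field(default_factory=list)

    def append_token(self, allocator: BlockAllocator) -> None:
        # Allocate a new physical block only when the last one is full.
        if self.num_tokens % self.block_size == 0:
            self.block_table.append(allocator.allocate())
        self.num_tokens += 1

    def physical_slot(self, token_index: int) -> tuple[int, int]:
        # Translate a token position into (physical block id, offset).
        logical_block, offset = divmod(token_index, self.block_size)
        return self.block_table[logical_block], offset


allocator = BlockAllocator(num_blocks=4)
seq = Sequence(block_size=4)
for _ in range(5):
    seq.append_token(allocator)
```

With a block size of 4, five tokens occupy two logical blocks, so only two physical blocks are taken from the pool and at most one block per sequence is partially filled.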

DAC 2019 Low Power Object Detection System Design Challenge: GPU and FPGA Dual Championship Solutions Explained

Released by Machine Heart. Author: Zhang Xiaofan. On June 5, 2019, the second “Low Power Object Detection System Design Challenge,” hosted by DAC, the top electronic design automation conference, concluded in Las Vegas (Machine Heart reported on the first competition last year). This competition aimed to design high-precision and energy-efficient object detection systems for terminal devices, … Read more

Understanding CPU, MCU, MPU, SoC, DSP, ECU, GPU, and FPGA

In electronic engineering and embedded development, terms like CPU, MCU, MPU, SoC, DSP, ECU, GPU, and FPGA are often mentioned. However, because the names are similar and the uses overlap, beginners are often confused. This article analyzes their characteristics from four perspectives: definition, performance, usage, and price. 1. CPU (Central Processing Unit). Definition: The … Read more

Cost-Effective Fine-Tuning with LoRA

Selected from Sebastian Raschka’s blog. Translated by Machine Heart. Editor: Jiaqi. This is the experience the author, Sebastian Raschka, derived from hundreds of experiments, and it is worth reading. Increasing the amount of data and the number of model parameters is a widely recognized direct method to improve neural network performance. Currently, mainstream large models have parameter … Read more

S-LoRA: Enabling Thousands of Large Models on a GPU

Machine Heart reports. Editor: Danjiang. Generally, large language models are deployed with a “pre-train, then fine-tune” approach. However, when the base model is fine-tuned for numerous tasks (such as personalized assistants), the training and serving costs can become extremely high. Low-Rank Adaptation (LoRA) is a parameter-efficient fine-tuning method, typically used to adapt the base … Read more
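
To make "parameter-efficient" concrete, here is a rough sketch of the arithmetic behind a low-rank update. LoRA replaces a full update to a d × k weight matrix with the product of a d × r matrix and an r × k matrix. The function names and the sizes below (d = k = 4096, r = 8) are illustrative assumptions, not figures from the article.

```python
def full_update_params(d: int, k: int) -> int:
    """Trainable parameters when the whole d x k weight matrix is updated."""
    return d * k


def lora_update_params(d: int, k: int, r: int) -> int:
    """Trainable parameters for a rank-r update B @ A, with B: d x r, A: r x k."""
    return r * (d + k)


# Illustrative sizes only (assumed, not taken from the article).
d = k = 4096
r = 8

full = full_update_params(d, k)
lora = lora_update_params(d, k, r)
ratio = full // lora
```

Under these assumed sizes the low-rank update trains 65,536 parameters instead of 16,777,216, a 256-fold reduction for that one matrix, which is why many adapters can share a single base model.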