Model Optimization Articles

What Level Is the Ascend 910 NPU, and How Does It Perform in the DeepSeek Integrated Machine?

2025-10-19 by boardor

The Ascend DeepSeek integrated machine is an AI solution that deeply integrates self-developed Ascend AI chips (such as the Ascend 910B and 910C) with the DeepSeek large model. Its aim is to provide a high-performance, low-cost, domestically produced AI computing platform. This article analyzes it in detail from several angles, including technology, products, architecture, specifications, …

TinyML Breakthrough: Deploying 1KB Models with MicroTVM on LoRa

2025-09-14 by boardor

Hey, recently I’ve been tinkering with something fun: running machine learning on tiny IoT devices! When they see the number “1KB”, many people shake their heads and ask how that’s even possible. After all, a single high-definition photo takes up several MB, so what’s the trick that lets AI fit into such a tiny space? Actually, TinyML is such …
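For a rough sense of why a 1KB model is plausible, the back-of-the-envelope sketch below counts the parameters of a hypothetical two-layer fully connected network and compares its storage size at float32 and int8 precision against a 1 KB budget. The layer sizes (16 → 16 → 4) are assumptions chosen for illustration, and the count ignores the inference runtime, tensor metadata, and activation memory, so it is not a description of the article’s MicroTVM deployment.

```python
# Hypothetical layer sizes, chosen only for illustration (not from the article).
LAYERS = [(16, 16), (16, 4)]  # (inputs, outputs) of each fully connected layer
BUDGET_BYTES = 1024           # the "1KB" budget, in bytes


def dense_params(n_in: int, n_out: int) -> int:
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out


def model_bytes(layers, bytes_per_param: int) -> int:
    """Storage for all parameters at a single width.

    Ignores runtime code, tensor metadata, and activation buffers.
    """
    return sum(dense_params(i, o) for i, o in layers) * bytes_per_param


if __name__ == "__main__":
    for name, width in (("float32", 4), ("int8", 1)):
        size = model_bytes(LAYERS, width)
        verdict = "fits" if size <= BUDGET_BYTES else "does not fit"
        print(f"{name}: {size} bytes, {verdict} in 1 KB")
```

Running it prints 1360 bytes for float32 (over budget) and 340 bytes for int8 (within budget): the same 340 parameters only fit once each is stored in a single byte, which is why quantization is central to models this small.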

Java Edge AI Inference: Deploying TensorFlow Lite on Raspberry Pi

2025-09-14 by boardor

To be honest, when I first got into edge AI, I headed in completely the wrong direction. I thought simply shrinking the model would be enough to make it run, but I ended up falling into a lot of pitfalls. At …

LoRA-Dash: A More Efficient Method for Task-Specific Fine-Tuning

2025-06-12 by boardor

Paper: https://arxiv.org/abs/2409.01035
Code: https://github.com/Chongjie-Si/Subspace-Tuning
Project homepage: https://chongjiesi.site/project/2024-lora-dash.html
Because the LoRA-Dash paper is so rich in content, compressing its 30 pages into 10 is a real challenge, so we have made careful trade-offs between readability and completeness. The starting point of this article may differ from that of the original paper, aligning …

Implementing Neural Networks on FPGAs

2025-05-27 by boardor

Authors: Shawn Ouyang, System Architect at Ruijun Micro UK R&D Center; Dr. Andrew, Fellow at Ruijun Micro UK Research Center

1. Introduction

An FPGA is a device for implementing programmable digital logic. Like other circuit architectures such as CPUs, GPUs/NPUs, and dedicated ASICs, FPGAs have begun to be widely used to implement neural networks (NNs). …

Quantization and Precision Optimization of Neural Network Models in C++

2025-05-04 by boardor

1. Introduction: The Fascinating Encounter of C++ and Neural Networks

In today’s wave of technology, neural networks are undoubtedly a shining star, driving artificial intelligence forward at an astonishing pace. From accurately identifying objects in image recognition, to enabling fluent human-computer dialogue in natural language processing, to helping doctors detect disease …