Running large-model inference on the NPU allows system resources to be allocated more effectively, leaving the CPU free for other work and making applications more efficient.
1 Preparation Before Deployment
To deploy large language models on the RK3588 NPU, the following files need to be prepared in advance: the files for Rockchip's NPU (driver, toolkit, and runtime) and the DeepSeek inference model files.
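As a sketch of the expected layout (assuming the official airockchip/rknn-llm repository and the DeepSeek-R1-Distill-Qwen-1.5B weights; substitute your own model if needed, and note the folder names may differ between releases):

```bash
# Files to prepare:
rknn-llm-main/                 # Rockchip RKLLM SDK, https://github.com/airockchip/rknn-llm
├── rkllm-toolkit/             # model conversion toolkit (runs on the PC/VM)
├── rkllm-runtime/             # runtime library for the board
└── rknpu-driver/              # NPU kernel driver sources
DeepSeek-R1-Distill-Qwen-1.5B/ # DeepSeek model weights (e.g. from Hugging Face)
```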
Before deploying the large model, first check whether the NPU driver is at version 0.9.8; if it is older, it is recommended to update the driver to the latest version.
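The driver version can be read from debugfs (root is usually required):

```bash
sudo cat /sys/kernel/debug/rknpu/version
# Example output: RKNPU driver: v0.9.6  -> older than 0.9.8, so update it
```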
2 Driver Update Considerations
To update the NPU driver, the kernel has to be recompiled according to the official manual; there are two points to watch out for.
Compilation Error 1: RK3588 and RK3576 have the same NPU, so they share the same driver source file. However, the structure referenced in the error is not defined in the RK3588 build, so simply comment out the offending line at the location the compiler reports.
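The exact symbol depends on the driver release, so the following is purely illustrative (the structure and field names below are hypothetical); the fix is simply to comment out the line the compiler points at:

```c
/* Illustrative only: the compiler reports a structure that exists for
 * RK3576 but is not defined in the RK3588 build; comment that line out. */
// .rk3576_only_field = &rk3576_only_struct,   /* hypothetical names */
```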
Compilation Error 2: On kernel 5.10 the function in question is not defined; define it as described in the official manual.
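Assuming the undefined functions are vm_flags_set()/vm_flags_clear(), which were only added in mainline kernel 6.3 and are therefore missing on 5.10, the fix is to add inline helpers like these to the driver source:

```c
/* Assumed fix for kernel 5.10: provide vm_flags_set()/vm_flags_clear(),
 * which only exist in kernel >= 6.3. */
static inline void vm_flags_set(struct vm_area_struct *vma, vm_flags_t flags)
{
	vma->vm_flags |= flags;
}

static inline void vm_flags_clear(struct vm_area_struct *vma, vm_flags_t flags)
{
	vma->vm_flags &= ~flags;
}
```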
On RK3588 you can also skip the manual edits entirely: find a driver file that already contains these modifications and copy it into the kernel driver folder before compiling.
3 Kernel Compilation and Deployment
3.1 Compile the Kernel
Overwrite the files in the kernel driver folder with the driver files above, compile the kernel according to the manual, and flash the generated boot.img to the board.
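A rough outline of this step (the real commands depend entirely on your vendor's BSP, so treat this as a placeholder):

```bash
# Overwrite the driver sources, then build the kernel (BSP-specific)
cp -r rknpu/* <kernel_src>/drivers/rknpu/
cd <kernel_src>
make ARCH=arm64 CROSS_COMPILE=aarch64-none-linux-gnu- -j"$(nproc)"
# Package boot.img per your BSP manual and flash it to the board,
# e.g. with rkdeveloptool or your vendor's upgrade tool.
```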
After updating the kernel, check the NPU driver version again. If the driver version has been upgraded, you can proceed to the next step.
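Same check as before:

```bash
sudo cat /sys/kernel/debug/rknpu/version
# Expected now: RKNPU driver: v0.9.8 (or newer)
```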
3.2 Virtual Machine Operations for Deployment
Clone the official repository files to your local machine and install the rkllm-toolkit. It is recommended to create a new Python 3.8 or 3.10 environment using conda.
After cloning the repository, the rkllm-toolkit .whl file can be found in the rknn-llm-main/rkllm-toolkit folder; the toolkit currently supports Python 3.8 and Python 3.10.
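A sketch, assuming the airockchip/rknn-llm repository and a conda environment named rkllm (the exact wheel filename varies by release and Python version):

```bash
git clone https://github.com/airockchip/rknn-llm.git
conda create -n rkllm python=3.10 -y
conda activate rkllm
# Install the toolkit wheel shipped in the repo (filename varies by release)
pip install rknn-llm/rkllm-toolkit/rkllm_toolkit-*cp310*.whl
```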
After installation, you can test whether the installation was successful in the Python environment.
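If the import below succeeds without errors, the toolkit is installed correctly:

```python
from rkllm.api import RKLLM
```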
Clone the model repository data, or you can download it directly.
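For example, assuming the DeepSeek-R1-Distill-Qwen-1.5B weights from Hugging Face:

```bash
git lfs install
git clone https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
```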
Then create data_quant.json, which provides the calibration data used when quantizing the rkllm model.
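Recent releases of the rknn-llm repo ship a helper script for this in the DeepSeek demo; a sketch of using it (the path and arguments may differ by version, so check the script's --help):

```bash
cd rknn-llm/examples/DeepSeek-R1-Distill-Qwen-1.5B_Demo/export
python generate_data_quant.py -m /path/to/DeepSeek-R1-Distill-Qwen-1.5B
```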
Export the rkllm model; you will need to modify the target hardware platform, the model path, and other settings in the export script.
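Assuming the DeepSeek demo's export script (named export_rkllm.py in the official examples):

```bash
# run from the demo's export folder
python export_rkllm.py
```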
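The core of the export script looks roughly like this (a sketch based on the rkllm-toolkit API; the paths and quantization settings are assumptions to adapt):

```python
from rkllm.api import RKLLM

MODEL_PATH = './DeepSeek-R1-Distill-Qwen-1.5B'   # assumed local model path

llm = RKLLM()

# Load the Hugging Face model
ret = llm.load_huggingface(model=MODEL_PATH)
assert ret == 0, 'load_huggingface failed'

# Quantize and build for the target platform (rk3588 here)
ret = llm.build(do_quantization=True,
                optimization_level=1,
                quantized_dtype='w8a8',
                target_platform='rk3588',
                dataset='./data_quant.json')
assert ret == 0, 'build failed'

# Export the .rkllm model file
ret = llm.export_rkllm('./DeepSeek-R1-Distill-Qwen-1.5B.rkllm')
assert ret == 0, 'export_rkllm failed'
```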
Transfer the generated .rkllm file to the board; this completes the work on the virtual machine. Note that exporting the rkllm model consumes a large amount of memory, so choose the machine you run it on accordingly.
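A minimal transfer example (adjust user, IP, and paths to your setup):

```bash
scp ./DeepSeek-R1-Distill-Qwen-1.5B.rkllm user@<board-ip>:~/
```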
3.3 Board Operations for Deployment
First, install some tools needed for compilation.
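Assuming a Debian/Ubuntu-based image on the board:

```bash
sudo apt update
sudo apt install -y git gcc g++ cmake make
```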
The board also needs to clone the official repository files, or you can transfer the cloned files from the virtual machine to the board.
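Same repository as on the virtual machine:

```bash
git clone https://github.com/airockchip/rknn-llm.git
```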
Enter the deploy folder of the corresponding example, change the compiler setting, and run the build.
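A sketch for the DeepSeek demo, assuming an on-board (native) build; in build-linux.sh, point the compiler variable at the local gcc instead of a cross-compiler (variable names differ between releases):

```bash
cd rknn-llm/examples/DeepSeek-R1-Distill-Qwen-1.5B_Demo/deploy
# edit build-linux.sh: set the GCC compiler path to the native toolchain
./build-linux.sh
```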
After the build finishes, the llm_demo binary will be generated in the build folder; give this file execution permission.
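For example (the build output directory name may vary by release):

```bash
cd build/build_linux_aarch64_Release
chmod +x llm_demo
```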
Copy the necessary library files to the /usr/lib directory.
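The runtime library ships in the repo; its path may differ between releases:

```bash
sudo cp rknn-llm/rkllm-runtime/Linux/librkllm_api/aarch64/librkllmrt.so /usr/lib/
```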
Finally, execute the following command to run DeepSeek.
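A sketch of the demo invocation (in the official demo the two numbers are max_new_tokens and max_context_len; adjust them to your needs):

```bash
./llm_demo ~/DeepSeek-R1-Distill-Qwen-1.5B.rkllm 2048 4096
```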
4 Running Experience
First, ask it whether it understands Chinese.
Then test its professional knowledge.
The question above can be answered directly from the model's built-in knowledge. Next, ask a deeper question that requires some reasoning.
The response showed no deep reasoning process, and the answer was fairly generic. Now, let's try a simple math question.
It must be said that, compared with CPU deployment, the running speed improves significantly; however, the NPU deployment process still has some rough edges:
- Compiling the kernel may require modifications to the driver, depending on the kernel version and platform;
- Exporting the rkllm model places considerable memory demands on the virtual machine.