Address: https://github.com/NotPunchnox/rkllama
RKLLama is an open-source server and client solution for running, and interacting with, large language models (LLMs) optimized for the Rockchip RK3588(S) and RK3576 platforms. Unlike solutions such as Ollama or Llama.cpp, RKLLama fully utilizes the Neural Processing Unit (NPU) on these devices, providing an efficient, high-performance way to deploy artificial intelligence and deep learning models on Rockchip hardware.
RKLLama comes equipped with a REST API, allowing you to easily build custom clients tailored to specific needs. It also provides an integrated command-line interface (CLI) client, simplifying the process of testing and interacting with the API.
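For example, once the server is running (installation and startup are covered below), you can query the REST API directly from the shell. The sketch below is an assumption rather than confirmed usage: the port (8080) and the /models endpoint are believed to be the defaults, so verify them against the README for your version.
# Assumed default port and endpoint; adjust to match your RKLLama version.
curl http://localhost:8080/models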
Based on this video:
https://www.youtube.com/watch?v=Kj8U1OGqGPc
Download
git clone https://github.com/notpunchnox/rkllama
To install, change into the cloned directory and execute setup.sh:
cd rkllama
bash setup.sh
If the installation succeeds, a confirmation screen is displayed.
Run rkllama to see the available commands:
Available commands:
- help: Displays this help menu.
- update: Checks for available updates and upgrades.
- serve: Starts the server.
- list: Lists all available models on the server.
- pull hf/model/file.rkllm: Downloads a model from a file on Hugging Face.
- rm model.rkllm: Deletes a model.
- load model.rkllm: Loads a specific model.
- unload: Unloads the currently loaded model.
- run: Enters a dialogue mode with the model.
- exit: Exits the program.
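Based on the pull syntax above, a download command takes a Hugging Face path of the form org/repo/file.rkllm. The example below is illustrative only; the repo is the one linked later in this guide, and <file> must be replaced with a real .rkllm file name from that repo:
# Illustrative: replace <file> with an actual .rkllm file from the repo.
rkllama pull c01zaut/Qwen2.5-3B-Instruct-RK3588-1.1.4/<file>.rkllm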
Start the service
rkllama serve
If the server starts successfully, its status output is displayed.
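If you want the server to keep running after you close the terminal, a generic shell technique (not an RKLLama-specific feature) is to launch it in the background:
# Generic background launch; output is written to rkllama.log.
nohup rkllama serve > rkllama.log 2>&1 &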
In another terminal, download a model:
rkllama pull
Alternatively, download a model manually and place it in RKLLAMA/models. For example, a Qwen2.5 3B model converted for the RK3588 is available at https://huggingface.co/c01zaut/Qwen2.5-3B-Instruct-RK3588-1.1.4/tree/main
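From the shell, a manual download might look like the following sketch. It assumes the models directory lives at ~/RKLLAMA/models (adjust the path if your installation differs), and <file> must be replaced with a real .rkllm file from the Hugging Face page above:
# Sketch: the path and <file> are assumptions; pick a real .rkllm file
# from the Hugging Face repo linked above.
mkdir -p ~/RKLLAMA/models
wget -P ~/RKLLAMA/models https://huggingface.co/c01zaut/Qwen2.5-3B-Instruct-RK3588-1.1.4/resolve/main/<file>.rkllm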
At this point, run
rkllama list
to see the available models
Run the model (use a model name shown by rkllama list; <model_name> is a placeholder):
rkllama run <model_name>
After it starts, a chat prompt is displayed. After ‘You:’, you can enter commands or prompts.
You can also set system prompts. For more detailed instructions, refer to the GitHub README.