Deploying Qwen 3.0 Based on RK3588

This requires the following RKLLM version:

https://github.com/airockchip/rknn-llm/tree/release-v1.2.1b1

Install the RKLLM-Toolkit from this version; otherwise, the model conversion may fail.
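The toolkit ships as a Python wheel inside the repo. A minimal install sketch follows; the exact wheel filename depends on the release and your Python version, so the name below is an assumption — use whichever wheel you find under the repo's rkllm-toolkit directory.

pip install ./rknn-llm/rkllm-toolkit/rkllm_toolkit-1.2.1b1-cp310-cp310-linux_x86_64.whl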

Download the model, taking Qwen3-0.6B as an example:

https://www.modelscope.cn/models/Qwen/Qwen3-0.6B
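One way to fetch it is to clone the ModelScope repository (this assumes git-lfs is installed so the weight files download fully; the modelscope CLI is an alternative):

git clone https://www.modelscope.cn/Qwen/Qwen3-0.6B.git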

Modify

rknn-llm/examples/DeepSeek-R1-Distill-Qwen-1.5B_Demo/export/export_rkllm.py

Change modelpath to point at the downloaded Qwen3 directory:

modelpath = '/home/andy/rkllm/Qwen3-0.6B'

Execute python export_rkllm.py to convert the model to the rkllm-supported format.
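For reference, the core of the export script looks roughly like the sketch below. This is a sketch based on the RKLLM-Toolkit API, not the demo script verbatim; argument names and defaults can differ between toolkit versions, so treat the build() parameters as assumptions and keep whatever the demo script already passes.

from rkllm.api import RKLLM

modelpath = '/home/andy/rkllm/Qwen3-0.6B'  # the downloaded Qwen3 directory

llm = RKLLM()

# Load the Hugging Face format model from disk
ret = llm.load_huggingface(model=modelpath)
assert ret == 0, 'load failed'

# Quantize to w8a8 and target the RK3588 NPU.
# The demo also feeds a small calibration dataset to build()
# (visible as "Generating train split: 19 examples" in the log below).
ret = llm.build(do_quantization=True, quantized_dtype='w8a8',
                target_platform='rk3588')
assert ret == 0, 'build failed'

# Write out the .rkllm artifact that will run on the board
ret = llm.export_rkllm('./Qwen3-0.6B_W8A8_RK3588.rkllm')
assert ret == 0, 'export failed'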

INFO: rkllm-toolkit version: 1.2.1b1
WARNING: Cuda device not available! switch to cpu device!
Building model: 100%|███████████████████████| 427/427 [00:10<00:00, 40.60it/s]
Downloading data files: 100%|███████████████| 1/1 [00:00<00:00, 3041.55it/s]
Extracting data files: 100%|████████████████| 1/1 [00:00<00:00, 192.92it/s]
Generating train split: 19 examples [00:00, 1234.69 examples/s]
Optimizing model: 100%|█████████████████████| 28/28 [14:06<00:00, 30.24s/it]
INFO: Setting chat_template to "<|im_start|>user\n[content]<|im_end|>\n<|im_start|>assistant\n"
INFO: Setting token_id of eos to 151645
INFO: Setting token_id of pad to 151643
INFO: Setting token_id of bos to 151643
INFO: Setting add_bos_token to False
Converting model: 100%|█████████████████████| 311/311 [00:00<00:00, 1290235.95it/s]
INFO: Setting max_context_limit to 4096
INFO: Exporting the model, please wait ....
[=================================================>] 597/597 (100%)
INFO: Model has been saved to ./Qwen3-0.6B_W8A8_RK3588.rkllm!

Copy the generated Qwen3-0.6B_W8A8_RK3588.rkllm to the RK3588 board.

Execute the following on the board to start the model service:

rknn-llm-release-v1.2.1b1/examples/rkllm_server_demo/rkllm_server$ python flask_server.py --rkllm_model_path /home/cat/rkllm/Qwen3-0.6B_W8A8_RK3588.rkllm --target_platform rk3588
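Besides the bundled client, you can talk to the server over plain HTTP. Below is a minimal sketch, assuming the demo server's default /rkllm_chat endpoint on port 8080 and an OpenAI-style request body; the board IP is a placeholder, and you should check flask_server.py and chat_api_flask.py for the exact schema your version uses.

import requests

# Assumed board IP and demo defaults; adjust to your setup.
url = 'http://192.168.1.100:8080/rkllm_chat'
payload = {
    'model': 'Qwen3-0.6B_W8A8_RK3588.rkllm',
    'messages': [{'role': 'user', 'content': 'Who are you?'}],
    'stream': False,
}

resp = requests.post(url, json=payload)
resp.raise_for_status()
print(resp.json())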

Run the client and enter a question to see the LLM response. You may need to adjust the server IP address in chat_api_flask.py first.

rknn-llm-release-v1.2.1b1/examples/rkllm_server_demo$ python chat_api_flask.py
============================
Input your question in the terminal to start a conversation with the RKLLM model...
============================
Please enter your question: Who are you?
Q: Who are you?
A: <think>
Okay, the user is asking who I am. I need to first confirm which role I am playing. Judging from the conversation history, the user may be testing my ability to answer or may want to learn about my identity.
First, I should state my identity clearly, for example "I am an AI assistant", which is both honest and conventional. At the same time, I should keep a friendly and warm tone so the user feels understood and supported.
Next, consider the user's potential needs. They may want to know my features, my uses, or how to interact with me. So the answer can briefly cover the scope of my abilities and invite the user to ask more questions to encourage further exchange.
Finally, make sure the whole response is concise and clear, avoiding verbosity, while keeping natural, conversational phrasing so the user feels at ease.
</think>
I am an AI assistant, here to help and support you at any time! If you have any questions or need assistance, just let me know~ 😊
Please enter your question:

Qwen3 also supports a no-think mode: append /no_think to the question to suppress the <think> block.

python chat_api_flask.py
============================
Input your question in the terminal to start a conversation with the RKLLM model...
============================
Please enter your question: Who are you? /no_think
Q: Who are you? /no_think
A: <think>
</think>
I am your virtual assistant, an AI assistant. You can ask me questions, learn new things, or seek help. Is there anything I can help you with?
Please enter your question:
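The same switch works over the HTTP API: append the suffix to the message content. Reusing the hypothetical url and payload from the earlier sketch:

# Suppress the <think> block by appending the /no_think suffix
payload['messages'][0]['content'] = 'Who are you? /no_think'
print(requests.post(url, json=payload).json())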
