Usage:
git clone https://github.com/huonwe/rkllm_openai_like_api.git
cd rkllm_openai_like_api
Add the required dynamic libraries:
sudo cp lib/*.so /usr/lib
Install uv:
curl -LsSf https://astral.sh/uv/install.sh | sh
Install Python dependencies:
uv sync
Run:
uv run server.py
By default, the target platform is rk3588, the model path is models/deepseek-r1-1.5b-w8a8.rkllm, and the server listens on port 8080. You can override these with command-line flags, for example:
uv run server.py --rkllm_model_path=path/to/model.rkllm --target_platform=rk3588 --port=8080
You can connect to this service via:
http://your.ip:8080/rkllm_chat/v1/chat/completions
You can test using client.py:
uv run client.py
Test code
# Example: reuse your existing OpenAI setup
from openai import OpenAI

# Point to the local server
client = OpenAI(base_url="http://192.168.1.211:8080/rkllm_chat/v1", api_key="not-needed")

completion = client.chat.completions.create(
    model="local-model",  # this field is currently unused
    messages=[
        {"role": "system", "content": "You are a helpful assistant"},
        {"role": "user", "content": "Tell me a story"},
    ],
    temperature=0.7,
)
print(completion.choices[0].message.content)
Execution result:
~$ python test.py
Once upon a time, there was a brave little rabbit who lived in a beautiful forest. One day, the little rabbit heard that there was a mysterious treasure deep in the forest, so it decided to go find it. On its journey, the little rabbit met various animal friends, including a clever fox, a kind elephant, and a friendly squirrel. Together, they overcame many difficulties and finally found the treasure. When the little rabbit discovered the secret of the treasure, it realized that the true treasure was friendship and courage. It shared part of the treasure with its friends and promised to always protect the natural environment of the forest. This story teaches us that no matter what difficulties we encounter, as long as we unite and face them bravely, we can always find a way to solve problems. At the same time, friendship and courage are also the most precious treasures in life.
Modify the example to use tools
# Example: reuse your existing OpenAI setup
import json

import requests
from openai import OpenAI

# Point to the local server; the API key is not checked
client = OpenAI(base_url="http://192.168.1.211:8080/rkllm_chat/v1", api_key="not-needed")

# Define a function to get weather information
def get_weather(location):
    url = f"http://api.openweathermap.org/data/2.5/weather?q={location}&appid=your_OpenWeatherMap_API_key&units=metric"
    response = requests.get(url)
    if response.status_code == 200:
        data = response.json()
        weather_description = data['weather'][0]['description']
        temperature = data['main']['temp']
        return f"The weather in {location} is {weather_description}, and the current temperature is {temperature} degrees Celsius."
    else:
        return f"Failed to get weather information for {location}."
# Define the function description for the model
functions = [
    {
        "name": "get_weather",
        "description": "Get the current weather information for a specified location",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "The location to query the weather for, e.g., Beijing, New York"
                }
            },
            "required": ["location"]
        }
    }
]
# Build the user prompt
user_prompt = "What is the weather like in Shanghai today?"

# Call the API with function calling enabled
response = client.chat.completions.create(
    model="local-model",  # this field is currently unused
    messages=[
        {"role": "user", "content": user_prompt}
    ],
    functions=functions,
    function_call="auto"  # Let the model decide whether to call the function
)

# Process the API response
message = response.choices[0].message
print(message)
Run directly:
ChatCompletionMessage(content=' I am not very sure about the weather in Shanghai today, you can get the most accurate information by checking the local real-time weather forecast. If you have any other questions that need help, please let me know!', refusal=None, role='assistant', annotations=None, audio=None, function_call=None, tool_calls=None)
As noted in the previous article “Implementation of Function Call Based on rk3588 Large Language Model”, the input prompt needs to be adjusted. Ensure that the final prompt submitted to the large model is as follows:
<|im_start|>system
You are a helpful assistant.

# Tools

You may call one or more functions to assist with the user query.

You are provided with function signatures within <tools></tools> XML tags:
<tools>
{"name": "get_weather", "description": "Get the current weather information for a specified location", "parameters": {"type": "object", "properties": {"location": {"type": "string", "description": "The location to query the weather for, e.g., Beijing, New York"}}, "required": ["location"]}}
</tools>

For each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:
<tool_call>
{"name": <function-name>, "arguments": <args-json-object>}
</tool_call><|im_end|>
<|im_start|>user
What is the weather like in Shanghai today?<|im_end|>
<|im_start|>assistant
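The tool block in the system prompt above can be generated from the same `functions` list instead of being pasted by hand. A minimal sketch — the helper name `build_tool_system_prompt` is our own, not part of the repo:

```python
import json

def build_tool_system_prompt(functions):
    """Render a Qwen-style system prompt embedding the tool signatures."""
    tool_lines = "\n".join(json.dumps(f, ensure_ascii=False) for f in functions)
    return (
        "You are a helpful assistant.\n\n"
        "# Tools\n\n"
        "You may call one or more functions to assist with the user query.\n\n"
        "You are provided with function signatures within <tools></tools> XML tags:\n"
        f"<tools>\n{tool_lines}\n</tools>\n\n"
        "For each function call, return a json object with function name and arguments "
        "within <tool_call></tool_call> XML tags:\n"
        '<tool_call>\n{"name": <function-name>, "arguments": <args-json-object>}\n</tool_call>'
    )

functions = [
    {
        "name": "get_weather",
        "description": "Get the current weather information for a specified location",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string", "description": "The location to query the weather for"}
            },
            "required": ["location"],
        },
    }
]

system_prompt = build_tool_system_prompt(functions)
```

The result can then be sent as an ordinary system message, e.g. `messages=[{"role": "system", "content": system_prompt}, {"role": "user", "content": user_prompt}]`.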
Execute again:
ChatCompletionMessage(content=' <tools>
{"name": "get_weather", "arguments": {"location": "Shanghai"}}
</tools>', refusal=None, role='assistant', annotations=None, audio=None, function_call=None, tool_calls=None)
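The tool call comes back as plain text inside the assistant message, so the client has to extract it before invoking `get_weather`. A sketch of such a parser — the helper name `extract_tool_call` is our own, and it accepts both `<tool_call>` and `<tools>` tags since the output above used the latter:

```python
import json
import re

def extract_tool_call(content):
    """Pull the first {"name": ..., "arguments": ...} JSON object out of a
    <tool_call>/<tools> tagged reply; return None if there is none."""
    match = re.search(r"<(?:tool_call|tools)>\s*(\{.*?\})\s*</(?:tool_call|tools)>",
                      content, re.DOTALL)
    if match is None:
        return None
    return json.loads(match.group(1))

reply = '<tools>\n{"name": "get_weather", "arguments": {"location": "Shanghai"}}\n</tools>'
call = extract_tool_call(reply)
# call == {"name": "get_weather", "arguments": {"location": "Shanghai"}}
```

With the call parsed, dispatch is straightforward: `if call and call["name"] == "get_weather": print(get_weather(**call["arguments"]))`.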
Note:
Qwen formats each round of dialogue with ChatML (Chat Markup Language), wrapping the content in control tokens:
<|im_start|>{{role}}
{{content}}<|im_end|>
User input plays the role of user, while the model generation takes on the role of assistant.
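Applying this template to an OpenAI-style messages list can be sketched as follows — the helper name `to_chatml` is our own, not part of the repo:

```python
def to_chatml(messages, add_generation_prompt=True):
    """Render an OpenAI-style messages list into a ChatML prompt string."""
    prompt = ""
    for m in messages:
        prompt += f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
    if add_generation_prompt:
        # Leave the assistant turn open so the model continues from here
        prompt += "<|im_start|>assistant\n"
    return prompt

messages = [
    {"role": "system", "content": "You are a helpful assistant"},
    {"role": "user", "content": "hello"},
]
print(to_chatml(messages))
```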
A complete example
<|im_start|>system
You are Qwen, created by Alibaba Cloud. You are a helpful assistant.<|im_end|>
<|im_start|>user
hello<|im_end|>
<|im_start|>assistant
Hello! How can I assist you today? Let me know if you have any questions or need help with something. I'm here to help!<|im_end|>
<|im_start|>user
Explain large language models like I'm 5.<|im_end|>
<|im_start|>assistant
Sure, I'd be happy to explain large language models in a simple way! Imagine that you have a friend who is really good at playing with words. This friend has read so many books and talked to so many people that they know how to use words in all sorts of different ways. They can tell stories, answer questions, and even make up new words if they need to. Now imagine that this friend is actually a computer program, called a "large language model". It's been trained on lots and lots of text, like books, articles, and conversations, so it knows how to use words just like your word-loving friend does. When you ask the model a question or give it a task, it uses all the knowledge it's learned to come up with a response that makes sense. Just like your friend might sometimes make mistakes or say things in a funny way, the large language model isn't perfect either. But it's still really impressive, because it can understand and generate human language in a way that was once thought impossible for machines!<|im_end|><|endoftext|>