1. Objective of This Article
- Adapt the YOLOv8-Pose example for the Android platform, providing functionality for pose recognition after selecting an image.
- Learn the source code and RKNN API through the project.
2. Development Environment Description
- Host System: Windows 11
- Target Device: Android development board equipped with RK3588 chip
- Core Tools: Android Studio Koala | 2024.1.1 Patch 2, NDK 27.0
3. Adapting (Migrating) to Android
With the experience gained from the previous two migrations, this one went smoothly. You can refer to the previous three articles; if you run into any issues (or need the source code), leave me a message. For the YOLOv8-Pose C-language example, see the earlier post “Using the RK3588 Chip NPU: Compiling YOLOv8-Pose C Demo in Windows 11 Docker and Running on Development Board”. For porting a C demo to the Android application side, see “Step-by-Step Deployment of YOLOv5 to RK3588 Android: NPU Acceleration and JNI/C/Kotlin Interface Development Guide”. For the most recent migration, see “Using the RK3588 Chip NPU: Deploying PPOCRv4 Example on Android System”, which covers image format issues that are also very important here.
4. Important Source Code Analysis
4.1 init_yolov8_pose_model Method
The main task of this function is to initialize the YOLOv8-Pose model under the RKNN framework, query its input/output attributes, and save the configuration into the application context. The function source code is as follows:
int init_yolov8_pose_model(const char *model_path, rknn_app_context_t *app_ctx)
{
int ret;
// 1. Initialize RKNN context
rknn_context ctx = 0;
ret = rknn_init(&ctx, (char *)model_path, 0, 0, NULL);
if (ret < 0)
{
printf("rknn_init fail! ret=%d\n", ret);
return -1;
}
// 2. Query the number of model inputs and outputs
rknn_input_output_num io_num;
ret = rknn_query(ctx, RKNN_QUERY_IN_OUT_NUM, &io_num, sizeof(io_num));
if (ret != RKNN_SUCC)
{
printf("rknn_query fail! ret=%d\n", ret);
return -1;
}
printf("model input num: %d, output num: %d\n", io_num.n_input, io_num.n_output);
// 3. Get input tensor attributes
printf("input tensors:\n");
rknn_tensor_attr input_attrs[io_num.n_input];
memset(input_attrs, 0, sizeof(input_attrs));
for (int i = 0; i < io_num.n_input; i++)
{
input_attrs[i].index = i;
ret = rknn_query(ctx, RKNN_QUERY_INPUT_ATTR, &(input_attrs[i]), sizeof(rknn_tensor_attr));
if (ret != RKNN_SUCC)
{
printf("rknn_query fail! ret=%d\n", ret);
return -1;
}
dump_tensor_attr(&(input_attrs[i]));
}
// 4. Get output tensor attributes
printf("output tensors:\n");
rknn_tensor_attr output_attrs[io_num.n_output];
memset(output_attrs, 0, sizeof(output_attrs));
for (int i = 0; i < io_num.n_output; i++)
{
output_attrs[i].index = i;
ret = rknn_query(ctx, RKNN_QUERY_OUTPUT_ATTR, &(output_attrs[i]), sizeof(rknn_tensor_attr));
if (ret != RKNN_SUCC)
{
printf("rknn_query fail! ret=%d\n", ret);
return -1;
}
dump_tensor_attr(&(output_attrs[i]));
}
// 5. Save configuration to application context
app_ctx->rknn_ctx = ctx;
// 6. Check whether the model is quantized: if the first output tensor uses affine asymmetric quantization and is not FP16, set the is_quant flag for later dequantization.
if (output_attrs[0].qnt_type == RKNN_TENSOR_QNT_AFFINE_ASYMMETRIC && output_attrs[0].type != RKNN_TENSOR_FLOAT16)
{
app_ctx->is_quant = true;
}
else
{
app_ctx->is_quant = false;
}
// 7. Copy input and output attributes to application context: dynamically allocate memory and copy input and output attributes, saving to app_ctx for later access.
app_ctx->io_num = io_num;
app_ctx->input_attrs = (rknn_tensor_attr *)malloc(io_num.n_input * sizeof(rknn_tensor_attr));
memcpy(app_ctx->input_attrs, input_attrs, io_num.n_input * sizeof(rknn_tensor_attr));
app_ctx->output_attrs = (rknn_tensor_attr *)malloc(io_num.n_output * sizeof(rknn_tensor_attr));
memcpy(app_ctx->output_attrs, output_attrs, io_num.n_output * sizeof(rknn_tensor_attr));
// 8. Parse input tensor dimensions
if (input_attrs[0].fmt == RKNN_TENSOR_NCHW)
{
printf("model is NCHW input fmt\n");
app_ctx->model_channel = input_attrs[0].dims[1];
app_ctx->model_height = input_attrs[0].dims[2];
app_ctx->model_width = input_attrs[0].dims[3];
}
else
{
printf("model is NHWC input fmt\n");
app_ctx->model_height = input_attrs[0].dims[1];
app_ctx->model_width = input_attrs[0].dims[2];
app_ctx->model_channel = input_attrs[0].dims[3];
}
printf("model input height=%d, width=%d, channel=%d\n",
app_ctx->model_height, app_ctx->model_width, app_ctx->model_channel);
return 0;
}
4.1.1 rknn_init Initialization
The rknn_init function is responsible for creating the rknn_context object, loading the RKNN model, and performing specific initialization behaviors based on the flag and the rknn_init_extend structure. Function prototype:
int rknn_init(
rknn_context* context, // Output parameter: returned RKNN context handle
void* model, // Input parameter: model data or model file path
uint32_t size, // Input parameter: size of model data (in bytes)
uint32_t flag, // Input parameter: initialization flag (extension options)
rknn_init_extend* extend // Input parameter: extended initialization information (optional)
);
In this example, only the first two parameters carry information; the model is loaded directly from its file path, so size, flag, and extend are left as 0/NULL:
rknn_context ctx = 0;
ret = rknn_init(&ctx, (char *)model_path, 0, 0, NULL);
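On Android, the model file usually lives inside the APK's assets and cannot always be opened by path directly. A common alternative is to read the model into memory and pass the buffer plus its size to rknn_init. The sketch below assumes the .rknn file has already been copied to a readable location (for example, the app's files directory); error handling is omitted.
// Sketch: load the model into memory first, then initialize from the buffer
FILE *fp = fopen(model_path, "rb");
fseek(fp, 0, SEEK_END);
long model_len = ftell(fp);
fseek(fp, 0, SEEK_SET);
void *model_data = malloc(model_len);
fread(model_data, 1, model_len, fp);
fclose(fp);
rknn_context ctx = 0;
int ret = rknn_init(&ctx, model_data, (uint32_t)model_len, 0, NULL);
free(model_data); // the runtime keeps its own copy of the model; confirm against the SDK docs for your version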
4.1.2 Querying the Number of Model Inputs and Outputs, Input Tensor Attributes, and Output Tensor Attributes
The rknn_query function can query model input/output information, per-layer execution time, total model inference time, the SDK version, memory usage information, user-defined strings, and more. Function prototype:
int rknn_query(
rknn_context context, // Input parameter: RKNN context handle (returned by rknn_init)
rknn_query_cmd cmd, // Input parameter: query command type (enumeration value)
void* info, // Output parameter: pointer to buffer storing query results
uint32_t size // Input parameter: size of info buffer (in bytes)
);
The SDK supports many query commands; refer to the official documentation: 04_Rockchip_RKNPU_API_Reference_RKNNRT_V2.3.0_CN.pdf. This function involves three query commands:
| Query Command | Returned Result Structure | Functionality |
|---|---|---|
| RKNN_QUERY_IN_OUT_NUM | rknn_input_output_num | Query the number of input and output tensors |
| RKNN_QUERY_INPUT_ATTR | rknn_tensor_attr | Query input tensor attributes |
| RKNN_QUERY_OUTPUT_ATTR | rknn_tensor_attr | Query output tensor attributes |
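Besides the three commands used in this function, rknn_query supports other handy commands; for example, RKNN_QUERY_SDK_VERSION returns the API and driver versions, which is useful when debugging version mismatches on the board. A minimal usage sketch:
rknn_sdk_version sdk_ver;
memset(&sdk_ver, 0, sizeof(sdk_ver));
ret = rknn_query(ctx, RKNN_QUERY_SDK_VERSION, &sdk_ver, sizeof(rknn_sdk_version));
if (ret == RKNN_SUCC)
{
    printf("sdk api version: %s, driver version: %s\n", sdk_ver.api_version, sdk_ver.drv_version);
}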
Two important structures are involved here. rknn_input_output_num is the result structure for the RKNN_QUERY_IN_OUT_NUM command; it holds the number of model input and output tensors.
typedef struct _rknn_input_output_num {
uint32_t n_input; // Number of input tensors
uint32_t n_output; // Number of output tensors
} rknn_input_output_num;
rknn_tensor_attr: Used for RKNN_QUERY_INPUT_ATTR and RKNN_QUERY_OUTPUT_ATTR commands, describing tensor attributes.
#define RKNN_MAX_DIMS 16 /* maximum dimension of tensor. */
#define RKNN_MAX_NAME_LEN 256 /* maximum name length of tensor. */
typedef struct _rknn_tensor_attr {
/* Basic Information */
uint32_t index; // Input parameter: specify the index of the input/output tensor to query (must be set before calling)
uint32_t n_dims; // Number of dimensions of the tensor
uint32_t dims[RKNN_MAX_DIMS]; // Dimension array (e.g., [1, 3, 224, 224])
char name[RKNN_MAX_NAME_LEN]; // Tensor name
/* Data Description */
uint32_t n_elems; // Total number of elements (product of all dimensions)
uint32_t size; // Tensor byte size
rknn_tensor_format fmt; // Data format (e.g., NCHW/NHWC)
rknn_tensor_type type; // Data type (e.g., FP32/INT8)
/* Quantization Information */
rknn_tensor_qnt_type qnt_type; // Quantization type (e.g., asymmetric/dynamic fixed-point)
int8_t fl; // Length of fractional bits for dynamic fixed-point (valid when RKNN_TENSOR_QNT_DFP)
int32_t zp; // Zero-point offset (valid for asymmetric quantization)
float scale; // Scale factor (valid for asymmetric quantization)
/* Memory Layout */
uint32_t w_stride; // Width stride (read-only, 0 means equal to width)
uint32_t size_with_stride; // Total byte size including stride
uint8_t pass_through; // Pass-through mode (TRUE means data is input directly without conversion)
uint32_t h_stride; // Height stride (writable, 0 means equal to height)
} rknn_tensor_attr;
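The init function above calls dump_tensor_attr, a small helper from the demo (not part of the RKNN API) that prints the fields of rknn_tensor_attr. A minimal sketch of such a helper, printing the enum fields as raw integers to stay self-contained:
static void dump_tensor_attr(rknn_tensor_attr *attr)
{
    printf("  index=%u, name=%s, n_dims=%u, dims=[%u, %u, %u, %u], n_elems=%u, size=%u, "
           "fmt=%d, type=%d, qnt_type=%d, zp=%d, scale=%f\n",
           attr->index, attr->name, attr->n_dims,
           attr->dims[0], attr->dims[1], attr->dims[2], attr->dims[3],
           attr->n_elems, attr->size,
           (int)attr->fmt, (int)attr->type, (int)attr->qnt_type, attr->zp, attr->scale);
}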
rknn_tensor_qnt_type: defines the tensor's quantization type, i.e., how its data is stored and computed in the model. See step 6 ("Check whether the model is quantized") in the source code above.
| Enumeration Value | Name | Explanation |
|---|---|---|
| RKNN_TENSOR_QNT_NONE | Non-quantized | Data is not quantized, usually stored in floating point (e.g., FP32, FP16). |
| RKNN_TENSOR_QNT_DFP | Dynamic Fixed-Point Quantization | Uses a dynamic number of fractional bits (the fl field) to represent data; suited to low-precision fixed-point computation. |
| RKNN_TENSOR_QNT_AFFINE_ASYMMETRIC | Asymmetric Affine Quantization | The common INT8 scheme: values are mapped linearly using scale (scale factor) and zp (zero-point offset). |
| RKNN_TENSOR_QNT_MAX | Enumeration Boundary | Marks the end of the enumeration only; no practical use. |
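For RKNN_TENSOR_QNT_AFFINE_ASYMMETRIC outputs, a raw INT8 value q maps back to a real value as (q - zp) * scale. The post-processing of quantized models typically relies on a tiny helper like the following (name borrowed from the reference demo; treat it as a sketch):
// Dequantize an INT8 value produced by asymmetric affine quantization
static inline float deqnt_affine_to_f32(int8_t qnt, int32_t zp, float scale)
{
    return ((float)qnt - (float)zp) * scale;
}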
4.1.3 Parsing Input Tensor Dimensions
Based on the layout of the input tensor (NCHW/NHWC), extract the model’s height, width, and number of channels for image preprocessing.
// 8. Parse input tensor dimensions
if (input_attrs[0].fmt == RKNN_TENSOR_NCHW)
{
printf("model is NCHW input fmt\n");
app_ctx->model_channel = input_attrs[0].dims[1];
app_ctx->model_height = input_attrs[0].dims[2];
app_ctx->model_width = input_attrs[0].dims[3];
} else {
printf("model is NHWC input fmt\n");
app_ctx->model_height = input_attrs[0].dims[1];
app_ctx->model_width = input_attrs[0].dims[2];
app_ctx->model_channel = input_attrs[0].dims[3];
}
Description of rknn_tensor_format (the fmt field): it defines the tensor's memory layout, which determines how data is ordered in memory and affects computational efficiency.
typedef enum _rknn_tensor_format {
RKNN_TENSOR_NCHW = 0, /* data format is NCHW. */
RKNN_TENSOR_NHWC, /* data format is NHWC. */
RKNN_TENSOR_NC1HWC2, /* data format is NC1HWC2. */
RKNN_TENSOR_UNDEFINED,
RKNN_TENSOR_FORMAT_MAX
} rknn_tensor_format;
| Enumeration Value | Name | Explanation | Applicable Scenarios |
|---|---|---|---|
| RKNN_TENSOR_NCHW | NCHW Format | Data is stored in [Batch, Channel, Height, Width] order. | Traditional CNN frameworks (e.g., PyTorch) |
| RKNN_TENSOR_NHWC | NHWC Format | Data is stored in [Batch, Height, Width, Channel] order. | TensorFlow, TFLite |
| RKNN_TENSOR_NC1HWC2 | NC1HWC2 Format | Blocked layout used for NPU hardware acceleration. | Huawei Ascend, Rockchip NPU |
| RKNN_TENSOR_UNDEFINED | Undefined Format | Layout not specified; may be inferred automatically by the runtime. | Compatibility / reserved option |
| RKNN_TENSOR_FORMAT_MAX | Enumeration Boundary | Marks the end of the enumeration only. | No practical use |
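The practical difference between NCHW and NHWC is simply where each element lands in memory. As a quick reference (not taken from the demo code), the linear offset of element (n, c, h, w) in each layout is:
// NCHW: offset = ((n * C + c) * H + h) * W + w
// NHWC: offset = ((n * H + h) * W + w) * C + c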
4.2 inference_yolov8_pose_model Method
This method executes the inference process of the YOLOv8 pose detection model, including input preprocessing, model inference, post-processing (decoding/coordinate transformation/NMS), and saving results, ultimately outputting detected human pose information (including key points).
int inference_yolov8_pose_model(rknn_app_context_t *app_ctx, image_buffer_t *img, object_detect_result_list *od_results)
{
int ret;
image_buffer_t dst_img;
letterbox_t letter_box;
rknn_input inputs[app_ctx->io_num.n_input];
rknn_output outputs[app_ctx->io_num.n_output];
const float nms_threshold = NMS_THRESH; // Default NMS threshold
const float box_conf_threshold = BOX_THRESH; // Default box threshold
int bg_color = 114;
// 1. Input validation and initialization
if ((!app_ctx) || !(img) || (!od_results))
{
return -1;
}
memset(od_results, 0x00, sizeof(*od_results));
memset(&letter_box, 0, sizeof(letterbox_t));
memset(&dst_img, 0, sizeof(image_buffer_t));
memset(inputs, 0, sizeof(inputs));
memset(outputs, 0, sizeof(outputs));
// 2. Image preprocessing (Letterbox)
dst_img.width = app_ctx->model_width;
dst_img.height = app_ctx->model_height;
dst_img.format = IMAGE_FORMAT_RGB888;
dst_img.size = get_image_size(&dst_img);
dst_img.virt_addr = (unsigned char *)malloc(dst_img.size);
if (dst_img.virt_addr == NULL)
{
printf("malloc buffer size:%d fail!\n", dst_img.size);
goto out;
}
// 3. Execute Letterbox scaling and padding
ret = convert_image_with_letterbox(img, &dst_img, &letter_box, bg_color);
if (ret < 0)
{
printf("convert_image_with_letterbox fail! ret=%d\n", ret);
goto out;
}
// 4. Set model inputs: pass the preprocessed data to the NPU in the correct format, size, and type; this directly affects inference accuracy and performance.
inputs[0].index = 0;
inputs[0].type = RKNN_TENSOR_UINT8;
inputs[0].fmt = RKNN_TENSOR_NHWC;
inputs[0].size = app_ctx->model_width * app_ctx->model_height * app_ctx->model_channel;
inputs[0].buf = dst_img.virt_addr;
ret = rknn_inputs_set(app_ctx->rknn_ctx, app_ctx->io_num.n_input, inputs);
if (ret < 0)
{
printf("rknn_input_set fail! ret=%d\n", ret);
goto out;
}
// 5. Model inference and time statistics
printf("rknn_run\n");
int64_t start_us, end_us;
start_us = getCurrentTimeUs();
ret = rknn_run(app_ctx->rknn_ctx, nullptr);
end_us = getCurrentTimeUs() - start_us;
printf("rknn_run time=%.2fms, FPS = %.2f\n",end_us / 1000.f,
1000.f * 1000.f / end_us);
if (ret < 0)
{
printf("rknn_run fail! ret=%d\n", ret);
goto out;
}
// 6. Output data retrieval
memset(outputs, 0, sizeof(outputs));
for (int i = 0; i < app_ctx->io_num.n_output; i++)
{
outputs[i].index = i;
outputs[i].want_float = (!app_ctx->is_quant);
}
ret = rknn_outputs_get(app_ctx->rknn_ctx, app_ctx->io_num.n_output, outputs, NULL);
if (ret < 0)
{
printf("rknn_outputs_get fail! ret=%d\n", ret);
goto out;
}
// 7. Post-processing
start_us = getCurrentTimeUs();
post_process(app_ctx, outputs, &letter_box, box_conf_threshold, nms_threshold, od_results);
end_us = getCurrentTimeUs() - start_us;
printf("post_process time=%.2fms, FPS = %.2f\n",end_us / 1000.f,
1000.f * 1000.f / end_us);
// 8. Resource release
rknn_outputs_release(app_ctx->rknn_ctx, app_ctx->io_num.n_output, outputs);
out:
if (dst_img.virt_addr != NULL)
{
free(dst_img.virt_addr);
}
return ret;
}
4.2.1 rknn_inputs_set Model Input Settings
The rknn_inputs_set function can set the model’s input data. This function supports multiple inputs, where each input is an rknn_input structure object. Function prototype:
int rknn_inputs_set(
rknn_context context, // Input parameter: RKNN context handle (returned by rknn_init)
uint32_t n_inputs, // Input parameter: number of inputs to set
rknn_input inputs[] // Input parameter: array of input data information (each element corresponds to one input)
);
rknn_input structure: Describes the data information and transmission method of the input tensor.
typedef struct _rknn_input {
/* Required parameters */
uint32_t index; // Input parameter: specify the index of the input tensor to set (starting from 0)
void* buf; // Input parameter: pointer to input data buffer
uint32_t size; // Input parameter: buffer byte size
/* Data conversion control */
uint8_t pass_through; // Input parameter: pass-through mode switch
// TRUE: data is directly transmitted without format conversion (must ensure format completely matches the model)
// FALSE: automatically converted based on type and fmt below (default recommended)
/* Must be set when pass_through=FALSE */
rknn_tensor_type type; // Input parameter: input data type (e.g., RKNN_TENSOR_FLOAT32)
rknn_tensor_format fmt; // Input parameter: data memory layout (e.g., RKNN_TENSOR_NCHW)
} rknn_input;
4.2.2 rknn_run Model Inference
rknn_run executes the model inference (forward pass), triggering the NPU to process the input data that was just set and produce the output tensors. There is not much else to it: once the inputs are set, simply run the model.
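For completeness, the prototype (from the RKNN runtime header) mirrors the other calls; the extend parameter is reserved and is passed as NULL/nullptr in this example:
int rknn_run(
    rknn_context context,     // Input parameter: RKNN context handle (returned by rknn_init)
    rknn_run_extend* extend   // Input parameter: extended run information (reserved, usually NULL)
);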
4.2.3 rknn_outputs_get
Retrieve the output data of the model inference. Function prototype:
int rknn_outputs_get(
rknn_context context, // Input parameter: RKNN context handle (obtained from rknn_init)
uint32_t n_outputs, // Input parameter: number of output tensors to retrieve
rknn_output outputs[], // Output parameter: array of output result buffer
rknn_output_extend* extend // Input parameter: output extended information (optional, usually NULL)
);
rknn_output structure: Used to configure and receive RKNN model output data.
typedef struct _rknn_output {
/* Required parameters */
uint32_t index; // Input parameter: specify the index of the output tensor to retrieve (starting from 0)
/* Output data control */
uint8_t want_float; // Input parameter: output format control
// TRUE: force conversion to float (FP32)
// FALSE: keep original output format (e.g., INT8)
/* Memory management mode */
uint8_t is_prealloc; // Input parameter: memory allocation method
// TRUE: user pre-allocates memory (must set buf and size below)
// FALSE: memory is automatically allocated by SDK (default recommended)
/* Must be set when is_prealloc=TRUE */
void* buf; // Input parameter: pointer to output data buffer (user pre-allocated)
uint32_t size; // Input parameter: buffer byte size
} rknn_output;
Output data can be stored in one of two ways. Either the user allocates and frees the memory themselves, in which case the rknn_output object's is_prealloc must be set to 1 and buf must point to the user-allocated buffer; or the SDK allocates the memory, in which case is_prealloc is set to 0 and, after the call returns, buf points to the output data (released later with rknn_outputs_release). This example uses the SDK-allocated mode; a sketch of the pre-allocated variant follows.
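A minimal sketch of the user-allocated (pre-allocated) variant, assuming the buffer sizes are derived from the output attributes saved in app_ctx; error handling is omitted:
rknn_output outputs[2];
memset(outputs, 0, sizeof(outputs));
for (int i = 0; i < 2; i++)
{
    outputs[i].index       = i;
    outputs[i].want_float  = 1;                 // ask for FP32 results
    outputs[i].is_prealloc = 1;                 // the caller owns the buffer
    outputs[i].size        = app_ctx->output_attrs[i].n_elems * sizeof(float);
    outputs[i].buf         = malloc(outputs[i].size);
}
ret = rknn_outputs_get(app_ctx->rknn_ctx, 2, outputs, NULL);
// ... consume outputs[i].buf ...
for (int i = 0; i < 2; i++)
{
    free(outputs[i].buf);                       // the caller frees what it allocated
}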
4.3 post_process Post-Processing
This method performs post-processing on the detection model's output, including decoding detection boxes, confidence filtering, non-maximum suppression (NMS), and key point coordinate mapping, ultimately outputting human detection boxes and the corresponding key point information encapsulated in object_detect_result_list. The source code follows, with comments added where they help.
int post_process(rknn_app_context_t *app_ctx, void *outputs, letterbox_t *letter_box, float conf_threshold, float nms_threshold,
object_detect_result_list *od_results) {
rknn_output *_outputs = (rknn_output *)outputs;
std::vector<float> filterBoxes;
std::vector<float> objProbs;
std::vector<int> classId;
int validCount = 0;
int stride = 0;
int grid_h = 0;
int grid_w = 0;
int model_in_w = app_ctx->model_width;
int model_in_h = app_ctx->model_height;
memset(od_results, 0, sizeof(object_detect_result_list));
int index = 0;
// Multi-scale output layer processing
for (int i = 0; i < 3; i++) { // Iterate over 3 output layers (YOLO multi-scale detection)
// Get feature map dimensions
grid_h = app_ctx->output_attrs[i].dims[2];
grid_w = app_ctx->output_attrs[i].dims[3];
stride = model_in_h / grid_h; // Calculate stride (downsampling factor)
// Quantized model processing (UINT8/INT8)
if (app_ctx->is_quant) {
validCount += process_i8((int8_t *)_outputs[i].buf, grid_h, grid_w, stride, filterBoxes, objProbs,
classId, conf_threshold, app_ctx->output_attrs[i].zp, app_ctx->output_attrs[i].scale,index);
}
else
{
validCount += process_fp32((float *)_outputs[i].buf, grid_h, grid_w, stride, filterBoxes, objProbs,
classId, conf_threshold, app_ctx->output_attrs[i].zp, app_ctx->output_attrs[i].scale, index);
}
index += grid_h * grid_w;
}
// no object detect
if (validCount <= 0) {
return 0;
}
std::vector<int> indexArray;
for (int i = 0; i < validCount; ++i) {
indexArray.push_back(i);
}
// Sort by confidence in descending order (quick sort)
quick_sort_indice_inverse(objProbs, 0, validCount - 1, indexArray);
// Group by class for NMS
std::set<int> class_set(std::begin(classId), std::end(classId));
for (auto c : class_set) {
nms(validCount, filterBoxes, classId, indexArray, c, nms_threshold);
}
int last_count = 0;
od_results->count = 0;
// Key point coordinate mapping, iterate over all valid detection results
for (int i = 0; i < validCount; ++i) {
// Skip invalid results or exceed maximum number limit
if (indexArray[i] == -1 || last_count >= OBJ_NUMB_MAX_SIZE) {
continue;
}
int n = indexArray[i];
// Map box coordinates back to the original image (Letterbox inverse transformation)
float x1 = filterBoxes[n * 5 + 0] - letter_box->x_pad;
float y1 = filterBoxes[n * 5 + 1] - letter_box->y_pad;
float w = filterBoxes[n * 5 + 2];
float h = filterBoxes[n * 5 + 3];
int keypoints_index = (int)filterBoxes[n * 5 + 4];
// Key point processing (17 key points)
for (int j = 0; j < 17; ++j) {
// Quantized model dequantization
if (app_ctx->is_quant) {
od_results->results[last_count].keypoints[j][0] = ((float)((rknpu2::float16 *)_outputs[3].buf)[j*3*8400+0*8400+keypoints_index]
- letter_box->x_pad)/ letter_box->scale;
od_results->results[last_count].keypoints[j][1] = ((float)((rknpu2::float16 *)_outputs[3].buf)[j*3*8400+1*8400+keypoints_index]
- letter_box->y_pad)/ letter_box->scale;
od_results->results[last_count].keypoints[j][2] = (float)((rknpu2::float16 *)_outputs[3].buf)[j*3*8400+2*8400+keypoints_index];
}
else
{
// Floating-point model direct conversion
od_results->results[last_count].keypoints[j][0] = (((float *)_outputs[3].buf)[j*3*8400+0*8400+keypoints_index]
- letter_box->x_pad)/ letter_box->scale;
od_results->results[last_count].keypoints[j][1] = (((float *)_outputs[3].buf)[j*3*8400+1*8400+keypoints_index]
- letter_box->y_pad)/ letter_box->scale;
od_results->results[last_count].keypoints[j][2] = ((float *)_outputs[3].buf)[j*3*8400+2*8400+keypoints_index];
}
}
int id = classId[n];
float obj_conf = objProbs[i];
// Save results
od_results->results[last_count].box.left = (int)(clamp(x1, 0, model_in_w) / letter_box->scale);
od_results->results[last_count].box.top = (int)(clamp(y1, 0, model_in_h) / letter_box->scale);
od_results->results[last_count].box.right = (int)(clamp(x1+w, 0, model_in_w) / letter_box->scale);
od_results->results[last_count].box.bottom = (int)(clamp(y1+h, 0, model_in_h) / letter_box->scale);
od_results->results[last_count].prop = obj_conf; // Confidence
od_results->results[last_count].cls_id = id; // Class
last_count++;
}
od_results->count = last_count;
return 0;
}
Parameter Description
| Parameter Name | Type | Description |
|---|---|---|
| app_ctx | Structure | Model context, containing output tensor attributes (dimensions, quantization parameters, etc.) |
| outputs | Array/Tensor | Raw output data from the model (multi-scale feature maps) |
| letter_box | Structure | Letterbox parameters generated during preprocessing (scaling ratio, padding size) |
| conf_threshold | float | Confidence filtering threshold (for filtering detection boxes) |
| nms_threshold | float | NMS (Non-Maximum Suppression) overlap threshold |
| od_results | Structure Array | Output parameter, storing final detection results (including class, confidence, coordinates, etc.) |
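One detail worth spelling out is how the letter_box parameters are used: during preprocessing the image is scaled by scale and padded by x_pad/y_pad, so mapping a model-space coordinate back to the original image undoes those two steps in reverse order:
// Inverse letterbox mapping applied in post_process:
// x_orig = (x_model - x_pad) / scale
// y_orig = (y_model - y_pad) / scale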
4.4 After Obtaining od_results (Result List), Draw Boxes, Poses, and Text
4.4.1 First Understand a Few Structures
#define OBJ_NUMB_MAX_SIZE 128
typedef struct {
int left; // x coordinate of the top-left corner of the rectangle (in pixels)
int top;
int right;
int bottom;
} image_rect_t;
typedef struct {
image_rect_t box; // Coordinates of the target's bounding box
float keypoints[17][3];// Coordinates and confidence of 17 key points, each row formatted as [x, y, confidence], key points usually correspond to COCO dataset joints (e.g., nose, left eye, right shoulder, etc.).
float prop; // Confidence of the detection result
int cls_id; // Class ID
} object_detect_result;
typedef struct {
int id; // Identifier for the frame or scene
int count; // Actual number of detected targets
object_detect_result results[OBJ_NUMB_MAX_SIZE]; // Array storing all detection results, maximum capacity 128
} object_detect_result_list;
The source code is as follows, with interpretations in the comments.
// Draw boxes and probabilities
char text[256];
// Iterate over all detection results
for (int i = 0; i < od_results.count; i++)
{
object_detect_result *det_result = &(od_results.results[i]);
LOGI("%s @ (%d %d %d %d) %.3f\n", coco_cls_to_name(det_result->cls_id),
det_result->box.left, det_result->box.top,
det_result->box.right, det_result->box.bottom,
det_result->prop);
// Draw the bounding box for the human
int x1 = det_result->box.left;
int y1 = det_result->box.top;
int x2 = det_result->box.right;
int y2 = det_result->box.bottom;
draw_rectangle(&dst_image, x1, y1, x2 - x1, y2 - y1, COLOR_BLUE, 3);
// Label category and confidence
sprintf(text, "%s %.1f%%", coco_cls_to_name(det_result->cls_id), det_result->prop * 100);
draw_text(&dst_image, text, x1, y1 - 20, COLOR_RED, 10);
// Draw skeleton lines (pose estimation)
for (int j = 0; j < 38/2; ++j) {
draw_line(&dst_image, (int)(det_result->keypoints[skeleton[2*j]-1][0]),(int)(det_result->keypoints[skeleton[2*j]-1][1]),
(int)(det_result->keypoints[skeleton[2*j+1]-1][0]),(int)(det_result->keypoints[skeleton[2*j+1]-1][1]),COLOR_ORANGE,3);
}
// Draw key point circles
for (int j = 0; j < 17; ++j) {
draw_circle(&dst_image, (int)(det_result->keypoints[j][0]),(int)(det_result->keypoints[j][1]),1, COLOR_YELLOW,1);
}
}
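The skeleton array indexed above is the COCO-style table of keypoint pairs (1-based indices, hence the -1 when indexing keypoints); it is defined elsewhere in the demo and is not shown in this excerpt. For reference, the connection table commonly used with the 17 COCO keypoints looks like this (values assumed from the reference YOLOv8-Pose demo; verify against your own source):
// 19 limb connections, stored as pairs of 1-based keypoint indices
int skeleton[38] = {
    16, 14, 14, 12, 17, 15, 15, 13, 12, 13, 6, 12, 7, 13, 6, 7,
    6, 8, 7, 9, 8, 10, 9, 11, 2, 3, 1, 2, 1, 3, 2, 4, 3, 5, 4, 6, 5, 7
};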
5. Conclusion
Learning the RKNN API through actual projects is quite efficient; the more you use it, the more you master it. A novice will eventually become an expert.