The Orange Pi AIpro development board adopts the Ascend AI technology route. With rich interfaces and strong scalability, it provides 8/20 TOPS of AI computing power and can be widely used in AI edge computing, deep vision learning, video-stream AI analysis, video and image analysis, natural language processing, and other AI fields. Through the AI programming interfaces of the Ascend CANN software stack, it meets the needs of most AI algorithm prototype verification and inference application development.
When developing inference applications using AscendCL, the development process is roughly divided into the following steps:
- AscendCL Initialization: initialize AscendCL's internal resources to prepare for program execution
- Runtime Resource Application: apply for runtime-related resources, such as the computing device
- Media Data Processing: image cropping, scaling, video or image encoding and decoding, etc.
- Model Inference: model loading, execution, and unloading
- Runtime Resource Release: release resources promptly once they are no longer in use
- AscendCL De-initialization: used in pairs with initialization
Before we start, we need to understand something that comes up constantly when using AscendCL: the “data type operation interface”. What is it, and why does it exist?
In C/C++, data types exposed to users are usually defined as structs and used by declaring variables. But once a struct needs a new member, user code runs into compatibility issues, which makes maintenance inconvenient. AscendCL therefore exposes its data types as opaque handles and operates on them through interfaces: call a type’s Create interface to create it, its Get interfaces to read the parameter values inside it, its Set interfaces to set those values, and its Destroy interface to destroy it. Users never need to care how a data type’s struct is laid out, so even if a type needs to be extended later, only new operation interfaces for that type need to be added, and no compatibility issues arise.
So, to summarize: the “data type operation interface” is a series of interfaces for creating a data type, getting/setting the parameter values inside it, and destroying it, and the biggest benefit of its existence is fewer compatibility issues.
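For a concrete feel of the pattern, here is a minimal sketch using the aclDataBuffer type (a type we will meet again below); everything goes through its operation interfaces rather than through struct members:

```cpp
#include "acl/acl.h"
#include <cstddef>

// Sketch of the Create/Get/Destroy pattern, using aclDataBuffer as an
// example: the struct layout stays opaque, only the operation interfaces
// touch it. devPtr stands for any previously allocated memory.
void DataTypePatternDemo(void *devPtr, size_t size) {
    // Create the data type ...
    aclDataBuffer *buf = aclCreateDataBuffer(devPtr, size);
    // ... read the parameter values inside it through Get interfaces ...
    void *addr = aclGetDataBufferAddr(buf);
    size_t len = aclGetDataBufferSizeV2(buf);
    (void)addr;
    (void)len;
    // ... and destroy it once it is no longer needed.
    aclDestroyDataBuffer(buf);
}
```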
Next, let’s move on to today’s topic: how to develop applications for the network model inference scenario using AscendCL interfaces.
When developing applications with AscendCL interfaces, AscendCL must be initialized first; otherwise, the subsequent initialization of internal system resources may fail, which in turn causes other business exceptions. During initialization, inference-related configuration items (for example, the configuration for collecting performance information) can be passed to the AscendCL initialization interface as a configuration file in JSON format. If the current defaults meet your requirements (for example, the collection of performance information is disabled by default), there is no need to modify anything: pass NULL to the AscendCL initialization interface, or set the configuration file to an empty JSON string (that is, a file containing only {}).
With initialization comes de-initialization. After confirming that all calls to AscendCL have completed, or before the process exits, call the AscendCL de-initialization interface.
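A minimal sketch of this pairing (assuming a CANN version that defines ACL_SUCCESS; older versions use ACL_ERROR_NONE):

```cpp
#include "acl/acl.h"
#include <cstdio>

int main() {
    // Pass NULL to run with the default configuration; alternatively pass
    // the path of a JSON configuration file, e.g. aclInit("acl.json").
    aclError ret = aclInit(NULL);
    if (ret != ACL_SUCCESS) {
        printf("aclInit failed, error code = %d\n", ret);
        return -1;
    }

    // ... apply for runtime resources, load the model, run inference ...

    // De-initialize only after every other AscendCL call has finished.
    ret = aclFinalize();
    if (ret != ACL_SUCCESS) {
        printf("aclFinalize failed, error code = %d\n", ret);
        return -1;
    }
    return 0;
}
```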
Runtime management resources include the Device, Context, Stream, Event, etc. Here we focus on Device, Context, and Stream: the Device is the processor that performs the computation, a Context holds the running environment on a Device, and a Stream maintains the execution order of tasks within a Context.
You need to apply for runtime management resources in the order Device, Context, Stream, so that these resources can then be used to perform calculations and manage tasks. After all data processing is complete, release them in the reverse order: Stream, Context, Device.
When applying for runtime management resources, Context and Stream support both implicit and explicit creation: you can create them explicitly through the corresponding create interfaces, or rely on the default Context and default Stream that AscendCL creates implicitly once the Device is specified. A minimal sketch using explicit creation follows.
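The sketch below applies for the resources in order and releases them in reverse; error checks are omitted for brevity:

```cpp
#include "acl/acl.h"

// Explicit runtime resource management: apply in the order
// Device -> Context -> Stream, release in the reverse order.
void RunWithRuntimeResources() {
    int32_t deviceId = 0;          // use Device 0
    aclrtContext context = NULL;
    aclrtStream stream = NULL;

    aclrtSetDevice(deviceId);                // specify the computing Device
    aclrtCreateContext(&context, deviceId);  // explicitly create a Context on it
    aclrtCreateStream(&stream);              // explicitly create a Stream in it

    // ... submit tasks to the Stream, run model inference ...

    aclrtDestroyStream(stream);    // release the Stream first,
    aclrtDestroyContext(context);  // then the Context,
    aclrtResetDevice(deviceId);    // and finally the Device
}
```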
If the model’s requirements for the width and height of the input image do not match the source image provided by the user, AscendCL provides media data processing interfaces that can perform cropping, scaling, format conversion, video or image encoding and decoding, and so on, to process the source image into one that meets the model’s requirements. This feature will be elaborated later; this issue focuses on model inference and assumes the input image already meets the model’s requirements.
In model inference scenarios, an offline model (*.om file) compatible with the Ascend AI processor must be available. We can use ATC (Ascend Tensor Compiler) to build the model. If the model inference involves dynamic Batch, dynamic resolution, and other characteristics, relevant configurations need to be added when building the model. For information on how to use ATC to build models, please refer to the “Ascend Community Documentation Center” at the end.
With the model in hand, loading can begin. Currently, AscendCL supports the following ways of loading a model (a sketch of the most common one follows the list):
- Load model data from an *.om file, with memory managed by AscendCL
- Load model data from an *.om file, with memory managed by the user
- Load model data from memory, with memory managed by AscendCL
- Load model data from memory, with memory managed by the user
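For example, the first method, loading from a file with AscendCL managing the memory, reduces to a single call (the path passed in is whatever *.om file ATC produced):

```cpp
#include "acl/acl.h"

// Load an offline model from an *.om file; AscendCL applies for the working
// and weight memory itself. The returned model ID identifies the model in
// all subsequent model interfaces.
uint32_t LoadModel(const char *omPath) {
    uint32_t modelId = 0;
    aclError ret = aclmdlLoadFromFile(omPath, &modelId);
    if (ret != ACL_SUCCESS) {
        return 0;  // real code should report the error code here
    }
    return modelId;
}
```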
When calling AscendCL interfaces for model inference, the inference has input and output data, which must be stored in the data types specified by AscendCL. The relevant data types are as follows (a sketch of their use follows the list):
- Use aclmdlDesc type data to describe the basic information of the model (such as the number of inputs/outputs, their names, data types, Formats, and dimension information). After the model is loaded successfully, the user can call the operation interfaces under this data type with the model’s ID to obtain the model description, and then read the number of model inputs/outputs, memory sizes, dimension information, Formats, data types, and other information from it.
- Use aclDataBuffer type data to describe the memory address and memory size of each input/output. Call the operation interfaces under the aclDataBuffer type to obtain the memory address, memory size, etc., to facilitate storing input data into memory and reading output data from it.
- Use aclmdlDataset type data to describe the input/output data of the model as a whole. A model may have multiple inputs and outputs, so the operation interfaces of the aclmdlDataset type can be called to add multiple aclDataBuffer items.
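A minimal sketch that ties the three types together, preparing one Device-side buffer per model input (the model description is destroyed locally here for brevity; a full application usually keeps it until the model is unloaded):

```cpp
#include "acl/acl.h"

// Build an input dataset with one aclDataBuffer per model input.
// Error checks are omitted for brevity.
aclmdlDataset *PrepareInput(uint32_t modelId) {
    // Obtain the model description via the model ID.
    aclmdlDesc *modelDesc = aclmdlCreateDesc();
    aclmdlGetDesc(modelDesc, modelId);

    aclmdlDataset *input = aclmdlCreateDataset();
    size_t numInputs = aclmdlGetNumInputs(modelDesc);
    for (size_t i = 0; i < numInputs; ++i) {
        // Ask the model how much memory input i requires.
        size_t bufSize = aclmdlGetInputSizeByIndex(modelDesc, i);
        void *devBuf = NULL;
        aclrtMalloc(&devBuf, bufSize, ACL_MEM_MALLOC_HUGE_FIRST);
        // The input data itself still has to be copied into devBuf,
        // e.g. with aclrtMemcpy, before the model is executed.

        // Describe the memory address and size of this input ...
        aclDataBuffer *dataBuf = aclCreateDataBuffer(devBuf, bufSize);
        // ... and add it to the dataset (a model may have several inputs).
        aclmdlAddDatasetBuffer(input, dataBuf);
    }
    aclmdlDestroyDesc(modelDesc);
    return input;
}
```

The output dataset is prepared the same way, using aclmdlGetNumOutputs and aclmdlGetOutputSizeByIndex instead.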
After preparing the input and output data types needed for model execution, and storing the input data for the execution, model inference can be carried out. If the model input involves dynamic Batch, dynamic resolution, or similar characteristics, you also need to call the corresponding AscendCL interface before model execution to tell the model the Batch size or resolution for this run, as in the sketch below.
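For dynamic Batch, for example, a reserved extra input identified by the fixed name ACL_DYNAMIC_TENSOR_NAME carries the shape information; a sketch, assuming modelDesc, modelId, and input were obtained as above:

```cpp
#include "acl/acl.h"

// Tell a model built with dynamic Batch which batch size to use for this
// execution. The value must be one of the batch sizes configured when the
// model was built with ATC. Error checks are omitted.
void SetBatchForThisRun(aclmdlDesc *modelDesc, uint32_t modelId,
                        aclmdlDataset *input, uint64_t batch) {
    size_t index = 0;
    // Locate the reserved dynamic-shape input by its fixed name.
    aclmdlGetInputIndexByName(modelDesc, ACL_DYNAMIC_TENSOR_NAME, &index);
    aclmdlSetDynamicBatchSize(modelId, input, index, batch);
}
```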
Currently, AscendCL supports two modes of model execution, synchronous and asynchronous; here, “synchronous” and “asynchronous” are meant from the perspective of the caller and the executor. Both modes are sketched after the list below.
- If the interface that executes the model only returns after inference has completed, model execution is synchronous. After calling the synchronous execution interface, the user can obtain the result data directly from that interface’s output parameter. If a large amount of input data needs to be inferred, synchronous execution has to wait until all of it has been processed before any result data can be obtained.
- If the interface that executes the model returns without waiting for inference to complete, model execution is asynchronous. When calling the asynchronous execution interface, the user must specify a Stream (a Stream maintains the execution order of asynchronous operations, ensuring they are executed on the Device in the order the application called them), and must then call the aclrtSynchronizeStream interface to block until all tasks in that Stream have completed before obtaining the result data. For large amounts of input data, AscendCL additionally provides a Callback mechanism for asynchronous execution: a callback function is triggered within a specified time so that result data can be fetched as soon as it is ready, obtaining results in batches and improving efficiency.
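A minimal sketch of both modes (error checks omitted; the Callback mechanism is not shown):

```cpp
#include "acl/acl.h"

// Synchronous execution: aclmdlExecute returns only after inference has
// finished, so the output dataset can be read immediately afterwards.
void RunSync(uint32_t modelId, aclmdlDataset *input, aclmdlDataset *output) {
    aclmdlExecute(modelId, input, output);
    // output now holds the inference result data.
}

// Asynchronous execution: aclmdlExecuteAsync returns at once; block on the
// Stream before touching the results.
void RunAsync(uint32_t modelId, aclmdlDataset *input, aclmdlDataset *output,
              aclrtStream stream) {
    aclmdlExecuteAsync(modelId, input, output, stream);
    aclrtSynchronizeStream(stream);  // wait for all tasks in the Stream
    // output is valid only after the Stream has been synchronized.
}
```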
After inference completes, obtaining and further processing the result data is up to the user’s own code. Finally, don’t forget to destroy data types such as aclmdlDataset and aclDataBuffer to release the related memory and prevent memory leaks, for example:
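A sketch that copies each output to Host memory for post-processing and then tears the dataset down (assuming each buffer’s Device memory was applied for with aclrtMalloc, as in the earlier sketch):

```cpp
#include "acl/acl.h"

// Fetch the results from Device memory, then destroy the dataset and its
// buffers. Error checks are omitted for brevity.
void FetchAndDestroyOutput(aclmdlDataset *output) {
    for (size_t i = 0; i < aclmdlGetDatasetNumBuffers(output); ++i) {
        aclDataBuffer *buf = aclmdlGetDatasetBuffer(output, i);
        void *devPtr = aclGetDataBufferAddr(buf);
        size_t size = aclGetDataBufferSizeV2(buf);

        // Copy the result data to the Host for user-defined post-processing.
        void *hostPtr = NULL;
        aclrtMallocHost(&hostPtr, size);
        aclrtMemcpy(hostPtr, size, devPtr, size, ACL_MEMCPY_DEVICE_TO_HOST);
        // ... post-process hostPtr here ...
        aclrtFreeHost(hostPtr);

        aclrtFree(devPtr);          // free the Device memory
        aclDestroyDataBuffer(buf);  // destroy the aclDataBuffer
    }
    aclmdlDestroyDataset(output);   // destroy the aclmdlDataset itself
}
```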
After model inference is finished, unload the model through the aclmdlUnload interface, destroy the aclmdlDesc model description, and release the working memory and weight memory used for model execution.
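A closing sketch (if the working/weight memory was applied for by the user via the “WithMem” loading interfaces, it must also be freed here with aclrtFree):

```cpp
#include "acl/acl.h"

// Unload the model and destroy its description. For the loading methods
// where AscendCL manages the memory, the working and weight memory are
// released as part of unloading.
void UnloadModel(uint32_t modelId, aclmdlDesc *modelDesc) {
    aclmdlUnload(modelId);         // unload the model
    aclmdlDestroyDesc(modelDesc);  // destroy the model description information
}
```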