Fun with PowerVR and BeagleBone Black Development

Fun with PowerVR and BeagleBone Black Development

The development board is cool, and the BeagleBone® Black (BBB) is one of the most interesting development boards. This widely used microcontroller board costs around £35 and boots Linux in just 10 seconds, allowing any interested developer to get started quickly.

To this end, Imagination has been collaborating with the BeagleBoard.org® Foundation to enable users to benefit from the BBB. The main processor on the BeagleBone® Black (BBB) board is the Arm® Cortex® A8 with the PowerVR SGX™ 530 Graphics Processing Unit (GPU). From an educational perspective, developers can better develop applications using standard OpenGL® ES 2.0 and Open CL TM 1.1 ABI.

Fun with PowerVR and BeagleBone Black Development
The newly released OpenCL library and documentation allow the BBB to explore OpenCL. Although it is not a high-performance solution, the benefit is that students can learn how to program on the BBB platform at a relatively low cost. Some applications have relaxed latency requirements and extensive signal processing, which will benefit from offloading the A8 core. For example, the documentation package includes the ALSA sample rate converter, and if the latency is relaxed, the A8 load will decrease from 55% to 20%. By providing this library, we hope the BeagleBone community can find other useful ways to leverage OpenCL.

Running OpenCL on PowerVR SGX530—Is It Real?

The PowerVR SGX530 is one of Imagination’s most successful GPU designs and is still widely used today. However, the SGX530 is a design from 10 years ago and was developed before Open CL became a standard. Nevertheless, Imagination has provided support for OpenCL 1.1 on the PowerVR SGX530, and given the widespread use of the BeagleBone Black, we have restarted this work and present it here.

The SGX530 OpenCL binaries in this package are still a work in progress, and it should be noted that these binaries do not fully comply with OpenCL 1.1 consistency. We should not assume the OpenCL driver is “as-is”, and limitations are described in the examples.

Using OpenGL ES2.0

There is a specific BBB image AM3358 Debian 9.12 2020-04-06 4GB SD ImgTec that comes with the TI SGX graphics driver and PowerVR SDK pre-installed. This serves as a foundation for exploring OpenGL and OpenCL.

Download link for BBB image:https://debian.beagleboard.org/images/bone-debian-9.12-imgtec-armhf-2020-04-06-4gb.img.xz

The PowerVR SDK provides several examples, starting from drawing a triangle as the most basic graphic element (Figure 1) to generating more complex images using vertex and fragment shaders (Figure 2). This can be used for self-study. Alternatively, the BBB can also serve as a development platform for a mobile graphics introductory course.

Fun with PowerVR and BeagleBone Black Development
Figure 1: Basic Triangle
Fun with PowerVR and BeagleBone Black Development
Figure 2: Adding Vertex and Fragment Shaders

Understanding OpenCL 1.1

The OpenCL package can be downloaded from the Imagination Technology University project website (https://university.imgtec.com/). This package includes a build script that will install the OpenCL library and patch the PowerVR SDK, allowing only the OpenCL 1.1 matrix example to be built for the BBB.

The OpenCL matrix multiplication example is for educational purposes, to understand how OpenCL runs on embedded platforms like the BBB. The key point to understand is the overhead implied in passing data buffers between the Arm A8 and the PowerVR SGX530 and managing data cache consistency between them. During this overhead time, no useful processing work is done, so it can be considered “wasted” time. Compared to the overhead, the processing time shown in Figure 3 is very large, for example, when the A8 is freed to perform other tasks while the SGX is processing, OpenCL on the BBB becomes very useful.

Fun with PowerVR and BeagleBone Black Development

Figure 3: OpenCL Overhead and Processing Time

Using multiple OpenCL cores illustrates how to improve SGX processing performance using single instruction multiple data (SIMD) instructions on float4 types (four packed 32-bit data values).

It must be noted that if the goal is the fastest processing time on the BBB, the best solution is to use Arm® NEONTM internal instructions, as there is no overhead involved.

Using OpenCL 1.1

The matrix multiplication example helps in understanding OpenCL. The next step is to use OpenCL in a real example that has the right combination of high processing and relaxed latency. The selected example is the ALSA sample rate converter (SRC) used in an audio player application. The high processing requirement comes from upsampling from 44.1kHz to 48kHz, and in fact, it is a player, which means latency requirements can be relaxed. The OpenCL implementation fits the ALSA software, as shown in Figure 4, modifying the libspeexdsp layer to call OpenCL.

Fun with PowerVR and BeagleBone Black Development

Figure 4: ALSA Software Architecture with libspeexdsp Modified to Call OpenCL

This example is built by running a script that will download and patch the ALSA package. The OpenCL library is provided as a C++ library, so ALSA needs to be compiled with g++ to ensure that the OpenCL constructor/destructor is called correctly. The g++ compiler is stricter than GCC, so most patches are to correct type conversions on pointers, ensuring that initialized structures are fully initialized and in the correct order.

To leverage OpenCL, the algorithm used must be fully parallel, with the same operation performed on each output sample. The SRC algorithm in libspeexdsp does not actually do this, as the index values calculated in the inner loop will be used for the next iteration of the outer loop. This means that the ALSA-SRC algorithm needs to be restructured so that the index calculation step can be extracted from the processing loop and prioritized, allowing the processing loop to be the same in each iteration.

Since aplay is just a playback application, latency requirements can be relaxed. It turns out this is necessary, as the overhead of processing the standard input audio buffer size of 160 samples prevents real-time execution of the OpenCL calls. A buffer size of 640 or more samples achieves reliable operation. With a 1600 sample buffer, the A8 load when running SRC on A8 drops from 55% to 20%. This CPU offloading is the main advantage of using OpenCL on the BBB, and we hope the community can find other use cases that can improve performance based on BBB systems.

The documentation in the OpenCL download package is more detailed to help readers understand how OpenCL operates and how to use it effectively.

English link: https://www.imaginationtech.com/blog/fun-with-powervr-and-the-beaglebone-black-low-cost-development-made-easy/
Statement: This article is an original article. Reproduction requires indicating the author, source, and original link.
Welcome to join the Imagination GPU and AI group

Fun with PowerVR and BeagleBone Black Development

The group is full of 200 people. To join, please add the editor to pull you into the group,
Editor WeChat ID: eetrend89 (please add with IMG as a note)
Recommended Reading

Imagination and Peking University Jointly Launch “Introduction to Mobile Graphics” Course on XueTang Online

Fun with PowerVR and BeagleBone Black Development

END

Imagination Technologies is a UK-based company dedicated to developing chip and software intellectual property (IP), products based on Imagination IP have been used in the phones, cars, homes, and workplaces of billions of people worldwide. For more information on cutting-edge technologies such as IoT, smart wearables, communications, automotive electronics, graphics, and image development, please follow Imagination Tech!
Long press to identify the QR code
Follow us
Fun with PowerVR and BeagleBone Black Development

Leave a Comment