Common Solutions for Slow Deepstream Performance on NVIDIA Jetson Platform

NVIDIA has released its latest version, Deepstream 4.0.


We have previously compiled notes on some new features and applications of Deepstream 4.0:

NVIDIA Deepstream 4.0 Notes (1): Accelerating Real-Time AI Video and Image Analysis

NVIDIA Deepstream 4.0 Notes (2): Smart Retail Scene Applications

NVIDIA Deepstream 4.0 Notes (3): Smart Traffic Scene Applications

NVIDIA Deepstream 4.0 Notes (4): Industrial Inspection Scene Applications

NVIDIA Deepstream 4.0 Notes (5): Warehouse Logistics Scene Applications

NVIDIA Deepstream 4.0 Notes (Final): How to Start Using Deepstream and Containers

Without further ado: many users have found that running Deepstream on the Jetson embedded platform results in slower-than-expected performance, so today we summarize several common solutions:

1

Ensure the Jetson clocks are set high. Run these commands to increase the Jetson clock speed:

$ sudo nvpmodel -m <mode>    # for MAX perf and power, mode is 0
$ sudo jetson_clocks

2

A plugin in the pipeline may be running slowly. You can measure the latency of each plugin in the pipeline to identify which one is slow.

  • Enable frame latency measurement

$ export NVDS_ENABLE_LATENCY_MEASUREMENT=1

  • Enable processing latency measurement for all plugins

$ export NVDS_ENABLE_COMPONENT_LATENCY_MEASUREMENT=1

3

In the configuration file, under the [streammux] group, set batched-push-timeout to 1/max_fps. For example, if max_fps is 60 fps, the timeout works out to about 16.7 ms (the property is specified in microseconds, so 16667).
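As a sketch, assuming a 60 fps source, the relevant [streammux] entry might look like this (only the timeout line matters here):

```ini
[streammux]
# 1/60 s ≈ 16.7 ms = 16667 µs (batched-push-timeout is in microseconds)
batched-push-timeout=16667
```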

4

In the configuration file, under the [streammux] group, set the width and height properties to the actual resolution of the input video stream (this can eliminate one scaling step and even lower power consumption).
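For example, assuming a 1080p input stream, the [streammux] group would be set as follows (the values are illustrative; use your stream's native resolution):

```ini
[streammux]
# Match the input stream's native resolution to avoid an extra rescale
width=1920
height=1080
```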

5

For RTSP stream input, in the [streammux] group of the configuration file, set live-source=1. Also, ensure that the sync attribute of all [sink#] groups is set to 0.
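Putting both settings together, a sketch of the relevant groups (assuming a single sink named [sink0]; apply sync=0 to every [sink#] group you have):

```ini
[streammux]
# Tell the muxer the input is a live source (RTSP)
live-source=1

[sink0]
# Disable clock synchronization for live input
sync=0
```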

6

If secondary inference is enabled, try increasing the batch size in the [secondary-gie#] group of the configuration file, in case the number of objects to infer exceeds the batch size setting.

(Secondary inference means the first detection runs on the full frame, while the second inference runs only on the regions found by the first; for example, the first step locates a car, and the second step classifies the identified car's category or color.)
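A sketch of the relevant group, assuming one secondary classifier named [secondary-gie0] (the batch-size value is illustrative; choose it larger than the typical number of detected objects per batch):

```ini
[secondary-gie0]
# Raise batch-size if the number of detected objects can exceed it,
# so secondary inference is not split across multiple batches
batch-size=16
```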

7

On Jetson, use Gst-nvoverlaysink instead of Gst-nveglglessink: the overlay sink does not need GPU rendering (the output can be composited by the display hardware), while the GL sink requires the GPU (to run shaders and so on).
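In the deepstream-app configuration, the sink type is selected numerically; as a sketch, assuming type=5 selects the overlay sink on Jetson (type=2 is the EGL/GL sink):

```ini
[sink0]
enable=1
# type=5: overlay sink (Gst-nvoverlaysink, Jetson only);
# type=2 would select the GL sink (Gst-nveglglessink)
type=5
sync=0
```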

8

If the GPU is the performance bottleneck, we can increase the inference interval of the primary detector by modifying the interval attribute in the [primary-gie] group of the application configuration, or the interval attribute in the Gst-nvinfer configuration file. (In other words, if the GPU is saturated, we can make the primary detection network run less often; for example, infer on every third frame instead of every frame, and let the tracker cover the frames in between.)
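As a sketch, the interval value below is illustrative (interval=4 skips four frames between inferences, i.e. the detector runs on every fifth frame):

```ini
[primary-gie]
# Number of frames to skip between inferences; higher values reduce GPU load
interval=4
```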

9

If elements in the pipeline (likely various processing plugins) are stuck waiting for available buffers (this can be diagnosed by observing low CPU/GPU utilization), try increasing the number of buffers allocated by the decoder: set the num-extra-surfaces attribute in the [source#] group of the application configuration, or the num-extra-surfaces property of the Gst-nvv4l2decoder element.
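A sketch of the application-config form, assuming a single source group [source0] (the buffer count is illustrative):

```ini
[source0]
# Extra decoder output buffers, so downstream plugins are not starved
num-extra-surfaces=5
```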

10

If you are running in Docker or on the console and FPS is low, set qos=0 in the [sink0] group of the configuration file. The issue is caused by initial load: I/O operations bog down the CPU, and because qos=1 is the default for the [sink0] group, decodebin starts dropping frames. Setting qos=0 avoids this.
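The fix is a one-line change in the sink group:

```ini
[sink0]
# Disable QoS events so decodebin does not drop frames under initial I/O load
qos=0
```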

11

On the NVIDIA® Jetson Nano™, the system may crash and restart a few minutes after starting the deepstream-segmentation-test. The solution: NVIDIA recommends powering the Jetson module via the DC power connector while running this application.
