1 Introduction
2 Edge Computing
Edge computing concentrates storage and computing resources at the network edge, close to people, objects, mobile devices, sensors, and other data sources, providing services nearby through a platform that integrates networking, computing, storage, and applications.
In 2016, Shi Weisong et al. observed that in the era of the Internet of Everything, cloud computing alone could not efficiently support the massive data generated by the growing number of devices deployed at the network edge. They argued that edge computing, as a new computing paradigm, could address problems such as limited computing power, response-time requirements, bandwidth costs, data security and privacy, and battery life by processing data at the network edge. They defined the "edge" as any computing or network resource along the path between data sources and the cloud, and discussed applications of edge computing in scenarios such as smart cities, video analytics, and smart homes. In the same year, Sun Xiang et al. proposed EdgeIoT, a new mobile edge computing framework designed to overcome the scalability problems of traditional IoT by processing the raw data streams generated by massive numbers of IoT devices at the edge. EdgeIoT significantly reduces the traffic load on the core network and the end-to-end latency between IoT devices and computing resources, making it easier to open up new IoT services.
In 2017, Satyanarayanan et al. discussed the importance of edge computing and its proximity advantage: edge computing enables fast and accurate responses, improves application scalability, allows finer-grained control over data for privacy protection, and reduces the impact of network or cloud failures on services. In 2019, Wang Rixin et al. proposed a video surveillance system based on a permissioned blockchain and edge computing. It uses edge computing to collect and process data from large-scale wireless sensor deployments, stores massive volumes of video data with a distributed storage service, and applies convolutional neural networks for video analysis and real-time monitoring, a successful application of edge computing to video surveillance. In 2020, Qi Hui et al. developed a warehouse video surveillance system based on edge computing that offloads video tasks to edge computing nodes, preprocessing the massive real-time data generated by surveillance cameras within a short time while ensuring data reliability, effectively reducing processing and transmission delays.
Many technology companies, in China and abroad, have boosted the performance of embedded devices, turning edge computing into a practical reality.
(1) NVIDIA’s Jetson platform pairs embedded modules with on-board GPUs that handle a wide range of deep learning computations efficiently. In November 2021, NVIDIA released the compact yet powerful Jetson AGX Orin, suited to robotics, autonomous machines, medical devices, and embedded edge computing scenarios.
(2) Huawei launched the Atlas 500 smart station for edge applications, built on the Ascend 310 processor. The Ascend 310 is an efficient, flexible, and programmable AI processor whose Da Vinci-architecture AI core accelerates deep learning. Huawei also provides the MindSpore AI framework to simplify deep learning development.
(3) Bitmain’s SE5 AI computing box is a high-performance, low-power edge computing product built around Bitmain’s self-developed third-generation TPU chip and modules. Its INT8 computing power reaches 17.6 TOPS, enough to process 16 channels of HD video simultaneously, providing the computing power for intelligent operations across a range of industry projects.
(4) Google’s deep learning accelerator Edge TPU can perform real-time deep learning inference for HD video with high energy efficiency.
(5) Cambricon Technologies’ NPU (Neural Processing Unit) has been successfully commercialized in Huawei smartphones equipped with the Kirin 970 SoC, powering real-time intelligent image optimization features in Huawei phones.
Video structuring is an intelligent video analysis technology that analyzes content and extracts information from video data. The resulting structured data is stored in a structured database for further analysis and sharing. Object detection algorithms are among the key technologies for video structuring, commonly used to extract people, objects, and their attributes from video frames.
1. Overview of Object Detection Algorithms
The object detection task requires determining which classes of objects an input image contains and locating their positions. Object detection is not only the cornerstone of intelligent video surveillance but is also widely applied in other fields, such as image retrieval and medical image analysis. Object detection algorithms based on deep convolutional neural networks generally fall into two families: two-stage and one-stage detectors.
Two-stage detectors split the task into two steps: first extract candidate sub-regions of the image that may contain targets, then run a convolutional neural network over all sub-regions to extract features, followed by classification and bounding-box regression refinement. A representative algorithm is Fast R-CNN, which takes the entire image as input for feature extraction and introduces the Region of Interest (RoI) concept, improving detection efficiency over earlier algorithms. One-stage detectors instead cast bounding-box prediction directly as a regression problem, eliminating the separate candidate-region extraction step: the original image goes in and predictions come out of a single end-to-end network. YOLO is a representative algorithm; its key breakthrough is a substantial increase in detection speed at the cost of only a slight drop in accuracy.
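Both detector families score and refine candidate boxes using Intersection-over-Union (IoU), the overlap metric behind bounding-box regression targets and non-maximum suppression. A minimal sketch in plain Python (the function name and (x1, y1, x2, y2) box layout are illustrative, not tied to any particular framework):

```python
def iou(box_a, box_b):
    """Intersection-over-Union of two boxes given as (x1, y1, x2, y2)."""
    # Coordinates of the intersection rectangle.
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Clamp to zero when the boxes do not overlap.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

print(iou((0, 0, 10, 10), (5, 0, 15, 10)))  # half-overlapping boxes
```

Identical boxes score 1.0, disjoint boxes 0.0; in non-maximum suppression, a detection is discarded when its IoU with a higher-scoring detection exceeds a threshold.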
2. Common Lightweight Networks at the Edge
In 2016, the lightweight network SqueezeNet was proposed, matching AlexNet-level accuracy on the ImageNet dataset with roughly 50 times fewer parameters and paving the way for model compression. In 2017, Google introduced MobileNet, which builds lightweight deep neural networks from depthwise separable convolutions, sharply reducing parameter counts while maintaining model performance and making it well suited to visual tasks on mobile and embedded platforms; MobileNetV2 and MobileNetV3 followed as improvements on this design. ShuffleNet, also proposed in 2017, shares a similar idea with MobileNet, significantly reducing computational cost while maintaining accuracy, making it suitable for devices with very limited computing power. GhostNet, introduced in 2020, keeps a reduced number of ordinary convolutional kernels and generates additional feature maps from them with cheap linear transformations. It achieves good compression and accuracy, and can be deployed on edge computing devices with limited memory and computing resources.
3 Technical Implementation
Applying edge computing to video structuring involves two steps: model development and model deployment. Model development covers the design and training of neural network models; in video structuring tasks the network model is typically an object detection model. Model deployment covers converting the model and deploying it to edge computing devices.
In video structuring tasks, the categories of people and objects to be structured are predefined, and comprehensive, large general-purpose datasets are available for training object detection models; training typically uses supervised learning to fit an optimized model to the given dataset.
Model development generally consists of three stages.
1. Data Preparation: Common datasets for object detection include PASCAL VOC and MS COCO, annotated with the bounding boxes and category labels of the targets in the training images. Depending on the application scenario, custom datasets can also be collected and annotated as needed.
2. Network Construction: Build the neural network structure for the task. In the code, relevant modules such as NumPy, TensorFlow, or PyTorch usually need to be imported.
3. Parameter Optimization: Choose a suitable loss function, learning rate, optimization method, and iteration count; train the network to obtain the best parameters; and save the network as a model file for subsequent deployment.
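As an illustration of the data preparation stage, PASCAL VOC stores each image's boxes and labels in a small XML file. The sketch below parses a minimal VOC-style annotation with the standard library (the file name and coordinate values are made up for illustration):

```python
import xml.etree.ElementTree as ET

# A minimal PASCAL VOC-style annotation (illustrative values).
VOC_XML = """
<annotation>
  <filename>frame_0001.jpg</filename>
  <object>
    <name>person</name>
    <bndbox><xmin>48</xmin><ymin>240</ymin><xmax>195</xmax><ymax>371</ymax></bndbox>
  </object>
</annotation>
"""

def parse_voc(xml_text):
    """Return a list of (label, (xmin, ymin, xmax, ymax)) tuples."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.iter("object"):
        label = obj.findtext("name")
        bb = obj.find("bndbox")
        box = tuple(int(bb.findtext(t)) for t in ("xmin", "ymin", "xmax", "ymax"))
        boxes.append((label, box))
    return boxes

print(parse_voc(VOC_XML))  # [('person', (48, 240, 195, 371))]
```

Real VOC files carry additional fields (image size, difficulty flags, and so on), but the box-plus-label pairs above are what the detection loss ultimately consumes.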
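The interplay of loss function, learning rate, and iteration count in the parameter optimization stage can be seen in miniature with plain gradient descent on a one-parameter least-squares loss, a toy stand-in for real network training (all values are illustrative):

```python
# Gradient descent on the toy loss L(w) = (w*x - y)^2, illustrating the
# roles of the loss function, the learning rate, and the iteration count.
x, y = 2.0, 6.0          # a single "training sample"; the optimum is w = 3
w = 0.0                  # initial parameter value
learning_rate = 0.05
for step in range(100):  # iteration count
    grad = 2 * (w * x - y) * x   # analytic gradient dL/dw
    w -= learning_rate * grad    # parameter update
print(round(w, 4))  # converges toward the optimum w = 3.0
```

Too large a learning rate makes this loop diverge and too small a one makes it crawl, which is exactly the tuning trade-off faced when training a detection network, just in many millions of dimensions.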
The goal of model deployment is to ensure that the model can run the inference process stably with sufficient accuracy on the target hardware platform, which is a complex system engineering task. The key points of model deployment are elaborated below.
1. Model Conversion and Platform Selection. A variety of edge computing device platforms are available for model deployment, and different platforms support different model formats: NVIDIA, for example, supports TensorRT models, while Huawei supports OM models, so the trained model must be converted to the format the target platform supports before inference. Different tasks and models also require platform-specific operators, which creates strong platform dependency.
2. Model Lightweighting. Deep learning models demand substantial computing, memory, and energy resources, which makes real-time inference difficult on edge devices with limited computing resources. Model lightweighting is the key to this problem, determining the speed and accuracy of system inference. Because edge computing devices have limited computing power, model compression methods such as pruning and quantization are generally needed to make models lighter. Pruning removes components of the model that have minimal impact on its output: deep network models often contain many redundant parameters and neurons whose activation values are close to zero, and removing them largely preserves the model's expressive capability. Quantization converts floating-point operations to integer operations by mapping the weights and activation values of a trained deep neural network from high precision to low precision, compressing the model at the cost of a small loss of accuracy. Because the lightweighting process can reduce inference accuracy, a balance must be struck between inference speed and accuracy.
3. Model Deployment. Deployment generally takes one of two forms: pipelined deployment or service-oriented deployment. Pipelined deployment builds a complete processing pipeline on the device, with many SDKs available, such as NVIDIA's DeepStream, Huawei's MindX SDK, and Cambricon's CNStream. Service-oriented deployment exposes the model as an inference service to clients, with business logic and other operations handled on the client side; examples include NVIDIA's Triton Inference Server and Baidu's Paddle Serving. Note that the choice of deployment form and device platform must weigh development difficulty, cost, and system performance, as well as future maintenance and portability.
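The pruning and quantization steps described under model lightweighting can be sketched on a toy weight list: magnitude pruning zeroes near-zero weights, and symmetric linear quantization maps the survivors to 8-bit integers. The threshold and values below are illustrative; real deployment toolchains automate and refine both steps.

```python
# Magnitude pruning and 8-bit quantization on a toy weight list.
weights = [0.82, -0.03, 0.57, 0.001, -0.66, 0.02]

# Pruning: zero out weights whose magnitude falls below a threshold,
# mimicking the removal of near-zero activations/parameters.
threshold = 0.05
pruned = [w if abs(w) >= threshold else 0.0 for w in weights]

# Symmetric linear quantization: map floats into the int8 range [-127, 127],
# then dequantize to see the approximation error.
scale = max(abs(w) for w in pruned) / 127
quantized = [round(w / scale) for w in pruned]   # integer weights
dequantized = [q * scale for q in quantized]     # approximate floats

print(quantized)
print(max(abs(a - b) for a, b in zip(pruned, dequantized)))  # small error
```

The reconstruction error is bounded by half the quantization step, which is why a small accuracy loss, rather than none, is the usual price of INT8 inference.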
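The pipelined deployment form can be pictured as a fixed chain of stages that every frame flows through, which is conceptually what SDKs such as DeepStream or CNStream wire up. The sketch below uses dummy stages and does not reflect any SDK's actual API:

```python
# Conceptual pipelined deployment: each frame passes through
# decode -> preprocess -> infer -> postprocess in sequence.
def decode(frame):       return {"pixels": frame}
def preprocess(data):    return {**data, "normalized": True}
def infer(data):         return {**data, "detections": ["person"]}  # dummy model
def postprocess(data):   return data["detections"]

PIPELINE = [decode, preprocess, infer, postprocess]

def run_pipeline(frame):
    data = frame
    for stage in PIPELINE:
        data = stage(data)
    return data

print(run_pipeline("raw_frame_bytes"))  # ['person']
```

In service-oriented deployment, by contrast, only the infer step would live on the server; decode, preprocess, and postprocess stay in the client's business logic.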
4 Conclusion
Edge computing has become a research hotspot in academia and industry in recent years. This article has given an accessible overview of edge computing and its application to video structuring, detailing how object detection models are deployed on edge computing devices. Real-time video stream analysis and video structuring in fields such as intelligent security and smart cities are typical application scenarios for the edge computing paradigm, and successful deployments in these fields are a key driving force for its further development and an important foundation for its application in a wider range of scenarios.
■ Written by Jiang Miao, Zhao Shixian, Ren Junxing, Li Min
Institute of Information Engineering, Chinese Academy of Sciences
《China Security》
–An Authoritative Domestic Industry Magazine–
Published by: Editorial Department of “China Security”
Supervised by: China Security Product Industry Association
Publication Date: July 2022