Achieving High-Performance Sensor Fusion with ARC VPX DSP IP on a Budget for Embedded Solutions

Key Trends Driving the Demand for Sensor Fusion

Sensor fusion refers to combining data from multiple sensors to obtain more complete and accurate results. Using the information provided by several sensors yields better awareness of the environment. For instance, humans understand their surroundings by combining information collected by various “sensors” (eyes, ears, nose, tongue, skin) and use it to decide how to respond to different situations. This is a typical example of sensor fusion.

Sensor fusion requires three conditions to be met: miniaturized sensors; complex algorithms to extract relevant information from the data streams the sensors generate; and an SoC that provides the performance required to execute those algorithms within the available power and cost budgets.

Each “sensor” in the human body has complementary strengths and provides unique information, and sensors in embedded systems must do the same. For example, in Advanced Driver Assistance Systems (ADAS), radar performs robustly under different lighting and weather conditions, LiDAR provides a wide field of view with good angular resolution, and camera-based vision can quickly and accurately classify objects (Figure 1).
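As a minimal illustration of why combining sensors can give a more accurate result, the sketch below fuses two independent range estimates of the same object, one hypothetically from radar and one from a camera, using inverse-variance weighting. The readings and variances are made-up values, and inverse-variance weighting is just one textbook fusion rule (it assumes independent, unbiased Gaussian errors); it is not presented as the method used in any particular product.

```python
from dataclasses import dataclass


@dataclass
class Estimate:
    value: float     # measured quantity, e.g. range in meters
    variance: float  # measurement uncertainty (sigma squared)


def fuse(a: Estimate, b: Estimate) -> Estimate:
    """Inverse-variance weighted fusion of two independent estimates.

    Each input is weighted by 1/variance, so the more certain sensor
    dominates. The fused variance is smaller than either input variance.
    """
    wa = 1.0 / a.variance
    wb = 1.0 / b.variance
    value = (wa * a.value + wb * b.value) / (wa + wb)
    variance = 1.0 / (wa + wb)
    return Estimate(value, variance)


# Hypothetical readings: radar measures range precisely, the camera less so.
radar = Estimate(value=50.2, variance=0.04)
camera = Estimate(value=51.0, variance=0.36)
fused = fuse(radar, camera)
print(f"fused range = {fused.value:.2f} m, variance = {fused.variance:.3f}")
# fused range = 50.28 m, variance = 0.036
```

The fused variance (0.036) is lower than that of either sensor, which is the quantitative sense in which fusion gives a “more accurate” result under these assumptions.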

Figure 1: Various Sensors in an ADAS System

Extracting meaningful information from sensor signals and combining information from different sensor streams requires algorithms. Depending on the application, the complexity of these algorithms, and therefore the performance they require, can vary significantly. For example, in consumer applications an always-on smart home device wakes up only when it detects a specific voice command, whereas an ADAS system must monitor its environment continuously.

Complex algorithms require an SoC that can provide the performance needed to execute them. This SoC must also meet constraints on power and area, as these significantly affect the overall profitability of the business. Depending on the application, thermal management and limited battery capacity are two major driving factors. Ideally, such SoCs should be fully programmable for maximum flexibility: algorithms may evolve over the product’s lifecycle, sensors may require different calibrations over their lifetime, and it is highly desirable to use the same SoC across multiple versions of a product when differentiation can be achieved through software.

Key Features for Efficient Sensor Fusion Implementation

As mentioned earlier, sensor fusion consists of two main stages: (1) information extraction and (2) combining the extracted information to derive results. This is illustrated in Figure 2.
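The always-on smart home case can be sketched as a two-tier pipeline: a cheap check runs on every audio frame, and the expensive detector runs only when that check fires. In the sketch below the cheap check is a simple frame-energy threshold and the expensive stage is a stub; both, along with the threshold and sample values, are hypothetical stand-ins rather than a real voice-command detector.

```python
from typing import Callable, List, Sequence


def frame_energy(frame: Sequence[int]) -> float:
    """Mean squared amplitude of one frame of 16-bit PCM samples."""
    return sum(s * s for s in frame) / len(frame)


def gated_pipeline(
    frames: Sequence[Sequence[int]],
    threshold: float,
    heavy_stage: Callable[[Sequence[int]], bool],
) -> List[int]:
    """Run heavy_stage only on frames whose energy exceeds threshold.

    Returns the indices of frames where the heavy stage reported a hit.
    """
    hits: List[int] = []
    for i, frame in enumerate(frames):
        if frame_energy(frame) <= threshold:
            continue  # quiet frame: skip the costly stage entirely
        if heavy_stage(frame):
            hits.append(i)
    return hits


class KeywordDetectorStub:
    """Hypothetical stand-in for a real keyword-spotting model.

    It counts how often it is invoked so the effect of gating is visible.
    """

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, frame: Sequence[int]) -> bool:
        self.calls += 1
        return max(frame) > 8000  # pretend loud frames contain the keyword


frames = [
    [10, -12, 8, -9],            # silence
    [200, -180, 150, -220],      # background noise
    [9000, -8500, 7000, -9100],  # loud speech
    [5, -3, 4, -6],              # silence
]
detector = KeywordDetectorStub()
hits = gated_pipeline(frames, threshold=1_000_000.0, heavy_stage=detector)
print(hits, detector.calls)  # [2] 1
```

The gate costs only a few operations per sample, so the expensive stage runs once out of four frames here; on an SoC, the hardware running that stage could likewise stay idle most of the time.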

Figure 2: Sensor Fusion Processing Chain

The first stage can also be called the front end of sensor fusion. Depending on the sensors and the information of interest, different digital signal processing algorithms are applied. For voice, this may involve computing Mel-frequency cepstral coefficients (MFCC), applying Fourier transforms, and various other DSP operations to extract spectral features from the voice signal. The data is in integer format, most likely 16 bits wide.

For cameras, the front end involves image signal processing with functions such as image scaling, color space conversion, filtering, or feature detection. Here the data is represented as pixels, in formats from 8 bits up to 16 bits.

Finally, for radar, front-end processing includes range and velocity FFTs, as well as constant false alarm rate (CFAR) thresholding. Because of dynamic range and precision requirements, the data type is either half-precision or full-precision floating point.

The second stage, information combination, is the back end. The algorithms used depend on the application: tasks include object detection, recognition, tracking, and prediction, and may rely on AI-based machine learning algorithms and linear algebra operations. Naturally, the data types depend on the algorithms.

Because of these specific but differing requirements, sensor fusion calls for a digital signal processor (DSP) that meets the following key requirements.

Versatility

The algorithms and data types largely depend on the application. Therefore, the DSP architecture must support a rich instruction set to implement different algorithms efficiently, with particular attention to performance-critical operations such as FFT or linear algebra.
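As an example of the radar front end described above, the sketch below implements a basic cell-averaging CFAR (CA-CFAR) detector on a range profile, such as the magnitude-squared output of a range FFT. Each cell under test is compared against a threshold scaled from the average power of nearby training cells, skipping guard cells right next to it, so the threshold rises in cluttered regions instead of staying fixed. Cell averaging is only one CFAR variant, and the window sizes, scale factor, and synthetic profile here are illustrative assumptions.

```python
from typing import List, Sequence


def ca_cfar(
    power: Sequence[float],
    num_train: int = 4,
    num_guard: int = 1,
    scale: float = 4.0,
) -> List[int]:
    """Cell-averaging CFAR over a 1-D power profile.

    For each cell under test (CUT), average num_train training cells on each
    side, skipping num_guard guard cells next to the CUT, and declare a
    detection if the CUT exceeds scale * that average. Cells too close to
    the edges to have a full window are not evaluated.
    """
    half = num_train + num_guard
    detections: List[int] = []
    for i in range(half, len(power) - half):
        left = power[i - half:i - num_guard]
        right = power[i + num_guard + 1:i + half + 1]
        noise = (sum(left) + sum(right)) / (len(left) + len(right))
        if power[i] > scale * noise:
            detections.append(i)
    return detections


# Synthetic range profile in linear power units (made-up values).
profile = [1.0] * 32
profile[10] = 20.0        # strong target
profile[20] = 8.0         # weaker target
for k in range(25, 32):
    profile[k] = 5.0      # clutter region with a raised noise floor

fixed = [i for i, p in enumerate(profile) if p > 4.0]
print("fixed threshold:", fixed)     # also flags clutter cells 25-31
print("CA-CFAR:", ca_cfar(profile))  # [10, 20]
```

A fixed threshold of 4.0 reports every clutter cell as a target, while the adaptive CFAR threshold keeps only the two real peaks.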
The DSP must support integer and floating-point data types of different precisions. It must also be a truly flexible compute resource, capable of performing the “classic” filtering operations typically associated with DSPs as well as machine learning and computer vision algorithms.

Scalability

To avoid one-off investments in each design, scalability is key. Although the requirements of different sensors vary, it is highly desirable to use the same baseline architecture for all signal processing requirements across different designs, reducing system integration work and maximizing overall software development efficiency. Scalability also allows designers to choose the configuration that provides the best PPA for the target application.

Scalability is not just about hardware. Optimizing kernels for a specific architecture is a significant software investment. With a scalable architecture, that software can be reused across SoCs, for example across low-end, mid-range, and high-end versions of a product.

PPA Optimization

There are many aspects to optimize in terms of performance, power, and area (PPA). First, performance depends on the core’s loop efficiency (i.e., the number of cycles required to perform a specific function), on the available processing engines, and on an ISA that can make use of those engines. It also depends on efficient support for data movement and parallel data processing, and on a rich (preferably configurable) set of interfaces, for example to connect accelerators and peripherals directly to the core without going through system memory.

The maximum clock frequency of the DSP reflects another aspect of performance.
It determines how much processing capacity (in cycles per second) the DSP can deliver, but it also affects the effort required for timing closure in the physical SoC design.

Low power consumption follows directly from performance efficiency and from the option to wake certain cores only when needed (as in the smart home example, where the device waits for a wake-up command).

Finally, small area has a direct impact on cost and leakage power.

Efficient Software Development

Software development must be efficient, because almost every project invests significantly (in both budget and personnel) in software development and testing. This requires a high-level programming model with an optimizing compiler, plus a rich set of libraries containing ready-to-use optimized kernels for filtering, transforms (e.g., FFT), vector math, linear algebra, and machine learning. Low-level modules such as drivers, DMA handlers, and interrupt handlers are also needed.

DesignWare® ARC® VPX DSP IP

The VPX DSP IP is a family of VLIW/SIMD processors suitable for a wide range of signal processing applications, from always-on devices to automotive ADAS, vision, machine learning, and high-performance computing. Figure 3 provides an overview.
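The PPA point about loop efficiency and clock frequency reduces to simple arithmetic: the clock frequency divided by the frame rate gives the cycles available per frame, and the cycle counts of the kernels must fit within that budget. Every number below (clock, frame rate, per-kernel cycle counts) is a hypothetical assumption, not a VPX figure.

```python
from typing import Mapping


def cycles_per_frame(clock_hz: float, frames_per_s: float) -> float:
    """Cycles available to process one frame in real time."""
    return clock_hz / frames_per_s


def utilization(kernel_cycles: Mapping[str, int], clock_hz: float, frames_per_s: float) -> float:
    """Fraction of the per-frame cycle budget consumed by the listed kernels."""
    return sum(kernel_cycles.values()) / cycles_per_frame(clock_hz, frames_per_s)


# Hypothetical per-frame radar workload; none of these numbers are VPX figures.
kernels = {"range_fft": 4_000_000, "velocity_fft": 6_000_000, "cfar": 2_000_000}
clock_hz = 800e6      # assumed 800 MHz DSP clock
frames_per_s = 20.0   # assumed radar frame rate

budget = cycles_per_frame(clock_hz, frames_per_s)
load = utilization(kernels, clock_hz, frames_per_s)
print(f"budget = {budget:,.0f} cycles/frame, load = {load:.0%}")
# budget = 40,000,000 cycles/frame, load = 30%
```

Halving the kernel cycle counts through better loop efficiency, or doubling the clock, each halves the load; the clock route, however, makes timing closure harder.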

Figure 3: Block Diagram of DesignWare ARC VPX DSP IP

The VPX family is well suited to sensor fusion because it offers the scalability and versatility needed to achieve optimal PPA and efficient software development, improving overall productivity.

All VPX products are based on the same VLIW/SIMD architecture. Customers can scale their solutions to their needs by choosing a vector length of 128, 256, or 512 bits.

In addition to vector length, customers can choose single-core, dual-core, or quad-core configurations; the multi-core configurations come pre-integrated, with support for cache coherence and a shared multi-channel DMA.

Beyond vector length, each VPX core is highly configurable, allowing the architecture to be tailored for optimal performance at minimal area.

As explained earlier, different sensors and algorithms require different data types. The VPX supports a wide range of them, from floating point down to the small integer types required for AI applications, covering the dynamic range needed for applications such as high-resolution radar.

The VPX instruction set architecture (ISA) has been optimized to execute key signal processing kernels, such as FFT and matrix operations, efficiently. This avoids the overhead of dedicated hardware accelerators, saving power and area.

The ISA and microarchitecture (i.e., the way the different functional units are implemented) are key to achieving optimal PPA. However, a software development environment is needed to unlock the hardware’s capabilities. The VPX comes with the MetaWare tool suite, which includes an optimizing C/C++ compiler, simulation tools, and a sophisticated debugging environment. To support the growing demand for AI, MetaWare also provides an NN SDK and advanced graph mapping tools (supporting TensorFlow, Caffe, and ONNX).
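The vector-length scaling described above can be illustrated with a strip-mined loop written against a lane count rather than a fixed width. The number of lanes is the vector width divided by the element width, so a 512-bit vector holds 32 lanes of 16-bit data, versus 16 lanes at 256 bits and 8 at 128 bits. The Python below only emulates the idea (each “vector operation” is a list slice), and the function names are hypothetical; it is not the VPX programming model or the MetaWare API.

```python
from typing import List, Sequence, Tuple


def lanes(vector_bits: int, element_bits: int) -> int:
    """Number of SIMD lanes for a given vector width and element width."""
    return vector_bits // element_bits


def saturate_int16(x: int) -> int:
    """Clamp to the signed 16-bit range, as fixed-point DSP adds typically do."""
    return max(-32768, min(32767, x))


def vadd_int16(a: Sequence[int], b: Sequence[int], vector_bits: int) -> Tuple[List[int], int]:
    """Element-wise saturating add, processed one vector (strip) at a time.

    Returns the result and the number of vector iterations used. The same
    code works for any vector width; only the iteration count changes.
    """
    n = lanes(vector_bits, 16)
    out: List[int] = []
    iterations = 0
    for start in range(0, len(a), n):
        # One emulated "vector instruction" over up to n lanes; the final
        # strip may be partial (real hardware would typically mask it).
        strip = zip(a[start:start + n], b[start:start + n])
        out.extend(saturate_int16(x + y) for x, y in strip)
        iterations += 1
    return out, iterations


a = list(range(100))
b = [32700] * 100
for bits in (128, 256, 512):
    result, iters = vadd_int16(a, b, bits)
    print(bits, lanes(bits, 16), iters)
# 128 8 13
# 256 16 7
# 512 32 4
```

Because the kernel is expressed in terms of the lane count, the same source produces identical results at every width and only the number of iterations changes, which mirrors the software-reuse argument for a family with multiple vector lengths.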

Figure 4: Libraries Provided with MetaWare, Optimized for VPX

The VPX family includes VPXxFS variants (VPX2FS, VPX3FS, and VPX5FS) tailored for functional safety (FuSa) certification. These cores meet the requirements for random hardware fault detection and for systematic functional safety development processes, and are fully compliant with ISO 26262 ASIL D. The VPXxFS DSPs integrate hardware safety features such as ECC protection for memories and interfaces, safety monitors, and lockstep mechanisms. A comprehensive set of safety documentation helps automotive designers achieve ISO 26262 functional safety certification. Additionally, the VPXxFS DSPs offer a “mixed” option that allows users to select the higher ASIL D safety level where needed.

Conclusion

Sensor fusion is a rapidly growing market that reaches into nearly all application areas. Thanks to low-cost sensors and advanced algorithms, it can deliver new user experiences across markets including smart mobile devices, automotive, healthcare, and industrial control. Sensor fusion produces diverse signal processing workloads, because different sensors require different data types to represent their data and different DSP algorithms to extract the information relevant to the actual fusion process.

The fusion process itself (i.e., combining the various sensor information streams to produce meaningful decisions) is largely application-specific. Handling these different workloads requires a scalable processor that can manage different data formats and performance requirements, with a general-purpose, configurable architecture, including memory and interfaces, to meet PPA requirements.

The DesignWare ARC VPX IP series is the ideal solution for sensor fusion applications: with vector lengths of 128, 256, or 512 bits, it meets the needs of various signal processing workloads. With a custom instruction set and dedicated mathematical hardware engines, it delivers exceptional loop efficiency and unparalleled PPA.
Its variable vector length programming model ensures that software can be reused across all products in the VPX series.

Authors: Markus Willems, Senior Product Marketing Manager at Synopsys; Pieter van der Wolf, Chief R&D Engineer at Synopsys
