Article: Computer Vision in Automated Parking Systems: Design, Implementation and Challenges
Authors: Markus Heimberger, Jonathan Horgan, Ciaran Hughes, John McDonald, Senthil Yogamani
Editor: Point Cloud PCL
Source: arXiv 2021
Abstract
Automated driving is an active research area in both industry and academia. Automated parking is a key enabling product for fully automated driving systems; it is a form of autonomous driving restricted to low-speed parking scenarios. As a high-end system built on the previous generation of driver assistance features (collision warning, pedestrian detection, etc.), it is an important milestone. In this paper, we discuss the design and implementation of automated parking systems from the perspective of computer vision algorithms. Designing a low-cost system with functional safety is challenging, and there remains a large gap between a prototype and a final product that handles all corner cases. We demonstrate that the camera system is crucial for addressing a range of automated parking use cases and for enhancing the robustness of systems based on active ranging sensors such as ultrasonic and radar. The key vision modules for implementing parking use cases are 3D reconstruction, parking slot marking recognition, free space detection, and vehicle/pedestrian detection. We detail the important parking use cases and demonstrate how to combine the vision modules to form a robust parking system. To the authors' knowledge, this is the first detailed, systematic discussion of a commercial automated parking system.
Introduction
The first generation of parking systems were semi-automated systems using ultrasonic or radar sensors; these have since been improved by adding cameras to provide a more robust and general solution. In this paper, we treat the camera as an important component of the parking system, either extending the functionality of other sensors or providing a low-cost alternative. Figure 1 shows the various fields of view of common ADAS applications, some of which are required for parking systems.
Figure 1: Camera-based ADAS applications and their respective field of view
Typically, the surround-view camera system consists of four sensors forming a network with small overlapping regions, sufficient to cover the near-field area around the car. Figure 2 shows the four views of such a typical camera network. It is important to note that the design and positioning of the cameras aim to maximize near-field perception performance, which is crucial for automated parking. As part of the near-field design, wide-angle lenses are used to cover a large field of view. Consequently, algorithm design must contend with the strong distortion of fisheye cameras, which is not a trivial challenge, as most academic computer vision literature focuses on rectilinear (pinhole) camera models or, at most, cameras with mild radial distortion.
Figure 2: Example images from a panoramic camera showing near-field sensing and wide field of view.
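To make the fisheye issue concrete, the snippet below is a minimal sketch (not the paper's pipeline) of how a fisheye frame can be rectified onto a virtual pinhole camera with OpenCV before standard algorithms are applied; the intrinsics K and distortion coefficients D are placeholder values that would normally come from an offline or online calibration step.

```python
import cv2
import numpy as np

# Placeholder intrinsics and equidistant-model distortion coefficients (assumed values).
K = np.array([[330.0,   0.0, 640.0],
              [  0.0, 330.0, 400.0],
              [  0.0,   0.0,   1.0]])
D = np.array([0.05, -0.01, 0.002, -0.0005])

def undistort_fisheye(img, K, D, balance=0.0):
    """Remap a fisheye frame onto a virtual pinhole camera for downstream CV modules."""
    h, w = img.shape[:2]
    new_K = cv2.fisheye.estimateNewCameraMatrixForUndistortRectify(
        K, D, (w, h), np.eye(3), balance=balance)
    map1, map2 = cv2.fisheye.initUndistortRectifyMap(
        K, D, np.eye(3), new_K, (w, h), cv2.CV_16SC2)
    return cv2.remap(img, map1, map2, interpolation=cv2.INTER_LINEAR)
```

In practice, production systems often work directly on the distorted image with fisheye-aware algorithms instead of rectifying, since rectification discards part of the wide field of view.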
Designing parking systems comes with many challenges, as high accuracy is required for functional safety, accident avoidance, and consumer comfort (e.g., parking so close that the driver cannot open the door is unacceptable). The infrastructure is largely unknown, there may be dynamic interacting objects such as vehicles, pedestrians, and animals, and environmental conditions play a significant role: low light and adverse weather (such as rain and fog) can significantly degrade accuracy and detection range. There is also a commercial aspect that limits the computational power available on low-power embedded systems.

On the other hand, compared to fully automated driving, the parking scenario is more constrained. Vehicle speed is low, leaving sufficient processing time for decision-making, and the camera motion is confined to well-defined regions of interest. Infrastructure could help further, especially in finding and navigating to empty parking spaces; we do not discuss infrastructure support in this work, although the authors believe it will be an important component of automated parking solutions.

The term automated parking can refer either to intelligent infrastructure that manages the placement of cars in mechanical, often multi-storey, parking facilities, or to the intelligent electronic system embedded in the car. A simple literature search shows that most results correspond to the former meaning rather than the one used here. Papers [4] and [5] are the closest to fully vision-based automated parking systems, but they focus solely on the computer vision algorithms. In contrast, our goal in this paper is a more comprehensive treatment of the use of computer vision for parking, from detailed use case descriptions down to the fundamental computer vision modules required to realize them.
The structure of this article is illustrated in Figure 3, which provides a high-level overview of the decision-making process when designing an automated parking system. In fact, with some adaptation, the same process applies to most ADAS functions; at each stage there are design decisions to be considered. The biggest limiting factor in the design is hardware selection, as automotive systems face stricter constraints than consumer electronics (cost, safety, standards compliance, thermal issues, etc.).
Figure 3: Design decision process for camera-based parking systems
Hardware Components
In this section, we outline the hardware components that constitute the parking system, emphasizing the role of safety and the computational constraints from a commercial perspective.
2.1 ECU Systems and Electronic Device Interfaces
At a high level, there are two types of camera systems. The first is a standalone camera with a small embedded system tightly integrated into the camera housing; this is sufficient for simpler applications such as a rearview camera. For more complex applications, however, the cameras are connected to a powerful external SOC via additional interface electronics. As shown in Figure 2, a typical surround-view system has four camera inputs, and the spatially separated cameras must connect to a central ECU. Compared to other systems, the video data bandwidth requirements are high, which poses many challenges and limitations for the SOC. The raw digital output from the sensors is typically 10/12 bits, but the video input ports of the SOC may only support 8 bits, requiring an external ISP to compress the bit depth to 8 bits. Simple factors such as resolution and frame rate can double the system requirements, and the connection between the SOC and the cameras is typically made via twisted-pair or coaxial cables.
Figure 5: Visual system framework diagram
Figure 5 illustrates the two alternative approaches used. Serializers and deserializers (collectively referred to as SerDes) over coaxial cable are the more common option because of their high bandwidth of 1 Gbps per channel; the coaxial interface uses the FAKRA connector common among European OEMs. Ethernet over twisted-pair cable is a cheaper option but has a relatively limited bandwidth of 100 Mbps. To compensate for this limitation, Motion JPEG (MJPEG) compression is applied before transmission, which in turn requires a complete ISP and an MJPEG conversion chip at the camera; another option is to use the SOC's ISP. The Ethernet camera also requires more complex electronics on both ends. Gigabit Ethernet can achieve higher bandwidth, but it is more expensive and defeats the cost-reduction goal.
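To make the bandwidth constraint concrete, the back-of-the-envelope calculation below uses assumed, illustrative numbers (1280×800 at 30 fps with 8-bit output after the ISP), not a specification from the paper; it shows why a 1 Gbps SerDes link copes with an uncompressed stream while 100 Mbps Ethernet requires MJPEG compression.

```python
# Per-camera link budget with assumed (illustrative) parameters.
width, height, fps, bits_per_pixel = 1280, 800, 30, 8

raw_mbps = width * height * fps * bits_per_pixel / 1e6
print(f"Uncompressed stream: {raw_mbps:.0f} Mbps per camera")   # ~246 Mbps

# Fits a 1 Gbps SerDes channel, but exceeds 100 Mbps Ethernet, hence MJPEG there.
min_compression = raw_mbps / 100
print(f"Minimum compression ratio for 100 Mbps Ethernet: {min_compression:.1f}x")
```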
Most modern SOC interfaces are digital and serial. MIPI (Mobile Industry Processor Interface) standardizes the camera input through CSI (Camera Serial Interface) and the display output through DSI (Display Serial Interface). These interfaces are implemented on top of LVDS (Low-Voltage Differential Signaling) at the physical layer. CSI2 is the current generation, with a bandwidth of 1 Gbps per channel. OLDI (Open LVDS Display Interface) operates directly on raw LVDS. In addition to serial interfaces, some SOCs also offer parallel interfaces; although these provide higher bandwidth, they require more extensive wiring and more complex circuitry, which does not scale.
2.2 Cameras
The camera component typically consists of an imaging sensor, an optical system, and optional ISP hardware.

Optical system: the optical system consists of the lens, aperture, and shutter, and is characterized by the focal length (f), aperture (d), field of view (FOV), and optical transfer function (OTF).

Dynamic range: the dynamic range of the image sensor describes the ratio between the lower and upper limits of the brightness range that the sensor can capture. Portions of a scene below the lower limit are clipped to black or lost below the sensor's noise floor, while portions above the upper limit are saturated to white. There is no specific dynamic range threshold at which a sensor becomes high dynamic range (HDR); the term is typically applied to sensor types that achieve a higher dynamic range than conventional sensors through specific mechanisms.

Sensitivity: the sensitivity of a pixel measures its response to illumination over time. Many factors affect pixel sensitivity, such as silicon purity, pixel architecture, and microlens design, but one of the biggest is the physical size of the pixel. Larger pixels collect more photons and therefore respond more strongly at lower illumination; however, increasing pixel size to improve sensitivity reduces spatial resolution [6].

Signal-to-noise ratio: for engineers with a signal processing background, the signal-to-noise ratio (SNR) may be the most intuitive characteristic. It is the ratio of signal strength (or level) to the noise sources in the imager. The main issue is that the methods image sensor manufacturers use to measure noise are not standardized, making it difficult to compare different image sensors by SNR. Furthermore, reported SNR figures are based on fixed scenes, while the actual SNR of the received images depends on the scene and is influenced by the exposure time, the gain applied to the signal, and other factors.

ISP: converting the raw sensor signal into a viewable format involves steps such as demosaicing, denoising, and high-dynamic-range processing, collectively referred to as image signal processing (ISP). Most ISP steps are performed in hardware, either within the sensor itself, in a companion ISP chip, or in the main SOC (System on Chip). Fundamentally, the ISP is the set of steps required to convert the captured image into a format usable by the application.
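As a hedged numerical illustration of the dynamic range and SNR definitions above, the snippet below uses made-up example values for full-well capacity, read noise, and signal level (they are not specifications of any particular imager) to show how such figures are derived and why SNR is scene-dependent.

```python
import math

full_well_electrons = 10000.0   # largest signal a pixel can hold (assumed value)
read_noise_electrons = 2.5      # noise floor (assumed value)
signal_electrons = 4000.0       # signal collected in a given scene (assumed value)

# Dynamic range: ratio between the largest and smallest usable signal.
dynamic_range_db = 20 * math.log10(full_well_electrons / read_noise_electrons)

# SNR at this exposure: shot noise grows with the square root of the signal,
# so the achieved SNR depends on the scene, exposure time, and gain.
total_noise = math.sqrt(signal_electrons + read_noise_electrons ** 2)
snr_db = 20 * math.log10(signal_electrons / total_noise)

print(f"Dynamic range: {dynamic_range_db:.1f} dB, SNR at this exposure: {snr_db:.1f} dB")
```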
2.3 SOC
The typical design constraints for selecting an embedded SOC are performance (MIPS, utilization, bandwidth), cost, power consumption, heat dissipation, scalability across high- and low-end variants, and programmability. Unlike handheld devices, power consumption is not a primary criterion since the system is powered by the vehicle's battery, and heat dissipation only matters beyond a threshold, where it may increase cost through better heat sinks. Programmability is abstracted through software frameworks such as OpenCL and is not a primary cost factor. Therefore, for ADAS, the main factors come down to cost and performance. Given the diversity of processors, this is often a difficult decision. Comparing processors by MIPS alone is usually not useful, as utilization depends heavily on the nature of the algorithms; hence, benchmarking based on vendor libraries and an estimated list of the intended applications is crucial for selecting the right SOC. A mixed architecture combining fully programmable, semi-programmable, and hard-wired processors can be a good option to amortize risk. Examples of commercial automotive-grade SOCs include the Texas Instruments TDA2x, Nvidia Tegra X1, and Renesas R-Car H3.
Automated Parking Use Cases
3.1 Overview of Automated Parking
Automated parking systems have been on the mass market for some time, starting with parallel parking and evolving in recent years to include vertical (perpendicular) parking. Parking systems have moved beyond steering-only driver assistance to partial automation of both lateral and longitudinal control. The challenge for a parking assistance system is to detect parking spaces reliably and accurately so that the parking maneuver can be completed with as few movements as possible. The goal of an automated parking system is to provide drivers with a robust, safe, comfortable, and above all useful function that saves time and ensures accurate, collision-free parking. Systems currently on the market rely solely on distance sensor data, usually ultrasonic, for empty-space detection, re-measurement, and collision avoidance during automated parking. While such systems have proven very successful in the field and are reaching the market at low cost, they have inherent limitations that cannot be resolved without other sensor technologies. The use cases described below focus on the benefits of camera-based solutions fused with ultrasonic sensors, in particular addressing some limitations of current systems and pushing automated parking technology to the next level.
Before automated parking can begin, the system must first search for, identify, and accurately locate valid parking spaces around the vehicle. Current systems typically rely on the driver to initiate the search mode, and parking spaces can take the different forms described in the following use cases. Once parking spaces are located, they are presented to the driver, who selects the desired space and the orientation in which the car should end up in the final parking position. After the driver selects a parking space, the vehicle automatically follows the computed trajectory to the desired end position, usually at a limited speed below 10 km/h. To keep this functionality at automation level 2 and avoid the legal implications of conditional automation (in which the system monitors the driving environment), the driver is required to demonstrate attention, for example via a dead-man switch in the vehicle.
Partial automation systems may also allow the driver to exit the vehicle after a parking space has been identified and to start the parking operation remotely via the key fob or a smartphone. In this case, the driver remains responsible for monitoring the environment around the vehicle and controls the parking operation via switches on the key fob or smartphone. Remote parking is suitable for scenarios where the parking space has already been located and measured, or for controlled environments (such as garage parking) where the vehicle can safely explore the environment ahead within limited distances and steering angles. During the parking maneuver, the system continuously re-measures the target parking space and the position of the vehicle itself. Because of inaccuracies in slot measurement and ego-odometry errors, this continuous re-measurement is necessary to improve the accuracy of the end position and to avoid collisions with static or dynamic obstacles (such as pedestrians). The parking trajectory should be calculated so that the selected trajectory is the most suitable for the situation, i.e., the trajectory from the current position into the middle of the parking space is collision-free and requires only a limited number of maneuvers/direction changes (switching from forward to reverse driving and vice versa). An enhancement of the automated parking function is not only to park the car in the space but also to retrieve it from the space.
3.2 Benefits of Camera-Based Parking
Current systems rely on distance sensor information, typically ultrasonic sensors, to identify and locate parking spaces. However, there are many inherent problems with distance sensors for automated parking that can be partially or fully resolved by using camera data. In this case, the best camera data comes from four surround-view fisheye cameras located at the front and rear and in both side mirrors to assist with parking space search, parking automation, and visualization of all parking use cases.
After distance sensors have already located parking spaces, a single rearview fisheye camera is also beneficial in a limited number of reverse parking use cases. The narrower field of view front camera is of little benefit for parking space search but can help automate forward parking scenarios, similar to the rearview camera.
The biggest limitation of distance sensors for parking space detection is that they require other obstacles in the scene to delineate the boundaries of a parking space. Cameras can detect parking spaces from road markings, and the type of line endings can be used to infer the intended use of the space. LiDAR can also detect parking space markings, but sensor cost and limited field of view are its biggest drawbacks.
Figure 6: Accurate parking based on the parking slot itself rather than on other vehicles.
Figure 6 illustrates that a camera fusion system allows more accurate parking: a system based solely on ultrasonic/radar will attempt to align with other parked vehicles, while a camera/fusion system can park relative to the slot itself. While the detection range of the camera (~10 m) is smaller than that of radar or LiDAR (~100 m), the camera does provide a greater range than ultrasonic sensors (~4 m) while also offering overlapping fields of view. Ultrasonic provides precise distance data, while cameras are better suited to providing high angular resolution; these characteristics make the two sensing modalities complementary. The large vertical field of view (~140°) of fisheye cameras can capture obstacles above vehicle height at close range (<1 m), which is useful in parking situations such as entering a garage whose roll-up door is not opened far enough for the vehicle to enter. Most distance sensors have a very limited vertical field of view and cannot cover this scenario.
Since cameras have a significant measurement resolution advantage, they can generate point cloud data for certain types of objects that active sensors may not be able to detect, such as poles or chain-link fences. These “blind spots” of ultrasonic and other sensors can greatly impact the robustness and reliability of automated parking functionalities. Surround-view cameras can generate precise ground topology around the vehicle to help locate curbs, parking lots, and parking locks, as well as understand surface changes of free space. Cameras can also be used to provide reliable vehicle odometry information through popular visual simultaneous localization and mapping (SLAM) techniques in robotics. This visual odometry can overcome many accuracy issues inherent in mechanical odometer sensors and provide the resolution needed to minimize parking adjustment times after the initial parking space selection.
3.3 Classification of Parking Scenarios
Automated parking has various applications but can conceptually be divided into four main parking use cases:
1. Vertical parking (forward and backward): As the vehicle drives past, the system detects the positions of objects and line markings in the near field and measures the size and orientation of the space to decide whether it can offer a parking space to the user, thereby detecting slots lateral to the vehicle. If the user chooses to park, the system finds a safe driving trajectory to the target parking position, orienting the vehicle according to the slot boundaries or markings defined relative to other objects. Figure 8(b) shows an example of a backward parking maneuver completed in three steps, while (c) shows a forward parking maneuver. Computer vision supports obstacle detection through classification and SFM techniques. Fusing this data with traditional ultrasonic-based systems enhances detection rates and range (Figure 7(a)), increasing the true positive rate and reducing the false positive rate of the parking spaces offered to the user, while also improving slot orientation and measurement and thereby reducing parking adjustments. Computer vision can also park based on slot markings, providing more accurate parking results, which is not feasible in traditional ultrasonic-based parking systems.
Figure 7: Benefits of fusing computer vision with a traditional ultrasonic-based parking system. (a) Improved detection performance and range, (b) detection in environments where ultrasonic alone cannot provide a slot (line markings only).
2. Parallel parking: Parallel parking (Figure 8(a)) is a well-defined parking situation, similar to vertical parking, but the strategy and circumstances differ significantly. Typically, one maneuver is needed to enter the space, and further adjustments align the vehicle more accurately within it. In addition, parking tolerances are usually tight, since parking close to the surrounding vehicles and to the curb is desired. Fusion with the camera system reduces parking tolerances and enables more reliable curb detection (ultrasonic and radar can detect curbs but are often unreliable).
Figure 8: Classification of parking scenarios – (a) Parallel reverse parking, (b) Vertical reverse parking, (c) Vertical forward parking, (d) Ambiguous parking, and (e) Fishbone parking with road markings.
3. Fishbone parking: Figure 8(e) shows an example of fishbone parking, where current ultrasonic-based parking systems are limited because the detection density is too low to identify the orientation of the parking space. In this case, using camera systems can increase the range of observation within the parking space to determine the orientation of the parking space from objects or line markings, which current ultrasonic-based systems cannot cover.
4. Ambiguous parking: The last broad category of use cases is ambiguous parking situations, where parking spaces are not well defined except by other vehicles and objects (Figure 8(d)). Because cameras increase the detection range and provide more complete coverage around the vehicle (ultrasonic usually does not cover the sides of the vehicle), they allow better planning of parking maneuvers and more appropriate adjustments in some ambiguous use cases. Additionally, a camera system in the parking domain can enable, or improve the reliability of, other functions compared to ultrasonic/radar-only parking systems, such as:
1. Emergency braking/comfort braking: Certainly, in any level of automated driving conditions, vehicles need to respond to the presence of vulnerable road users. Sometimes, the environment may change rapidly (for example, pedestrians quickly entering the automated parking area), so the vehicle must respond quickly and safely. By complementing existing parking systems, low-speed automatic emergency braking or comfort braking becomes more robust due to the additional redundancy provided by camera fusion.
2. Overlaying object distance information: A very common use of visual system data combined with a traditional parking system is to overlay object distance information on the video output stream, as in surround-view systems. This helps the driver judge distances accurately in the 360° video output when manually maneuvering the vehicle for more precise positioning within the parking space. It is especially useful for parallel parking spaces with curbs, as the curb is often not directly visible to the driver.
Visual Applications
Vision-based ADAS applications began mass production in the early 2000s with systems like Lane Departure Warning (LDW). Since then, the field of vision-based ADAS has rapidly evolved. This is due to significant improvements in processing and imaging hardware, as well as the automotive industry’s drive to add more ADAS functions to enhance safety and improve brand visibility in the market. As cameras are rapidly accepted as standard equipment to improve driver visibility (surround view systems), it is logical that these sensors are used in parallel for ADAS applications.
In this section, we discuss four important ADAS functions and their relevance to automated parking systems. The focus is on algorithms feasible on current ADAS systems, where we need to consider detection, localization, and in some cases classification of 1) non-moving obstacles, such as parked vehicles, 2) parking lines and other ground markings, 3) pedestrians and general moving obstacles, and 4) free space, to support the removal of tracked obstacles from the parking map. The algorithms discussed in the following sections are restricted to what was feasible to deploy on embedded systems as of a couple of years ago.
4.1. 3D Point Cloud
Depth estimation refers to a set of algorithms that recover a representation of the spatial structure of the environment within the sensor's field of view. In the context of automated parking, it is the primary means by which computer vision builds the map, which is crucial for all parking use cases: it gives a better estimate of the depth of parking spaces than existing ultrasonic parking systems, allowing better trajectory planning for vertical and fishbone parking; it improves curb detection reliability, enhancing maneuvering during parallel parking; and it provides obstacle detections that significantly reduce false positives in automatic emergency braking. Depth estimation is the main purpose of many active sensor systems, such as TOF (time-of-flight) cameras, LiDAR, and radar, but it remains a complex topic for passive sensors like cameras. There are two main camera-based depth perception approaches: stereo cameras and monocular systems. The primary advantage of stereo over monocular is improved depth perception. Stereo works by solving the correspondence problem for each pixel, mapping pixel positions from the left camera image to the right camera image. The map of these pixel offsets is called a disparity map, and disparity is inversely proportional to the physical distance of the corresponding world point from the camera. Using the known camera calibration and baseline, the rays forming each pixel pair from the two cameras can be projected and triangulated to solve for the 3D position of each pixel in the world coordinate system. Figure 9 shows an example of sparse 3D reconstruction.
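The snippet below is a minimal sketch of the disparity-to-depth relation described above, assuming a rectified stereo pair with known focal length (in pixels) and baseline (in metres); the matcher settings and parameter values are illustrative, not taken from the described system.

```python
import cv2
import numpy as np

def stereo_depth(left_gray, right_gray, focal_px=700.0, baseline_m=0.12):
    """Compute a metric depth map from a rectified stereo pair: Z = f * B / disparity."""
    matcher = cv2.StereoSGBM_create(minDisparity=0, numDisparities=64, blockSize=7)
    disparity = matcher.compute(left_gray, right_gray).astype(np.float32) / 16.0
    depth = np.full(disparity.shape, np.inf, dtype=np.float32)
    valid = disparity > 0.5                      # ignore invalid / near-zero disparities
    depth[valid] = focal_px * baseline_m / disparity[valid]
    return depth
```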
Monocular systems can also perceive depth but require camera motion to create a baseline for scene reconstruction; this approach is called Structure from Motion (SFM), and it uses sparse or dense techniques to track or match pixels from one frame to the next. The camera motion between frames is estimated (or taken as known from odometry), and together with the camera calibration the corresponding world positions of the points are obtained by projection and triangulation. Bundle adjustment is commonly used to refine the estimated 3D structure and the relative camera motion according to an optimality criterion based on the reprojection of all points into the images.
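The following sketch shows one assumed form of a two-view SFM step (feature matching, essential matrix estimation, pose recovery, and triangulation) using OpenCV on a calibrated pinhole view such as a rectified fisheye crop; note that with a single camera the translation, and hence the structure, is only recovered up to scale unless vehicle odometry supplies the metric baseline.

```python
import cv2
import numpy as np

def two_view_structure(img0, img1, K):
    """Triangulate sparse 3D points from two frames of a moving calibrated camera."""
    orb = cv2.ORB_create(2000)
    kp0, des0 = orb.detectAndCompute(img0, None)
    kp1, des1 = orb.detectAndCompute(img1, None)
    matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(des0, des1)
    p0 = np.float32([kp0[m.queryIdx].pt for m in matches])
    p1 = np.float32([kp1[m.trainIdx].pt for m in matches])

    E, inliers = cv2.findEssentialMat(p0, p1, K, method=cv2.RANSAC, threshold=1.0)
    _, R, t, _ = cv2.recoverPose(E, p0, p1, K, mask=inliers)

    P0 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])   # first camera at the origin
    P1 = K @ np.hstack([R, t])                          # second camera pose (up to scale)
    pts4d = cv2.triangulatePoints(P0, P1, p0.T, p1.T)
    return (pts4d[:3] / pts4d[3]).T                     # N x 3 points, scale-ambiguous
```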
4.2. Parking Space Marking and Recognition
Naturally, the detection of parking spaces is critical for any automated parking system; the system must know where it is going to park before completing the maneuver. Where no obstacles bound the parking location, detecting the road markings that define the slot is crucial. Consider: how else would an automated parking system select a valid space in an empty parking lot? This applies to all parking maneuvers (vertical, parallel, and fishbone) that must be performed relative to marked parking slots. Technologies such as LiDAR can also be used for slot marking recognition, as road markings produce a distinct reflectivity response; however, LiDAR systems are often expensive, have limited detection areas, and do not offer the wide field of view of cameras (>140° FOV). With vision, marking detection can be accomplished using top-down image rectification, edge extraction, and Hough-space analysis to detect markings and marking pairs.
Figure 10 shows an example result of a similar method using a parking camera with a 190° horizontal field of view. The same authors also proposed a different method based on manually provided seed points, followed by structural analysis techniques to extract the parking slots. Alternatively, a method based on HOG (Histogram of Oriented Gradients) and LBP (Local Binary Patterns) features with a pre-trained model was proposed in [18], using a linear SVM (Support Vector Machine) to build the classification model. Regardless of the specific approach, it is evident that detecting marked parking spaces is critical for a complete automated parking system, and the only reasonable and effective technology to achieve this is a camera with a wide field of view.
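Below is a minimal sketch of the top-down-view, edge-extraction, and Hough-analysis pipeline mentioned above, assuming a precomputed ground-plane homography H (e.g., derived from extrinsic calibration); H, the thresholds, and the bird's-eye-view size are placeholders rather than values from the paper.

```python
import cv2
import numpy as np

def detect_marking_lines(image_bgr, H, bev_size=(600, 600)):
    """Detect candidate parking-line segments in a bird's-eye view of the ground."""
    bev = cv2.warpPerspective(image_bgr, H, bev_size)            # top-down rectification
    gray = cv2.cvtColor(bev, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 80, 160)                             # edge extraction
    lines = cv2.HoughLinesP(edges, rho=1, theta=np.pi / 180,     # Hough-space analysis
                            threshold=60, minLineLength=80, maxLineGap=10)
    return [] if lines is None else lines.reshape(-1, 4)         # (x1, y1, x2, y2) per segment

# Pairs of near-parallel segments with plausible spacing can then be grouped into
# slot hypotheses, with the line endings indicating the slot entrance.
```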
4.3. Vehicle and Pedestrian Detection/Tracking
Vehicle detection and tracking are typically associated with front-camera applications such as automatic emergency braking or traffic monitoring. However, parking usually takes place in the presence of other vehicles (parked or moving), so vehicle detection and tracking are very important for automating such maneuvers. Even more importantly, the system must reliably detect and classify pedestrians so that the vehicle can take appropriate action, such as automatic emergency braking when a pedestrian poses a potential risk (Figure 11).
Figure 11: Classification and tracking of pedestrians using cameras
Typically, both vehicle and pedestrian detection are solved by some form of classification; no other sensor can classify detections by object type as easily and reliably as a visual system. Object classification generally falls into the supervised branch of machine learning. It relies on humans selecting sample image patches, across many images, that represent a specific object category. Feature extraction methods such as HOG, LBP, and wavelets are applied to these labeled samples, and machine learning is used to build a predictive model that classifies objects. Many vision-based ADAS features use machine learning for classification; as mentioned earlier, it is widely used for detecting pedestrians and vehicles, but also for face detection and traffic sign recognition (TSR). The quality of the final algorithm depends heavily on the amount and quality of the sample data used to train the classifier, on the classification technique itself, and on how appropriate the chosen features are for the target application. Typical classifiers include SVMs, random forests, and convolutional neural networks (CNNs). Recently this has shifted toward deep learning methods that learn the features automatically.
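As an illustrative stand-in for the production classifiers described above, the sketch below runs OpenCV's bundled HOG descriptor with its default linear-SVM people detector; a deployed system would train its own models on fisheye imagery, so this is only a hedged example of the HOG + SVM pattern.

```python
import cv2

hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

def detect_pedestrians(frame_bgr):
    """Return pedestrian bounding boxes (x, y, w, h) and their confidence weights."""
    boxes, weights = hog.detectMultiScale(frame_bgr, winStride=(8, 8),
                                          padding=(8, 8), scale=1.05)
    return boxes, weights
```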
4.4. Drivable Space
Drivable (free) space is an input used by most environment-sensing maps. It is the area around the vehicle, within the sensors' field of view, that is not occupied by objects; it is typically treated as an occupancy grid mapping problem and usually detected by segmenting the ground from other objects. In occupancy grid approaches, free space information is integrated and stored over time; in vector-based map representations, the existence probability of each object is updated from the free space measurements. Drivable space is used to clear dynamic and static obstacles that are no longer being actively measured or updated in the map. A good free space model can therefore quickly remove dynamic obstacles from their previous positions without discarding valid static information. Free space should also clear previously detected static objects that have moved since the last valid measurement, as well as detections that remain in the map due to odometry update errors and false positives.
Figure 12: (a) image-based drivable area segmentation result, (b) radial-sector free space representation derived from the road segmentation in (a).
Figure 12 shows an example of camera-based free space from image segmentation. Free space also supports collision-free trajectory search and planning, especially when accumulated in a free space grid map. Unlike other sensor types, a visual system can provide several different, independent estimates of free space. For example, another way to determine camera-based free space is from the 3D point cloud and its associated obstacle information: the reconstruction also recovers features on the road surface around the vehicle, and these ground-related features provide valuable free space information. If a reconstructed feature lies on the ground, a reasonable assumption is that the area between that point and the camera is not occluded by objects and can be counted as free space around the ego vehicle. Since these methods are independent and complementary, fusing them with each other, and with free space from other sensors such as ultrasonic, can further improve the accuracy and robustness of the free space estimate.
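The toy occupancy grid below illustrates the temporal integration described above: repeated free-space observations decay stale obstacle evidence while occupied observations reinforce it, so obstacles that have moved away are gradually cleared from the map. The grid size, cell resolution, and log-odds increments are assumed values for illustration only.

```python
import numpy as np

class OccupancyGrid:
    """Log-odds occupancy grid that integrates free/occupied measurements over time."""

    def __init__(self, size=200, cell_m=0.1):
        self.log_odds = np.zeros((size, size), dtype=np.float32)
        self.cell_m = cell_m

    def update(self, free_cells, occupied_cells, l_free=-0.4, l_occ=0.85):
        for r, c in free_cells:          # observed drivable -> evidence decreases
            self.log_odds[r, c] += l_free
        for r, c in occupied_cells:      # observed obstacle -> evidence increases
            self.log_odds[r, c] += l_occ
        np.clip(self.log_odds, -5.0, 5.0, out=self.log_odds)   # keep evidence bounded

    def probability(self):
        return 1.0 / (1.0 + np.exp(-self.log_odds))
```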
4.5. Other Visual Functions
Computer vision can also support several other functions in automated parking scenarios. Visual odometry is closely related to depth estimation through visual SLAM/bundle adjustment techniques, although other visual odometry methods exist. While vehicle odometry is available on the vehicle networks (CAN/FlexRay), relying solely on these signals is inaccurate because of network latency, signal quantization, measurement inaccuracies (e.g., accelerometer-based estimates), and limited degrees of freedom (usually only speed and heading). In automated parking, odometry quality is crucial for user comfort and parking accuracy: with better odometry, parking can be completed with fewer adjustments and the final position is closer to the target. Cross-traffic alert algorithms aim to detect critical traffic situations at intersections, such as T-junctions, that may pose a threat to the ego vehicle, especially under limited visibility.
Figure 13: (a) Example of a traffic situation when exiting a parking space, (b) screenshot of the cross-traffic detection algorithm.
Figure 13(b) shows an example of a cross-traffic vehicle detection algorithm based on optical flow with ego-odometry compensation. In addition to the parking slot marking detection discussed earlier, it is also necessary to detect other road markings, such as arrows and no-parking signs, so that the automated parking vehicle can comply with the rules of the parking area.
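A rough sketch of flow-based cross-traffic detection is shown below: dense optical flow is computed between consecutive frames, the flow induced by ego-motion is removed (here crudely approximated by the median flow, a simplification of true odometry-based compensation), and pixels with large residual motion are flagged as independently moving objects. All thresholds and parameters are assumptions for illustration.

```python
import cv2
import numpy as np

def cross_traffic_mask(prev_gray, curr_gray, residual_thresh_px=3.0):
    """Flag pixels whose motion cannot be explained by the (approximate) ego-motion."""
    flow = cv2.calcOpticalFlowFarneback(prev_gray, curr_gray, None,
                                        pyr_scale=0.5, levels=3, winsize=15,
                                        iterations=3, poly_n=5, poly_sigma=1.2, flags=0)
    ego_flow = np.median(flow.reshape(-1, 2), axis=0)      # crude ego-motion estimate
    residual = np.linalg.norm(flow - ego_flow, axis=2)     # independently moving pixels
    return residual > residual_thresh_px
```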
Automated Parking Systems
Considerations for automated parking systems are illustrated in Figure 14, where many factors influence the specifications of vision-based or partially vision (fusion) automated parking systems. Since most choices impact the entire system, very few parts of the system can be considered in isolation. A simple example is the choice of camera pixel resolution, which can affect the potential use cases implemented through hardware and software; camera resolution affects ISP selection, SerDes selection, memory bandwidth requirements, memory requirements, computational requirements for computer vision, accuracy and range performance of the system, low-light performance, and display requirements. Therefore, it is necessary to understand and define some limitations in the aspects of hardware, use cases, and computer vision algorithms.
Figure 14: Application stack for fully automated parking systems using cameras.
From a hardware perspective, the main variables are the choice of camera imager and ECU processing SOC, within thermal, power, and cost tolerances. Typical automotive imagers for surround-view applications are transitioning from 1 MP to 2-4 MP resolution. The challenge is to increase resolution while maintaining, or preferably improving, low-light sensitivity, which is crucial for the usability and availability of camera-based automated parking. Additional pixel resolution improves the accuracy and range of the system, enabling more parking use cases, more robust parking performance, and availability at higher speeds.

Once the image is formed, it is critical to run as many computer vision functions as possible in parallel at the highest practical frame rate and resolution; this is where SOC selection becomes crucial. Reducing system load always requires trade-offs, such as deploying smart state machines to ensure that only the currently critical computer vision algorithms run, or downscaling and skipping frames to reduce load. As pixel-level processing, traditionally performed on hardware vector engines or DSPs (and typically accounting for 60-70% of a computer vision algorithm's load), moves to dedicated computer vision hardware accelerators, these trade-offs become less severe. Such accelerators, used for tasks like dense optical flow, stereo disparity, and convolution, achieve higher pixel throughput at lower power consumption, at the cost of flexibility.

The use cases the system must cover also play an important role in its specification. The automated parking use cases to be covered define the requirements in terms of detection capability, accuracy, coverage, operating range, operating speed, and system availability. These influence the choice of sensors and SOC but, most importantly, define the performance required from the computer vision algorithms that enable the functions. For instance, automated vertical parking based on line markings requires many computer vision algorithms to work in parallel with the necessary accuracy and robustness to deliver a reliable and useful function. First, the line marking detection algorithm must operate at speeds and detection ranges practical for automated slot search. At the same time, an algorithm such as structure from motion is needed to ensure that no objects (parking locks, cones, trash cans, etc.) are inside the slot, while also measuring the end of the slot, which may be bounded by a curb. Pedestrian detection is a good complement that reduces, but does not eliminate, the user's supervisory burden during the maneuver. These computer vision functions require support from online calibration and lens soiling detection so that the system knows when they are unavailable and can inform the user. Camera information is usually fused over time with other distance sensors (e.g., ultrasonic or radar) to improve robustness, accuracy, and availability; however, some detections, such as classification, can only be achieved with cameras. As surround-view cameras become a standard sensor, the more functions the cameras can take on, the lower the overall system cost, since fewer supporting sensors are required.
Figure 15: Example use case of Park4U Home, where the vision-based system uses landmarks to localize the car relative to a stored trajectory and autonomously navigate to the home parking space.
Computer vision has recently made significant advances through deep learning, particularly convolutional neural networks (CNNs), which have greatly improved object detection accuracy and thereby the perception capability available to automated driving. Deep learning has also made dense pixel-wise classification practical through semantic segmentation, which was previously infeasible. Furthermore, CNNs are showing a strong trend toward breakthroughs in geometric vision tasks such as optical flow, structure from motion, and relocalization. These advances have also led hardware vendors to add dedicated hardware IP providing high throughput of more than 10 tera operations per second (TOPS). In addition, next-generation hardware will include dense optical flow and stereo disparity accelerators to enable generic detection of moving and static objects. From the use case perspective, the next step for parking systems is to make them truly autonomous, allowing the driver to leave the car while it locates and parks itself, including in unmapped environments; the vehicle should likewise be able to leave the parking space and return safely to the driver. Cameras can play a very important role in future automated parking systems by providing rich information about the vehicle's surroundings, including objects and free space, parking slot markings, pedestrians, and more, to be fused with other sensor technologies.
As described in this article, current automated parking systems control the vehicle after the user identifies and selects a parking space. During the slot search, the system’s state is essentially passive. Future trends and challenges are the automation of the slot search itself to achieve complete vehicle parking automation, including searching, selecting, and parking, all done in a robust, repeatable, and safe manner. These automated parking scenarios can be divided into the following categories:
1) Automated parking in known areas and 2) Automated parking in unknown areas.
Automated parking in known areas generally involves the driver "training" the system with a parking trajectory (see Figure 15). During training, the sensors locate landmarks in the scene and record the trajectory driven by the driver relative to those landmarks. When the vehicle returns, the system recognizes the scene and uses the trained information to localize the vehicle and park automatically along the stored trajectory. Valeo launched such a system, Park4U Home, at the Frankfurt Motor Show. The challenge in making the leap to this new level of automation is to extend vision-based automated parking with self-localization (SLAM) and accurate recognition of the stored home area. To reach the highest levels of automation, including parking in unknown areas, a combination of sensor technologies (cameras, ultrasonics, radar, or LiDAR) is clearly needed to achieve maximum accuracy and reliability in self-localization, detection, and environment prediction.
Conclusion
Automated driving is a rapidly developing technology area, and many high-end vehicles now ship with automated parking functions. This has driven significant improvements in sensors and computational capability, enabling more robust and accurate production systems. Despite liability challenges, progressive legislation and safety ratings are being introduced by bodies such as the European New Car Assessment Programme (EuroNCAP) and the National Highway Traffic Safety Administration (NHTSA) to mandate safety systems and to begin allowing automated vehicles to operate on public roads. Camera sensors will continue to play a vital role because of their low cost and the rich semantics they capture compared to other sensors. In this article, we focused on the benefits of camera sensors and how they enable parking use cases. We discussed the system implementation of an automated parking system featuring four fisheye cameras around the vehicle, detailing the embedded system components, the parking use cases to be addressed, and the vision algorithms required to solve them. Since the focus is on computer vision, we omitted details of sensor fusion, trajectory control, and motion planning.