From Novice to Expert in Autonomous Driving: Understanding Camera ISP

1. Definition and Function of ISP

2. Components of ISP

3. Role and Workflow of ISP

4. Challenges and Development Trends of ISP

1. Definition and Function of ISP

The camera mainly consists of a lens, photosensitive elements, filters, an Image Signal Processor (ISP), and image transmission interfaces, as mentioned earlier.

The ISP is the core component of the camera, primarily responsible for post-processing the raw signals output by the front-end photosensitive elements (CMOS/CCD). Through a series of complex algorithms, it processes image signals in real-time, ultimately outputting high-quality images. It is a key link in the camera imaging process and has a decisive impact on image quality. The functions of the ISP encompass all aspects of image processing, from initial signal processing to final image optimization, including automatic exposure (AE), automatic white balance (AWB), auto-focus (AF), bad pixel correction, noise reduction, strong light suppression, backlight compensation, color enhancement, lens shading correction, image cropping, color space conversion, and image stabilization.

The architecture of the ISP can be mainly divided into two types: external ISP and built-in ISP:

External ISP: An ISP chip is separately placed outside the Application Processor (AP) for image signal processing.

Built-in ISP: It is packaged inside the AP and tightly integrated with it. External ISP vendors that have survived fierce market competition typically have deep expertise in this field and extensive experience in image-quality tuning, and can deliver better performance and results than built-in ISPs. Selecting a high-quality external ISP can therefore provide professional, excellent image quality. The choice of an external ISP is generally independent of the AP, so designers can pick the most suitable device from a range of excellent ISP chip suppliers to build more outstanding products.

However, external ISPs also mean higher costs.

2. Components of ISP

The ISP consists of hardware logic and the firmware (software) that runs on it. Internally it contains a Central Processing Unit (CPU), functional modules (SUB IP), and image transmission interfaces (IF), and can be regarded as an independent System on Chip (SoC).

CPU: The central processing unit can run various image processing algorithms such as AF (auto-focus), LSC (lens shading correction), and control peripheral devices. Modern ISPs generally have CPUs based on the ARM architecture, such as the ARM Cortex-A series, with Cortex-A5 and Cortex-A7 being common models suitable for entry-level smartphones, low-cost phones, and smart mobile devices. The ISP’s demand for real-time image processing requires the CPU to have high performance and low power consumption.

SUB IP: This is a general term for various functional modules that process images in their respective specialties. Common SUB IPs include DIS (Digital Image Stabilization), CSC (Color Space Conversion), and VRA (Visual Recognition Algorithm).

IF: Image transmission interfaces are mainly divided into two types: parallel ITU and serial CSI. In mobile cameras and vehicle-mounted integrated units, the MIPI-CSI interface is widely used to transmit image data and various custom data. MIPI CSI stands for Mobile Industry Processor Interface Camera Serial Interface, a high-performance, low-power, low-cost serial communication interface standard developed by the MIPI Alliance. MIPI CSI is mainly used to connect camera modules and processors and supports high-bandwidth data transmission.

The standard comprises an application layer, a protocol layer, and a physical layer. The physical layer defines the transmission medium, the electrical characteristics of input/output circuit signals, and the clocking mechanism; the main physical layer protocols are D-PHY, C-PHY, and A-PHY. Among them, D-PHY and C-PHY are mainly used for short-distance communication (a few centimeters, generally board-to-board), while A-PHY is suited to long-distance communication (over ten meters) and is applied in the automotive field.

External ISPs generally include both MIPI-CSIS (Camera Serial Interface Sender) and MIPI-CSIM (Camera Serial Interface Master) interfaces, one to receive image data from the sensor and one to transmit the processed stream onward to the AP, while built-in ISPs typically require only the MIPI-CSIS interface, since their output reaches the rest of the AP over the on-chip bus.

Additionally, the ISP also includes general-purpose peripheral devices such as I2C (Inter-Integrated Circuit), SPI (Serial Peripheral Interface), PWM (Pulse Width Modulation), UART (Universal Asynchronous Receiver/Transmitter), and WATCHDOG.

The I2C controller is used to read OTP (One Time Programmable) information and to control the VCM (Voice Coil Motor). An external ISP is itself also an I2C slave device: the application processor (AP) can set the ISP's operating mode and query its operational status over I2C.
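As a rough illustration of this control path, the sketch below models an external ISP as an I2C slave with a toy in-memory register map. The register addresses and mode values are entirely hypothetical, and a real AP would talk through an I2C bus driver rather than direct method calls:

```python
# Hypothetical register map; a real ISP documents its own addresses.
ISP_REG_MODE = 0x00
ISP_REG_STATUS = 0x01

class I2CSlaveISP:
    """Toy model of an external ISP acting as an I2C slave device."""

    def __init__(self):
        # Power-on defaults: mode 0x00 (idle), status 0x01 (ready).
        self.regs = {ISP_REG_MODE: 0x00, ISP_REG_STATUS: 0x01}

    def write(self, reg, value):
        # The AP writes a register to set the ISP's operating mode.
        self.regs[reg] = value

    def read(self, reg):
        # The AP reads a register to poll the ISP's operational status.
        return self.regs[reg]

isp = I2CSlaveISP()
isp.write(ISP_REG_MODE, 0x02)   # e.g. switch to a hypothetical "streaming" mode
status = isp.read(ISP_REG_STATUS)
```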

3. Role and Workflow of ISP

The main role of the ISP is to post-process the signals output by the front-end image sensor, and the general workflow is as follows:

1. Obtaining Raw Data from the Sensor: The image sensor (usually CMOS or CCD sensor) first captures the raw data of the image. This data is usually noisy, lacks color information, and has low contrast.

2. Data Preprocessing: The raw image data undergo preliminary processing, such as signal amplification and analog-to-digital conversion, in preparation for subsequent image processing stages.

3. Image Processing: The ISP performs complex processing on the raw data, including noise reduction, color correction, demosaicing, dynamic range optimization, sharpening, etc., ultimately producing a visually optimized image.

4. Output and Display: The processed image can be transmitted to a display or stored in a storage medium.
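The four-step workflow can be sketched end to end in a few lines. This is a toy model with simulated 10-bit raw data, a fixed black level, and gamma encoding standing in for the ISP's full processing stage; all names and constants are illustrative, not from the text:

```python
import numpy as np

def capture_raw(h=4, w=4, black_level=16):
    # Step 1: simulate noisy 10-bit raw data from the sensor.
    rng = np.random.default_rng(0)
    return rng.integers(black_level, 1024, size=(h, w)).astype(np.uint16)

def preprocess(raw, black_level=16):
    # Step 2: black-level subtraction stands in for analog preprocessing.
    return np.clip(raw.astype(np.int32) - black_level, 0, None)

def process(img):
    # Step 3: normalize and apply gamma, standing in for the full
    # denoise / color-correct / demosaic / sharpen chain.
    norm = img / img.max()
    return (255 * norm ** (1 / 2.2)).astype(np.uint8)

def output(img):
    # Step 4: hand off to display or storage; here we just return it.
    return img

frame = output(process(preprocess(capture_raw())))
```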

Black Level Correction (BLC)

No physical device is ideal: because of impurities, heat, and other factors, the photosensitive device generates charge even without light exposure, producing a dark current. Worse, this dark current is difficult to distinguish from the charge generated by actual light exposure.

The black level defines the signal level corresponding to image data of 0. Because of dark current, the raw data actually output by the sensor is nonzero even in complete darkness, so it does not represent the black balance we need. An effective way to reduce the influence of dark current is therefore to subtract a reference dark-current signal from the captured image signal. A sensor generally has more physical pixels than effective pixels: the first few rows of the pixel array are shielded from light (this area still carries the RGB color filter) and serve as an optical-black region for automatic black level correction. Their average value is taken as the correction value, which is then subtracted from every pixel in the active area. Images without black level correction tend to look brighter than they should, reducing image contrast.
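A minimal sketch of this correction, assuming the first rows of the array are the optical-black region and using their mean as the correction value:

```python
import numpy as np

def black_level_correct(raw, n_ob_rows=2):
    # Estimate the dark-current offset from the optical-black rows,
    # then subtract it from the active pixels and clamp at zero.
    black_level = raw[:n_ob_rows, :].mean()
    corrected = raw[n_ob_rows:, :].astype(np.float64) - black_level
    return np.clip(corrected, 0, None)

raw = np.full((6, 4), 100.0)   # active pixels read 100
raw[:2, :] = 20.0              # shielded rows read ~20 with no light
out = black_level_correct(raw) # active pixels become 100 - 20 = 80
```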

Linear Correction: Corrects the non-linear signals output by the sensor to ensure that the image signal is linearly related to the actual light intensity, ensuring imaging accuracy.

Noise Reduction: Noise can be caused by various factors, such as sensor noise, transmission errors, or environmental interference. The image processor can reduce or eliminate noise components in the image, making the final output clearer and richer in detail. Common noise reduction methods include:

Gaussian Filtering: A smoothing linear filter that removes noise through weighted averaging but can lose some edge and texture details.

Median Filtering: A statistical sorting filter that removes noise by taking the median of neighboring pixels, suitable for handling discrete point noise but may damage image detail and texture.

P-M Equation (Perona-Malik) Denoising: An image denoising method based on the heat conduction (diffusion) equation that can remove Gaussian noise while protecting edges from being smoothed away.

TV Method Denoising: Based on total variation theory, it can remove Gaussian noise and isolated point noise while protecting image edges and details.

Some ISPs also integrate machine learning-based denoising algorithms to intelligently handle complex noise scenarios.
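As one concrete example, the 3x3 median filter described above removes an isolated impulse while leaving flat regions untouched. This naive loop version is for illustration only; production ISPs implement such filters in dedicated hardware:

```python
import numpy as np

def median3(img):
    # 3x3 median filter applied to interior pixels; effective against
    # discrete (salt-and-pepper / hot-pixel) noise.
    out = img.astype(np.float64).copy()
    h, w = img.shape
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            out[i, j] = np.median(img[i - 1:i + 2, j - 1:j + 2])
    return out

img = np.full((5, 5), 10.0)
img[2, 2] = 255.0          # a single hot-pixel impulse
clean = median3(img)       # the impulse is replaced by the local median
```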

Lens Shading Correction (LSC)

Due to the optical construction of the lens, light reaching the sensor is concentrated at the center and falls off toward the edges, so the digital image appears brighter in the center and darker at the periphery (shading). The goal of correction is consistent brightness between the edges and the center of the image. The principle of the correction algorithm is to determine the shading distribution and compensate for it using polynomial fitting or concentric-circle compensation. Lens shading takes two forms:

Luma shading: Also known as vignetting, the brightness at the edges of the image darkens because the amount of light gradually decreases from the center toward the edges.

Chroma shading: The lens has a different refractive index for different wavelengths of light, which separates the focal plane positions and produces false color in the image.
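A toy version of concentric-circle luma-shading compensation, assuming a simple quadratic gain model g(r) = 1 + k * (r/r_max)^2 fitted in advance; the constant k and the flat-field demo are illustrative choices:

```python
import numpy as np

def lens_shading_correct(img, k=0.5):
    # Gain grows with squared distance from the optical center:
    # pixels near the edge are boosted more than pixels in the middle.
    h, w = img.shape
    cy, cx = (h - 1) / 2, (w - 1) / 2
    y, x = np.mgrid[0:h, 0:w]
    r2 = (y - cy) ** 2 + (x - cx) ** 2
    r2 = r2 / r2.max()              # normalize so the corners have r2 = 1
    return img * (1.0 + k * r2)

# Simulate vignetting of a flat 100-unit field with the inverse model,
# then verify the correction restores a uniform image.
y, x = np.mgrid[0:5, 0:5]
r2 = ((y - 2.0) ** 2 + (x - 2.0) ** 2) / 8.0
vignetted = 100.0 / (1.0 + 0.5 * r2)
flat = lens_shading_correct(vignetted, k=0.5)
```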

Automatic Exposure (AE)

Automatically adjusts exposure parameters based on the strength of external light to prevent overexposure or underexposure, achieving the best image quality. Exposure amount refers to the total amount of light acting on the photosensitive element (such as film or a sensor) and is determined by the following three factors:

Aperture: The opening formed by the blades inside the lens determines the light-gathering area. The larger the aperture (the smaller the f-number), the more light passes through per unit time.

Shutter: Determines the exposure time, i.e., the time the photosensitive element is exposed to light. The faster the shutter speed, the shorter the exposure time; conversely, the longer the exposure time.

ISO: Sensitivity, which measures how sensitive the film (or photosensitive element) is to light. The higher the ISO value, the higher the sensitivity to light, but it may also lead to increased noise.

In the automatic exposure system, the camera measures the brightness of the scene through a metering system and calculates the appropriate exposure amount based on the exposure equation, then automatically adjusts parameters such as aperture, shutter speed, and ISO to achieve that exposure amount. The exposure loop algorithm is the most common automatic exposure algorithm, implemented through a closed-loop control system, which includes:

Initial Exposure Estimate Calculation: Estimates the initial exposure value based on the current scene brightness by collecting image data through the sensor.

Exposure Adjustment: Adjusts shutter speed, aperture, and ISO settings based on the initial estimate result.

Feedback: Takes a photo and calculates its brightness, feeding back into the algorithm for exposure adjustment.
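The closed-loop idea above can be sketched with a toy scene model in which measured brightness is proportional to a single "exposure" parameter, standing in for the combined aperture/shutter/ISO settings; the target brightness of 118 is an arbitrary mid-gray choice:

```python
import numpy as np

def scene(exposure):
    # Toy scene: measured mean brightness is proportional to exposure,
    # clipped to the 8-bit range.
    return float(np.clip(120.0 * exposure, 0, 255))

def auto_expose(target=118.0, exposure=1.0, iters=10):
    # Closed-loop AE: measure brightness, then scale exposure toward
    # the target and repeat (the feedback step of the exposure loop).
    for _ in range(iters):
        brightness = scene(exposure)
        exposure *= target / max(brightness, 1e-6)
    return exposure, scene(exposure)

final_exposure, final_brightness = auto_expose()
```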

To achieve accurate automatic exposure, cameras typically provide various metering modes, including:

Average Metering (or Segmented Metering): Divides the viewfinder image into several areas and calculates the weighted average of each area to determine the exposure amount.

Partial Metering: Measures the brightness of a specific area of the scene to determine the exposure amount based on that area’s brightness.

Spot Metering: Measures a very small area, usually used for close-ups or scenes requiring precise control of exposure.

Center-Weighted Average Metering: Emphasizes the brightness of the central area of the viewfinder while considering the brightness distribution of the entire scene to determine the exposure amount.
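Center-weighted average metering, for instance, can be modeled as a Gaussian-weighted mean over the frame. The weighting profile here is an assumption; real cameras use tuned, proprietary weight tables:

```python
import numpy as np

def center_weighted_mean(img, sigma=1.0):
    # Gaussian weights centered on the frame emphasize the middle
    # of the scene while still counting the surroundings.
    h, w = img.shape
    cy, cx = (h - 1) / 2, (w - 1) / 2
    y, x = np.mgrid[0:h, 0:w]
    wts = np.exp(-((y - cy) ** 2 + (x - cx) ** 2) / (2 * sigma ** 2))
    return float((img * wts).sum() / wts.sum())

img = np.full((5, 5), 50.0)
img[2, 2] = 250.0                 # bright subject in the center
cw = center_weighted_mean(img)    # pulled toward the bright center
plain = float(img.mean())         # simple average, for comparison
```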

Automatic White Balance (AWB)

Automatically adjusts the color balance of the image through algorithms so that white is restored to true white under ambient light of different color temperatures (typically the white observed under natural daylight conditions).

Its principle is based on color constancy: if the reflective properties of an object's surface do not change with lighting conditions (most objects meet this condition), then the ratio of the surface's brightness to the ambient light brightness is also constant. The camera uses built-in sensors and algorithms to automatically detect the color temperature of the ambient light and adjusts the relative gains of the red, green, and blue signals so that the three primary-color outputs are equal, thereby reproducing standard white on screen.
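One classic embodiment of this idea is the gray-world algorithm, which assumes the scene averages to neutral and scales the channel gains until the means match the green channel. It is a simplification of what production AWB pipelines do:

```python
import numpy as np

def gray_world_awb(rgb):
    # Gray-world assumption: the scene average should be neutral,
    # so scale each channel's gain until its mean equals green's.
    means = rgb.reshape(-1, 3).mean(axis=0)
    gains = means[1] / means          # green channel as reference
    return rgb * gains

# Warm-cast flat image: red reads high, blue reads low.
img = np.empty((4, 4, 3))
img[..., 0], img[..., 1], img[..., 2] = 120.0, 100.0, 80.0
balanced = gray_world_awb(img)        # all channels pulled to ~100
```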

Demosaicing is used to convert the raw data captured by the image sensor (CMOS/CCD) into a complete color image. Because the sensor sits behind a color filter array (CFA), each pixel captures only one color component (red, green, or blue), so demosaicing is needed to reconstruct the full RGB values for every pixel and restore the complete color image. Demosaicing algorithms fall into two categories, linear interpolation methods and nonlinear interpolation methods, and their complexity and effectiveness vary with the strategy used. Linear interpolation methods estimate the missing color values as weighted averages of neighboring pixels; they are usually simple and computationally fast.

Nonlinear interpolation methods estimate the missing color values using more complex algorithms, which can perform better in denoising, sharpening, and detail preservation. In recent years, deep learning methods have made significant progress in image demosaicing. By training deep neural networks (DNNs) to learn the demosaicing process, they can better reconstruct missing color information. The main challenge of demosaicing is to use the surrounding neighborhood information to infer the missing color values at each pixel while preserving image details, avoiding color distortion, and preventing excessive smoothing. Demosaicing algorithms need to find a balance in denoising, sharpening, and detail preservation.
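A linear-interpolation (bilinear) demosaicer for an RGGB Bayer pattern can be written as three sparse-plane convolutions. This sketch uses periodic padding at the borders for simplicity, which keeps the pattern intact on even-sized images; real ISPs handle borders more carefully:

```python
import numpy as np

def demosaic_bilinear(bayer):
    # Bilinear demosaicing of an RGGB mosaic: zero out the samples each
    # channel doesn't own, then convolve with a normalized kernel so
    # every pixel receives an interpolated value for all three channels.
    h, w = bayer.shape
    masks = np.zeros((h, w, 3), dtype=bool)
    masks[0::2, 0::2, 0] = True               # R sites
    masks[0::2, 1::2, 1] = True               # G sites on R rows
    masks[1::2, 0::2, 1] = True               # G sites on B rows
    masks[1::2, 1::2, 2] = True               # B sites
    k_rb = np.array([[1, 2, 1], [2, 4, 2], [1, 2, 1]]) / 4.0
    k_g = np.array([[0, 1, 0], [1, 4, 1], [0, 1, 0]]) / 4.0
    rgb = np.zeros((h, w, 3))
    for c, k in ((0, k_rb), (1, k_g), (2, k_rb)):
        plane = np.where(masks[..., c], bayer, 0.0)
        padded = np.pad(plane, 1, mode="wrap")
        for dy in range(3):
            for dx in range(3):
                rgb[..., c] += k[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return rgb

# Flat-field mosaic with R=1, G=2, B=3 should reconstruct exactly.
bayer = np.zeros((4, 4))
bayer[0::2, 0::2] = 1.0
bayer[0::2, 1::2] = 2.0
bayer[1::2, 0::2] = 2.0
bayer[1::2, 1::2] = 3.0
rgb = demosaic_bilinear(bayer)
```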

Sharpening is the process of compensating for the contours of an image, enhancing the edges and gray transitions to make the image clearer. Sharpening can be divided into spatial domain processing and frequency domain processing.
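A common spatial-domain sharpening technique is unsharp masking: blur the image, subtract the blur to isolate edge detail, and add the scaled detail back. The 3x3 box blur and the step-edge demo below are illustrative choices:

```python
import numpy as np

def unsharp_mask(img, amount=1.0):
    # Spatial-domain sharpening: subtract a 3x3 box blur to isolate
    # high-frequency edge detail, then add it back scaled by `amount`.
    h, w = img.shape
    padded = np.pad(img, 1, mode="edge")
    blur = np.zeros_like(img, dtype=np.float64)
    for dy in range(3):
        for dx in range(3):
            blur += padded[dy:dy + h, dx:dx + w]
    blur /= 9.0
    return img + amount * (img - blur)

# A vertical step edge: sharpening overshoots on the bright side and
# undershoots on the dark side, which is what makes edges "pop".
img = np.zeros((5, 6))
img[:, 3:] = 100.0
sharp = unsharp_mask(img)
```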

Dynamic Range Optimization

Dynamic range is the brightness range between the brightest and darkest parts of an image. Dynamic range optimization aims to improve the brightness distribution of the image, making it more suitable for human observation or subsequent processing. Common methods include:

Smoothing Processing Method: Averages neighboring points of the image, similar to video filtering, but this method does not reduce background noise.

Average Method: Reduces noise power by taking the average value through multiple scans, thereby improving dynamic range.

Reducing Mid-frequency Bandwidth: Filters out noise by lowering the bandwidth of the digital filter, thereby reducing background noise and improving dynamic range.
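The averaging method can be demonstrated directly: averaging N noisy frames of the same scene cuts the noise standard deviation by roughly the square root of N, lifting dark detail out of the noise floor. The noise level and frame count below are arbitrary:

```python
import numpy as np

def average_frames(frames):
    # Multi-frame averaging: noise power drops as 1/N, so the noise
    # standard deviation drops as 1/sqrt(N).
    return np.mean(frames, axis=0)

rng = np.random.default_rng(1)
truth = 50.0   # the "true" flat scene value
frames = [truth + rng.normal(0, 10, size=(32, 32)) for _ in range(64)]

avg = average_frames(frames)
single_err = float(np.std(frames[0] - truth))  # noise in one frame (~10)
avg_err = float(np.std(avg - truth))           # noise after averaging (~10/8)
```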

4. Challenges and Development Trends of ISP

With the continuous improvement of ISP chip performance and the deep integration of AI technology, AI-ISP combines artificial intelligence technology with traditional ISP image processing capabilities, providing stronger image processing effects. By introducing deep learning and neural networks, ISPs will achieve more advanced functions, such as smarter auto-focus, multi-frame synthesis in complex scenes, and automatic parameter adjustments under dynamic scene recognition. AI-ISP can perform pixel-level intelligent processing, enhancing image clarity, noise reduction effects, and dynamic range expressiveness.

However, the development of AI-ISP also faces several challenges. The first and most obvious is computing power. As image resolution increases and processing complexity rises, the computing power required by the ISP keeps growing, and many of today's edge-side chips struggle to meet the demands of AI-ISP.

The second challenge is image quality assessment. AI-ISP algorithms typically rely on dataset-based model training, but building datasets and judging image quality are complex, difficult problems. There is still no unified image-quality assessment standard, which makes it hard to evaluate how effective a trained model is.

The third challenge is complex application scenarios. Different application scenarios have varying requirements for image processing, such as high frame rate video surveillance requiring image clarity and stability, while drones need to process depth information and object recognition under low power conditions. ISPs need to adapt to these diverse needs to enhance product competitiveness.

In the future, the development of AI-ISP will mainly focus on the following aspects:

Addressing Extreme Scene Issues: Continuing to improve the performance of AI-ISP in nighttime, foggy, and high dynamic range scenarios, ensuring stable output of image quality under complex lighting conditions.

Technical Standardization: With the rapid development of AI-ISP technology, gradually forming unified industry standards to help developers better implement technical docking and compatibility across different devices and platforms.

Expanding Edge-side Usage Scenarios: By optimizing AI-ISP performance, further expanding its use in everyday scenarios, lowering deployment difficulty, and widening its quality advantage over traditional ISPs to meet a broader range of real-time processing needs.
