Trends in CMOS Image Sensor Technology Development

1. Introduction

CMOS (Complementary Metal-Oxide-Semiconductor) image sensors (CIS) dominate the image sensor market thanks to their low power consumption, low cost, high integration, and suitability for mass production. They are widely used in digital cameras, smartphones, surveillance cameras, automotive electronics, and medical imaging. In recent years, continuous advances in semiconductor manufacturing have significantly improved their performance, with breakthroughs in key metrics such as resolution, dynamic range, and low-light imaging. At the same time, the technology is expanding into high-end applications such as autonomous driving, machine vision, and artificial intelligence vision systems, driving the growth of related industries.

Technological innovation has always been the core driving force of the CMOS image sensor industry, and the sensor structure has changed dramatically over the past two decades. The first major step beyond the initial basic architecture was the widespread adoption of Backside Illuminated (BSI) technology, which lets light reach the photodiode directly instead of passing through the metal wiring, significantly improving light sensitivity and signal-to-noise ratio. Stacked technology followed: the image sensor chip is stacked vertically on the processing chip and connected with Through-Silicon Vias (TSV), which greatly reduces signal transmission delay and raises processing speed and overall performance. Most recently, the Triple Stack architecture has emerged, bringing higher performance and a broader application space to CMOS image sensors.

Looking ahead to 2025 and beyond, Sequential Stack technology is expected to become a reality, using more complex multi-layer integration to further exploit the potential of image sensors in functionality, performance, and integration. Competition around stacking technology is likely to intensify, and CIS manufacturers will continue to increase R&D investment in this field.

Beyond stacking, new imaging methods are gradually expanding the application boundaries of CMOS image sensors, and the competitive landscape is changing profoundly as a result. In addition to the traditional competition over pixel size and resolution, technologies such as Short-Wave Infrared (SWIR) imaging and event-based imaging are becoming new battlegrounds.

2. Evolution of Pixel Size and Resolution

1. Trends and Challenges in Pixel Size Miniaturization

For a long time, pixel miniaturization has been an important direction in the development of CMOS image sensors. Early pixels were as large as 5.6 μm, whereas today's pixels measure 0.64 μm or even less. Over the same period, resolution has risen from 0.3 MP to over 108 MP, and even to 576 MP. However, pixel miniaturization brings a series of difficult problems.

The first challenge is the reduction of the signal-to-noise ratio (SNR). The signal strength is proportional to the pixel's photosensitive area, so as the pixel shrinks it collects less light while the relative impact of noise grows, and image quality declines. In a simple model, the current generated by the light signal, I_{signal}, is proportional to the pixel area A, that is, I_{signal} = k_1 A, where k_1 is a proportionality constant, and the noise current I_{noise} is treated as a constant under given conditions (a simplification; real noise depends on many factors). The signal-to-noise ratio is then

SNR = \frac{I_{signal}}{I_{noise}} = \frac{k_1 A}{I_{noise}}

It can be seen that as the pixel area A decreases, SNR decreases.
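To give a feel for the scale of this effect, the simplified model can be evaluated for the two pixel sizes mentioned earlier. The sketch below is only illustrative: it assumes square pixels (area equal to pitch squared), the same k_1 and I_{noise} for both pixels, and the 20·log10 convention for expressing a current ratio in decibels. The function names `relative_snr` and `to_db` are made up for this example.

```python
import math


def relative_snr(pitch_um: float, reference_pitch_um: float) -> float:
    """SNR of a pixel relative to a reference pixel under SNR = k_1 * A / I_noise.

    Assumes square pixels (A = pitch ** 2) and identical k_1 and I_noise,
    so both constants cancel and only the area ratio remains.
    """
    return (pitch_um / reference_pitch_um) ** 2


def to_db(ratio: float) -> float:
    """Express a current (amplitude) ratio in decibels."""
    return 20.0 * math.log10(ratio)


# Pixel pitches quoted in the text: 5.6 um (early) and 0.64 um (recent).
ratio = relative_snr(0.64, 5.6)
print(f"relative SNR: {ratio:.4f}")
print(f"change: {to_db(ratio):.1f} dB")
```

Under these assumptions the 0.64 μm pixel keeps only a little over 1% of the SNR of the 5.6 μm pixel, a drop of roughly 38 dB. Real sensors fare better than this because noise is not constant and pixel technology has improved, which is exactly what the countermeasures discussed in this section aim at.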

Pixel miniaturization also weakens the light-gathering ability of the microlenses, which focus light less efficiently onto the photosensitive area. When the microlens size approaches the wavelength of visible light, the focusing efficiency for different wavelengths drops significantly, resulting in energy loss. Crosstalk between pixels also becomes more severe: light leaking between adjacent pixels degrades image detail and color accuracy. Finally, photon collection and quantum efficiency may degrade, and the limited full well capacity of small pixels reduces the photo-generated charge that can be stored, which is particularly evident in low-light environments.

2. Technical Strategies to Address Pixel Miniaturization

To address these challenges, the industry has developed a range of technologies. Deep Trench Isolation (DTI), for example, etches physical isolation walls several micrometers deep at the boundaries of each pixel; these walls act as optical barriers that reflect oblique light and reduce crosstalk. At sub-wavelength scales, however, diffraction becomes significant: light diffracted at pixel edges spreads energy into adjacent pixels, and traditional isolation structures struggle to suppress it, so color crosstalk persists. Low-refractive-index “fences” can suppress diffraction and color crosstalk. While these methods improve CIS performance to some degree, traditional solutions still fall short for sub-wavelength pixels, which suffer from color crosstalk and signal loss. Sub-wavelength pixels therefore call for solutions that are themselves based on sub-wavelength structures.

Companies such as Samsung are exploring meta-optics, for example nanoprisms, which act as color routers and as lenses larger than a pixel, capturing additional light from adjacent color pixels and improving sensitivity (+25%). Meta-optics is still at an early stage in sensor applications, but it has shown other potential advantages, such as enabling extreme pixel scaling and improving color accuracy at a 0.22 micron pixel pitch.

To fit more pixels into compact high-resolution mobile cameras, pixel sizes keep shrinking to deep sub-micron levels, and pixel optics are evolving to compensate for the inherent performance loss caused by smaller pixels. The architecture has evolved from Front-Side Illumination (FSI) through light guiding, Backside Illumination (BSI), and backside DTI to full-depth DTI.

Nevertheless, one-micron pixels were long considered insufficient for main cameras until quad-pixel (2×2) color filter arrays appeared. With remosaic image signal processing (ISP), they deliver full-resolution images in bright conditions and brighter images in dark conditions. The approach has since been extended to 16-pixel (4×4) configurations, enabling 0.5 micron deep sub-micron pixels. The main obstacle to maintaining reasonable sensitivity in sub-micron pixels is the diffraction limit of the microlenses. Because the beam spot size does not shrink with the pixel, traditional metallic color filter isolation grids interact with a growing share of the incident light (32% at 0.7 micron pixels, for example), causing optical losses. Non-metallic grids based on low-refractive-index dielectrics address this issue, and the approach extends to air-gap grids.
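The low-light mode of a quad-pixel (2×2) array can be illustrated with a small sketch. It assumes that the brighter image is obtained by summing the four same-color pixels under each shared color filter (pixel binning); the text does not say how a particular sensor combines them, and the function name `bin_2x2` and the sample values are made up for illustration.

```python
def bin_2x2(raw):
    """Sum each 2x2 block of same-color pixels into one output pixel.

    raw: list of rows (lists of numbers) with even height and width, where
    every aligned 2x2 block sits under a single color filter.
    Returns an image with half the height and half the width.
    """
    height, width = len(raw), len(raw[0])
    if height % 2 or width % 2:
        raise ValueError("height and width must be even")
    return [
        [
            raw[r][c] + raw[r][c + 1] + raw[r + 1][c] + raw[r + 1][c + 1]
            for c in range(0, width, 2)
        ]
        for r in range(0, height, 2)
    ]


# A 4x4 patch of a dark scene: four 2x2 blocks, one per color filter.
dark_scene = [
    [3, 4, 10, 11],
    [5, 4, 12, 9],
    [7, 8, 1, 2],
    [6, 7, 2, 1],
]
binned = bin_2x2(dark_scene)
print(binned)
```

Each output pixel collects the signal of four small pixels, trading resolution for signal level. In bright conditions the remosaic step instead rearranges the quad-pixel data into a full-resolution pattern, which is not shown here.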

3. Market Drivers and Technical Realization of Resolution Enhancement

Market demand for high-resolution CMOS image sensors continues to grow. In the smartphone sector, consumers expect ever better photo quality, prompting manufacturers to pursue cameras with higher pixel counts. In security monitoring, the trend towards high definition and ultra-high definition is evident, and high-resolution image sensors capture more detail and make monitoring more effective. In automotive electronics, the development of autonomous driving raises the precision required of environmental perception, and high-resolution cameras provide richer visual information that helps vehicles make more accurate decisions.

From a technical perspective, resolution is raised not only by shrinking pixels to increase the pixel count; improvements in pixel architecture and signal processing algorithms are also needed to preserve image quality at high resolutions. For example, improved active pixel sensor architectures, such as four-transistor (4T) and five-transistor (5T) designs, further enhance image quality and response speed. Advanced image signal processing algorithms optimize high-resolution images by reducing noise, enhancing details, and correcting colors. The resolution demands of different application fields and typical product parameters are shown in the table below:

| Application Field | Typical Resolution Demand | Current Mainstream Product Resolution | Future Development Trends |
| --- | --- | --- | --- |
| Smartphones | 64 – 108 MP and above | Some flagship models have reached 108 MP | Continued enhancement towards higher pixel counts while optimizing imaging quality |
| Security Monitoring | 4 – 8 MP (HD); some ultra-high-definition demands reach tens of MP | Mainstream products: 4 – 8 MP | HD, ultra-high definition, and higher resolution to meet detail capture needs |
| Automotive Electronics | Front-facing cameras 1 – 8 MP; surround-view cameras relatively lower | Front-facing cameras in some high-end models reach 8 MP | As autonomous driving levels rise, resolution requirements for key cameras such as front-facing ones increase |
| Medical Imaging | From about 1 MP to tens of MP, depending on the application scenario | Resolution differs significantly between products for different applications | Higher resolution to support more accurate diagnosis while ensuring imaging quality |

3. Improvement of Imaging Quality

1. Development of High Dynamic Range (HDR) Technology

High Dynamic Range (HDR) imaging is crucial for CMOS image sensors because it lets an image retain detail in bright and dark areas at the same time. Real lighting conditions are often complex and variable, for example when capturing objects in shadow under strong outdoor sunlight, and traditional image sensors tend to overexpose bright areas and underexpose dark ones. HDR technology effectively improves this situation. There are currently two main HDR routes: multi-frame exposure synthesis and single-pixel multi-well (Multi-Tap) technology.

Multi-frame exposure synthesis quickly captures 2-4 frames of the same scene with different exposure times. Short exposure frames (e.g., 1/1000s) capture detail in bright areas without overexposure; long exposure frames (e.g., 1/30s) capture detail in dark areas with less noise; and medium exposure frames (e.g., 1/120s) cover the intermediate brightness. Image fusion algorithms (such as weighted fusion or threshold segmentation fusion) then extract the valid pixel information from each frame and synthesize an image with extended dynamic range, typically 120 – 140dB.

Single-pixel multi-well (Multi-Tap) technology integrates 2-3 charge storage wells of different capacities within a single pixel. During exposure, photo-generated charge fills the small well first, whose saturation level corresponds to bright areas; once the small well is saturated, excess charge flows through a transfer switch into the large well instead of overflowing, accumulating dark-area charge and enhancing sensitivity. After exposure, the charge signals of the small and large wells are read out separately and combined by on-chip circuitry (e.g., small well signal × gain 1 + large well signal × gain 2) into a single HDR frame, achieving “single-frame single-exposure” with a dynamic range typically between 80 – 100dB.

Each technology has advantages and disadvantages. Multi-frame exposure synthesis has a higher dynamic range ceiling and preserves image detail more completely, but it is slow and prone to motion artifacts when capturing fast-moving objects. Single-pixel multi-well technology is fast and suits high-speed motion scenarios, but its dynamic range is lower and its process complexity is high, increasing pixel area by 20% – 30%. The two HDR technologies are compared in the table below:

| Technology Type | Dynamic Range | Imaging Speed | Motion Artifacts | Process Complexity | Applicable Scenarios |
| --- | --- | --- | --- | --- | --- |
| Multi-Frame Exposure Synthesis | 120 – 140dB | Slow (multi-frame capture plus fusion adds millisecond-level delays, e.g., 3-frame synthesis requires ≥30ms) | Prone (motion during multi-frame capture can cause ghosting in the fused image) | Low (no change to the pixel structure, only additional algorithm processing units) | Static / low-speed scenarios (e.g., smartphone HDR photography, landscape photography, static security monitoring) |
| Single-Pixel Multi-Well (Multi-Tap) | 80 – 100dB | Fast (single-frame exposure, no fusion delay, response time ≤10ms) | None (single-frame exposure, no offset in object position) | High (requires multi-well isolation structures and charge transfer switches, increasing pixel area) | High-speed motion scenarios (e.g., automotive ADAS cameras, industrial high-speed detection, sports event recording) |
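The two synthesis steps described in this subsection can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the algorithm of any particular sensor: pixel values are normalized to [0, 1], the "weighted fusion" uses a simple hat-shaped weight that trusts mid-range values most, and the names `hat_weight`, `fuse_exposures`, and `combine_wells` are invented for this example, as are the gain values a caller would pass.

```python
def hat_weight(value):
    """Weight a normalized pixel value: mid-range is trusted most,
    near-black or clipped values least (small floor avoids division by zero)."""
    return max(1.0 - abs(2.0 * value - 1.0), 1e-3)


def fuse_exposures(frames, exposure_times):
    """Weighted fusion of several exposures of the same scene.

    frames: one flat list of normalized pixel values per exposure.
    exposure_times: exposure time in seconds for each frame.
    Returns a relative radiance estimate (value per second) for each pixel.
    """
    fused = []
    for pixel_values in zip(*frames):
        weights = [hat_weight(v) for v in pixel_values]
        radiance = [v / t for v, t in zip(pixel_values, exposure_times)]
        weighted_sum = sum(w * r for w, r in zip(weights, radiance))
        fused.append(weighted_sum / sum(weights))
    return fused


def combine_wells(small_well, large_well, gain_small, gain_large):
    """Single-frame multi-well synthesis: small well x gain 1 + large well x gain 2."""
    return small_well * gain_small + large_well * gain_large


# Two pixels seen in short, medium, and long exposures (times from the text).
# Pixel 0 is bright and clips in the medium and long frames; pixel 1 is dark.
times = [1 / 1000, 1 / 120, 1 / 30]
frames = [
    [0.5, 0.015],   # short exposure
    [1.0, 0.125],   # medium exposure
    [1.0, 0.5],     # long exposure
]
fused = fuse_exposures(frames, times)
print(fused)
```

The bright pixel is recovered almost entirely from the short exposure, because its clipped values in the other frames receive near-zero weight, while the dark pixel draws mainly on the long exposure. Motion between frames is not modeled, which is where the ghosting noted in the table comes from.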

2. Enhancement of Low-Light Imaging Capabilities

In low-light environments, CMOS image sensors face weak signals and high noise, which severely degrade imaging quality. Low-light performance is being improved through innovations in pixel architecture, materials, and signal processing.

In terms of pixel architecture, 4T active pixel sensors (4T APS) have significant advantages over traditional 3T active pixel sensors (3T APS). A 4T APS adds a transfer transistor (TX) between the photodiode (PD) and the floating diffusion region (FD). During exposure TX is turned off, physically isolating PD from FD, which reduces the injection of thermally generated substrate carriers into PD and thereby lowers dark current noise. In addition, correlated double sampling (CDS) samples the pixel twice, once right after reset and once with the photo-generated signal present, and subtracts the two samples to cancel the noise common to both (such as fixed pattern noise (FPN), reset noise, and power supply noise). This reduces reset noise from about 10 e⁻ in a 3T APS to below 1 e⁻.

In terms of materials, new semiconductor materials are being explored to enhance low-light performance; for example, materials with high quantum efficiency convert photons into electrical signals more effectively. In the signal processing stage, advanced noise reduction algorithms are widely applied: by analyzing the image data, they identify and remove noise while retaining image detail. Deep learning algorithms have also shown strong potential for low-light noise reduction, since they can adaptively optimize images after learning from large amounts of low-light image data. The low-light performance of the two pixel architectures is compared below:

| Pixel Architecture | Dark Current Noise Level | Reset Noise Level | Low-Light SNR | Quantum Efficiency (QE) |
| --- | --- | --- | --- | --- |
| 3T APS | Higher, significantly affected by thermally generated substrate carriers injected into the photodiode | About 10 e⁻ | Lower, images tend to be blurry in low light | 60% – 70% (at 550nm, visible band) |
| 4T APS | Lower, TX isolation reduces injection of thermally generated carriers | Can be reduced to 1 e⁻ or below | Higher, clear imaging in low light with better detail retention | 75% – 85% (at 550nm, visible band) |
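The noise-cancelling effect of correlated double sampling can be illustrated with a toy simulation. The sketch assumes an idealized pixel in which the reset offset is frozen at reset and appears identically in both samples, and it ignores every noise source that differs between the two samples (which is why real sensors end up below 1 e⁻ rather than at zero). The function names and the 50 e⁻ signal level are illustrative only.

```python
import random


def read_pixel_with_cds(signal_electrons, reset_noise_rms, rng):
    """Simulate one pixel readout, returning (raw sample, CDS result).

    The reset offset is drawn once per readout and is common to the
    reset-level sample and the signal-level sample.
    """
    reset_offset = rng.gauss(0.0, reset_noise_rms)
    reset_sample = reset_offset                      # level right after reset
    signal_sample = reset_offset + signal_electrons  # level with signal present
    return signal_sample, signal_sample - reset_sample


def rms_error(values, true_value):
    """Root-mean-square deviation of the readings from the true signal."""
    return (sum((v - true_value) ** 2 for v in values) / len(values)) ** 0.5


rng = random.Random(0)  # fixed seed so the run is repeatable
readouts = [read_pixel_with_cds(50.0, 10.0, rng) for _ in range(1000)]
raw = [sample for sample, _ in readouts]
cds = [difference for _, difference in readouts]

print(f"RMS error without CDS: {rms_error(raw, 50.0):.2f} e-")
print(f"RMS error with CDS:    {rms_error(cds, 50.0):.2e} e-")
```

Without subtraction, the readings scatter by roughly the assumed 10 e⁻ reset noise; with subtraction, the common offset cancels and only floating-point rounding remains in this idealized model.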

3. Optimization of Color Reproduction and Accuracy

Accurate color reproduction is crucial for CMOS image sensors in fields such as photography and medical imaging. It depends mainly on the design of the Color Filter Array (CFA) and on color correction algorithms. CFAs typically use the Bayer pattern or its derivatives, covering pixels with red, green, and blue filters to capture color information. This approach has limitations: green pixels, which contribute most to brightness, are the most numerous, while red and blue pixels are relatively fewer, leading to an imbalance in color information. To improve on this, new CFA designs keep emerging, such as quad-pixel (2×2) and 16-pixel (4×4) merged color filter technologies, which use remosaic image signal processing to optimize color output under different lighting conditions.

Color correction algorithms analyze and correct the raw color signals output by the sensor to compensate for color deviations caused by sensor characteristics, lighting conditions, and other factors. Calibration with standard color charts and similar tools is used to build color mapping models, which then adjust the color of each pixel for more accurate reproduction. Some high-end CMOS image sensors also take human visual characteristics into account, employing perception-based color correction algorithms so that image colors better match how people see.
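One common form of color mapping model is a 3×3 color correction matrix applied to each pixel's linear RGB values. The text does not say which model a given sensor uses, so the sketch below is only an assumed example: the function name `apply_ccm` and the matrix coefficients are hypothetical, whereas a real matrix would be fitted from color chart measurements.

```python
def apply_ccm(rgb, ccm):
    """Apply a 3x3 color correction matrix to one linear RGB pixel.

    rgb: [r, g, b] values in [0, 1].
    ccm: three rows of three coefficients each.
    The result is clamped to [0, 1].
    """
    corrected = []
    for row in ccm:
        value = sum(coefficient * channel for coefficient, channel in zip(row, rgb))
        corrected.append(min(max(value, 0.0), 1.0))
    return corrected


# Hypothetical matrix: the diagonal boosts each channel and the negative
# off-diagonal terms remove spill from the other two. Every row sums to 1,
# so neutral gray stays neutral.
EXAMPLE_CCM = [
    [1.50, -0.30, -0.20],
    [-0.25, 1.40, -0.15],
    [-0.10, -0.40, 1.50],
]

gray = apply_ccm([0.5, 0.5, 0.5], EXAMPLE_CCM)
reddish = apply_ccm([0.6, 0.3, 0.2], EXAMPLE_CCM)
print(gray, reddish)
```

The gray pixel passes through essentially unchanged, while the reddish pixel becomes more saturated, which is the kind of compensation needed when the color filters' spectral responses overlap.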

4. Integration and Multifunctionality

1. Innovations in Chip Stacking Technology

Chip stacking is an important means of raising the integration level of CMOS image sensors, and it has evolved from early simple stacking to the current Triple Stack architecture, with Sequential Stack technology as a possible next step. The Triple Stack architecture stacks the image sensor chip vertically with the processing chip and uses Through-Silicon Vias (TSV) for the electrical connections between chips. This structure significantly reduces signal transmission delay and increases processing speed and performance. For instance, Sony’s triple-stacked CMOS image sensor is used in the Sony Xperia 1 V smartphone and has been adopted by other mainstream smartphone models, improving photo and video quality. The triple-stacked architecture also supports multi-modal sensing and on-chip artificial intelligence (AI), marking a shift from merely pursuing resolution to intelligent sensing. In the future, Sequential Stack technology is expected to raise integration further by incorporating more functional modules, such as additional image processing and storage units, into a single chip, extending the sensor’s performance and functionality. The stacking technologies are compared in terms of performance and applications below:

| Stacking Technology Type | Signal Transmission Delay | Processing Speed Improvement | Integrable Functional Modules | Typical Application Scenarios |
| --- | --- | --- | --- | --- |
| Simple Stacking | Has some delay | Relatively limited improvement | Basic image sensing and simple processing functions | Early smartphone cameras, some mid-to-low-end security monitoring |
| Triple Stacking | Significantly reduced | Significant improvement, up to several times | Supports multi-modal sensing, on-chip AI, and other complex functions | High-end smartphones, professional imaging devices, some industrial inspection |
| Sequential Stacking (Future Trend) | Expected to decrease further | Further significant improvement | More image processing, storage, and other units integrated | Applications with extremely high performance and functional extensibility requirements, such as future high-end autonomous driving and advanced medical imaging |

2. Fusion with Other Functional Modules

CMOS image sensors are increasingly being integrated with other functional modules to achieve multifunctionality. In automotive electronics, they are combined with sensors such as radar and LiDAR to form multi-sensor fusion systems that improve the accuracy and reliability of the vehicle’s perception of its surroundings. Integrating the visual information captured by cameras with distance information from radar and three-dimensional point clouds from LiDAR makes it possible to identify obstacles, vehicles, pedestrians, and other objects on the road more comprehensively and accurately. In the smart home, CMOS image sensors can be integrated with environmental sensors (such as temperature and humidity sensors) and microphones for comprehensive perception of the home environment. For example, cameras monitor human activity, environmental sensors provide temperature and humidity readings, and microphones collect sound; processed together, these inputs enable functions such as intelligent security and environmental adjustment. In medical devices, CMOS image sensors can be integrated with biosensors to monitor and visualize biological signals. For instance, some wearable medical devices combine image sensors with heart rate and blood oxygen sensors, so that they can capture images of the skin surface while monitoring physiological parameters such as heart rate and blood oxygen level in real time, providing richer data for medical diagnosis.
