Optimization Diagnosis Strategies for Automotive SoC Embedded Memory

Abstract

The embedded memory in an automotive System-on-Chip (SoC) typically occupies a significant share of the chip area. Defects in these memories can therefore severely impact the production yield of any autonomous driving device. As technology advances and statistical process control is applied during mass production, collecting diagnostic information in addition to pure test data has become good practice in the automotive industry. Designers and technology experts need accurate diagnostic results from faulty devices so that they can draw correct conclusions about maintenance strategy, identify and correct issues at the source, and respond to erroneous behaviors. A common approach is to generate a fault map from the coordinates of all faulty bits, sending them to the tester one by one. More effectively, the encountered faults can be retrieved.

This paper presents a method for compressing diagnostic information during the testing of SoC embedded memory. More specifically, the method is applied to the diagnosis of embedded FLASH memory. The strategy allows fault maps to be reconstructed without any loss, whereas existing compression methods achieve only an approximation. The proposed method uses only a small portion of the memory required by coordinate-based bit mapping methods, and its memory demand is comparable to that of compression methods. At the cost of a moderate test time overhead, the proposed strategy significantly increases the number of devices that can be fully diagnosed without any loss in bitmap reconstruction. In a real embedded FLASH production scenario, most faulty devices are diagnosed after a single transfer from the chip to the test host.

I. Introduction

Embedded memories integrated into modern automotive microcontrollers occupy a significant portion of the chip area. For this reason, they have a substantial impact on yield, and considerable effort is invested in their testing and repair processes. Their failures and “failure history” must be studied to diagnose issues during the production phase.

The purpose of testing is to ensure that each commercialized device operates perfectly according to expected specifications. Many considerations must be taken into account when designing test steps, including early failures shown on bathtub curves and aging effects that affect circuit physical parameters.

Software-based testing is a feasible solution commonly used in the industry, but it is known to be particularly slow. As part of the test design work, specialized hardware has been developed, as described in [3][4][5][6][7], to mitigate this issue. A standard approach is to implement a hardware Built-In Self-Test (BIST). This hardware runs directly on the chip and can test many internal components for which external test equipment or software methods are impractical or impossible to use.

In embedded FLASH (eFLASH) testing, a memory requires multiple erase, program, and verify operations to assess whether faults exist. During the characterization and ramp-up phases, it is also important to collect diagnostic data in addition to pure test results, enabling manufacturers to continuously feed diagnostic information back to technology experts and designers. Such a precise process can improve yield and profitability.

A widely adopted solution for reporting faults is the bit fault map (bitmap): a matrix representation of the memory in which each bit is marked 0 if it functions normally and 1 if a fault is detected. This technique is memory-intensive and time-consuming because it requires complex communication between the Automated Test Equipment (ATE) and the Device Under Test (DUT). For these reasons, bitmaps are rarely used in production environments, except for statistical process control during mass production, where the information collected along the tests must be compressed. One solution that minimizes the cost of bitmaps is on-chip bitmap compression.

This paper proposes an innovative approach based on data encoding and coloring concepts to collect and compress diagnostic information for eFLASH memory. This on-chip approach achieves high data compression with minimal impact on speed.

In our device (Aurix TC39xB), eFLASH testing is conducted through a composite on-chip scheme consisting of a programmable hardware BIST and a CPU. While the BIST applies the test stimuli, the CPU coordinates the entire process, including receiving fault information from the BIST and encoding it into fault maps. By carefully exploiting this setup, significant memory savings can be achieved with an acceptable test time overhead.

This paper is organized as follows: In Section II, the BIST architecture used for eFLASH testing is briefly explained, and the eFLASH testing process is analyzed to understand the main sources of diagnostic information. Section III explains in detail the process from failed coordinates to creating basic information structures. Section IV presents experimental results from over 1800 real-case bitmaps collected during the production phase. Section V provides some conclusions.

II. Background

A. Embedded Memory Structure

In a typical embedded memory, the bits form a matrix organized in rows (called word lines) and columns (called bit lines). Each word line is further divided into pages, each containing a fixed number of bits; the page is the minimum granularity of the memory. A single memory unit composed of a certain number of word lines and bit lines is called a physical sector. Higher-level structures are formed by multiple physical sectors. It is also important to mention a common memory organization called scrambling, which consists of multiplexing and mirroring bits, as detailed in [3]:

● Multiplexing: Bits with the same index are physically adjacent in the word line.

● Mirroring: Word lines are mirrored about an intermediate point.

Figure 1 shows a visual representation of a 16-bit memory organized in a 4-bit word. Physically, it implements a multiplexing factor of 4 and mirrors every 2 scrambled bits.
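The multiplexing part of scrambling can be illustrated with a small sketch. It assumes one word line holding 4 words of 4 bits each with a multiplexing factor of 4, so that bits with the same index sit next to each other. The function names are hypothetical, and mirroring is deliberately left out because its exact layout is not specified in the text.

```python
MUX = 4  # multiplexing factor from the Figure 1 example


def physical_column(word: int, bit: int, mux: int = MUX) -> int:
    """Physical column of logical bit `bit` of word `word` (multiplexing only)."""
    if not 0 <= word < mux:
        raise ValueError("word index must be smaller than the multiplexing factor")
    # All bits with the same index are grouped together, one per word.
    return bit * mux + word


def logical_position(column: int, mux: int = MUX) -> tuple[int, int]:
    """Inverse mapping: physical column -> (word, bit)."""
    return column % mux, column // mux


if __name__ == "__main__":
    for word in range(MUX):
        print(word, [physical_column(word, bit) for bit in range(4)])
```

With this mapping, a fault mask that reports several failing bits of one word corresponds to physical cells that are `MUX` columns apart, which is why horizontal shapes are harder to recognize from logical coordinates.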


Figure 1. Memory organization and fault details received from BIST

B. Architecture for Diagnosis

The embedded memory diagnosis described by Landzberg et al. [8] is the most straightforward structure. That work proposes an ATE-based approach in which the tester directly accesses the memory under test, retrieving each fault coordinate as soon as it appears or retrieving the set of coordinates stored on-chip. This approach does not process any of the collected data, and any series of failures can be reconstructed from the entire set of coordinates.

In contrast, Schanstra et al., Chen et al., and Bernardi et al. have proposed variations by utilizing additional hardware that integrates memory testing capabilities and supports on-chip bitmap collection. Schanstra et al.’s method uses a modified BIST architecture and extends it to perform shape recognition. The described BIST identifies and compresses shapes such as failed bit lines or word lines. During this compression process, some faults may be lost, so this technique does not produce an accurate bitmap representation. Chen et al. proposed a compression method to reduce the number of bits required to reconstruct fault clusters; this reduction is at the cost of low accuracy in reconstructing clusters. Bernardi et al. combine integrated BIST with the CPU of their device to compress the fault coordinates found in their tests. BIST reports the coordinates of each faulty bit. The CPU then compresses these addresses by effectively searching the cubes of the Karnaugh map to utilize don’t-care values. This method limits the amount of communication between ATE and DUT.

III. Proposed Method

The proposed method is based on the concept of encoding to create compact fault maps on-chip. By utilizing a composite testing architecture, the bitmap information is stored in encoded or “colored” segments, which we refer to as “slices,” and is updated as testing proceeds.

The proposed compression method guarantees high accuracy, similar to [8]. In contrast, [5] returns approximate information due to compression, as shown in Table I. Regarding memory demands, the proposed method requires less memory than [8] and slightly more than the method in [5] when the minimum compression ratio is used. The proposed method operates on-chip and can download the complete information at the end of testing, as done in [8] and possibly in [5], which was originally implemented through additional hardware and tester capabilities.

Figure 2. Organization of CPU and programmable BIST

The proposed bit mapping scheme is supported by a suitable hardware-software design, where a programmable BIST can be accessed directly from the CPU, as shown in [9] and [10]. Figure 2 illustrates how the flash memory design-for-test infrastructure operates. The CPU activates selected procedures for the programmable BIST to run, then waits for fault events. When a fault is encountered, the BIST stops and raises a flag. Once the CPU detects this flag by polling, it can access the fault data, resume BIST operation, and perform some calculations. These on-chip calculations may include allocating redundant elements for repair algorithms and running bit mapping algorithms such as the one described in this paper. Figure 3 shows the execution of a golden, fault-free test. Here, after the initial phase, the BIST independently tests the entire embedded flash memory within a reference time known as “tgold.”

Figure 3. Execution of golden test

Figure 4 shows a different situation. Here, BIST finds a fault and stops, waiting for the CPU to read it and resume its operation. At this point, the CPU and BIST can work independently in an interleaved manner. Therefore, while BIST is busy testing other parts of the memory, the CPU can analyze the discovered faults and run the bit mapping algorithm or the coloring algorithm proposed in this paper.

Figure 4. Fault bits and interleaved CPU and BIST operation testing

The overall test time increases to “tfaulty,” which is nevertheless less than the sum of the individual time components of the system (i.e., tfaulty is less than the total of tgold, tread, and tencoding). When computation is needed in response to fault occurrences, such overlapping of CPU and BIST activity is very advantageous for saving test time. In our case, we take advantage of this possibility to incrementally encode the bitmap information stored on-chip. Each time the programmable BIST (PBIST) returns a fault, the encoding algorithm is executed, and the current bitmap information is updated.
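The timing relation can be written compactly. The formulation below is a sketch under the assumption that k ≥ 1 faults are reported during the test and that the i-th fault costs a read time and an encoding time on the CPU; the index i and the count k are notation introduced here, not taken from the original text.

```latex
t_{\mathrm{gold}} \;<\; t_{\mathrm{faulty}} \;<\; t_{\mathrm{gold}} + \sum_{i=1}^{k} \left( t_{\mathrm{read},i} + t_{\mathrm{encoding},i} \right), \qquad k \ge 1
```

The upper bound is strict because part of the encoding work runs while the BIST is already testing other parts of the memory.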

A. Proposed Encoding Strategy

The goal of the proposed method is to produce, on-chip and dynamically, an encoded representation of the failure bitmap. Its main objective is to maximize the amount of information that can be accommodated in the pre-allocated on-chip memory, which constitutes a very strong constraint. Assume that the available memory resources are exhausted before the end of testing. In this case, either the bitmap remains incomplete, or the tester must intervene by downloading the current portion so that testing can resume, iterating until testing is completed. Although multiple downloads seem feasible in theory, very few tester architectures support them, and they severely affect test time.

Therefore, the most feasible solution for saving a large number of complete bitmaps is to compress them by encoding the information, although this comes at the cost of the test time overhead caused by the encoding calculations.

In our approach, we encode the bitmap information on-chip into “colored segments,” also referred to as “slices,” which are the basic structure of our compression algorithm. After carefully examining thousands of fault clusters, we chose segments instead of other types of shapes (e.g., rectangles). Faults are mostly arranged along word lines and bit lines, making segments the most effective and straightforward way to encode them. A segment represents one or more faults belonging to the same bit line or word line, and its format includes:

● A flag indicating whether the segment is horizontal or vertical

● The physical coordinates of the first and last faults in that segment

● A color to describe the characteristics of the segment, considering the distribution of faults it covers.

For the proposed method, four colors have been introduced, as described in Figure 5 and explained below:

A) Black: A black segment includes a single fault.

B) Blue: Represents two faults that are far apart.

C) Red: Represents two or more faults in odd or even positions (faults interleaved with working bits). This color is useful when a checkerboard pattern is applied to the memory, since the resulting fault distribution matches this encoding exactly.

D) Orange: Two or more physically adjacent faults.

Figure 5. Fault shape to color representation

In Figure 5, the left part shows the actual bitmap, while the right reports the colored segments or slices.

Figure 6. Updating slices based on new input faults represented by blue dots

The aim of the proposed method is to create a set of slices that meet the fault clusters of our DUT. Such a set is established by the CPU on the fly, responding to new faults by updating the content of existing slices or initializing a new slice.

In Figure 6, an example is shown of how slices are updated when new faults arrive:

● First, a fault is received, i.e., A.1, and a black slice is created in A.2. In A.2, a new fault is received, causing the black slice in A.3 to be updated to a red slice. Similarly, a fault received in the middle of the red slice in A.3 leads to its update to the orange slice shown in A.4.

● First, a fault is received, i.e., B.1, and a black slice is created in B.2. In B.2, a new fault is received that causes the black slice in B.3 to be updated to a blue slice. In B.3, a fault discovered just below the blue slice leads to the creation of a black slice to encode the last fault in B.4.
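The update rules illustrated by the two examples can be sketched as follows. This is a minimal illustration, not the authors' implementation: the class `Slice`, its methods, and the rule that a gap larger than 2 counts as "far apart" are assumptions made here. Positions are coordinates along a single bit line or word line, and a slice refuses any update that would make decoding lossy.

```python
from dataclasses import dataclass

BLACK, BLUE, RED, ORANGE = "black", "blue", "red", "orange"


@dataclass
class Slice:
    vertical: bool        # orientation flag
    first: int            # coordinate of the first fault in the segment
    last: int             # coordinate of the last fault in the segment
    color: str = BLACK

    def faults(self) -> set[int]:
        """Decode the slice back into the exact set of fault positions."""
        if self.color == BLACK:
            return {self.first}
        if self.color == BLUE:
            return {self.first, self.last}
        step = 2 if self.color == RED else 1
        return set(range(self.first, self.last + 1, step))

    def absorb(self, pos: int) -> bool:
        """Try to add a fault; return False if a new slice is needed."""
        if pos in self.faults():
            return True
        lo, hi = min(self.first, pos), max(self.last, pos)
        if self.color == BLACK:
            gap = hi - lo
            # Assumed rule: gap 1 -> adjacent, gap 2 -> interleaved, else far apart.
            self.color = ORANGE if gap == 1 else RED if gap == 2 else BLUE
        elif self.color == ORANGE:
            if pos not in (self.first - 1, self.last + 1):
                return False
        elif self.color == RED:
            if self.last - self.first == 2 and pos == self.first + 1:
                self.color = ORANGE  # the single gap has been filled
            elif pos not in (self.first - 2, self.last + 2):
                return False
        else:  # a blue slice holds exactly two faults and cannot grow
            return False
        self.first, self.last = lo, hi
        return True


def encode(positions, vertical: bool = True) -> list[Slice]:
    """Incrementally encode faults that lie on one bit line or word line."""
    slices: list[Slice] = []
    for pos in positions:
        if not any(s.absorb(pos) for s in slices):
            slices.append(Slice(vertical, pos, pos))
    return slices


if __name__ == "__main__":
    print(encode([10, 12, 11]))  # sequence A: black, then red, then orange
    print(encode([5, 20, 21]))   # sequence B: black, then blue, plus a new black
```

Because every slice can be expanded back into its exact fault positions with `faults()`, the union of all decoded slices equals the set of faults that was fed in, which is the lossless property the method relies on.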

B. On-chip Memory for Encoding Information

An important issue to consider is how the on-chip memory is structured. This relates not only to the potential storage capacity but also to the access time of memory information. In fact, the algorithm should be able to quickly check the information already contained to evolve the current encoded bitmap. In other words, the algorithm must search the current set of slices to see if there is an existing slice to update or to create a new black slice.

The proposed method aims to minimize the amount of information that needs to be stored and the time required for the algorithm to process a new fault. The memory organization resembles that used in caches, implementing a set associative method.

Given that the selected number of sets is N, the available memory is divided into N equal parts. When a new fault is recorded, its address and fault mask are retrieved by the CPU, which processes them to extract three parts:

● From the word line address, the CPU derives:

– The index of the set to which the slice belongs, calculated as Address % N

– The normalized fault coordinate, calculated as Address / N (integer division)

● From the fault mask, a tag is extracted that indicates the position of the faulty bit in the fault mask; this tag is then used for the search.

Figure 7 illustrates, with an example, how the output of the PBIST is parsed. Figure 8 completes the overview of the memory organization, with a set count of N=32 and a fault mask of 256 bits. As required by a set associative organization, the memory is divided into N equally sized blocks. Once the set is calculated from the fault information, the corresponding memory part is accessed, and the tag is used to search the set for slices with the same tag value. If such a slice already exists in the corresponding set, it is updated as described earlier. Conversely, if the current fault cannot be associated with any previously stored fault, a new slice is stored.

Figure 7. Fault information analyzed by CPU when N=32

The illustrated method is very efficient in both search time and required bits. The division into sets can reduce the search time by a factor that depends on the number of sets N. The set value is not stored in the slices but can be inferred from the slice address in the on-chip memory using a reverse formula.
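The parsing step and the reverse formula can be sketched as below, assuming N=32 sets and a 256-bit fault mask as in the text. The function names and the representation of the mask as a Python integer are assumptions made for illustration.

```python
N_SETS = 32       # number of sets N used in the paper's example
MASK_BITS = 256   # width of the fault mask returned by the BIST


def parse_fault(address: int, fault_mask: int, n_sets: int = N_SETS):
    """Split one BIST report into set index, normalized coordinate, and tags."""
    set_index = address % n_sets      # selects the memory block to search
    normalized = address // n_sets    # coordinate stored inside the slice
    # One tag per faulty bit: its position inside the fault mask.
    tags = [bit for bit in range(MASK_BITS) if (fault_mask >> bit) & 1]
    return set_index, normalized, tags


def rebuild_address(set_index: int, normalized: int, n_sets: int = N_SETS) -> int:
    """Reverse formula: the set index is implied by where the slice is stored."""
    return normalized * n_sets + set_index


if __name__ == "__main__":
    set_index, normalized, tags = parse_fault(0x1234, 0b101)
    print(set_index, normalized, tags)
    print(hex(rebuild_address(set_index, normalized)))
```

Since the set index is recovered from the storage location, only the normalized coordinate and the tag need to be kept in each slice, and a lookup only scans the slices of one set instead of the whole memory.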

Figure 8. Organization of available memory for on-chip slice storage resembling cache

C. Choosing Horizontal or Vertical Encoding Direction

Of course, the fault mask may contain more than one faulty bit. In this case, the algorithm can create a vertical (bit line-oriented) or horizontal (word line-oriented) slice. While vertical coloring is easier and is done by considering one bit in the fault mask at a time, it is crucial to minimize its usage by quickly identifying horizontal shapes. Due to scrambling effects, which lead to faults distributed across multiple flash pages within a word line, determining that a segment is horizontal becomes challenging.

To address the trade-off between speed and accuracy in selecting vertical or horizontal direction, horizontal coloring is triggered when the algorithm “guesses” that a horizontal shape exists. This guess is based on the number of faults received on the current page; if it exceeds a given threshold, horizontal coloring is activated.

Figure 9 explains the mechanism for direction selection. If the number of faults in the fault mask is below the threshold, vertical coloring is performed immediately, one bit at a time; otherwise, the faults are temporarily saved in a buffer for later horizontal coloring. In fact, horizontal coloring becomes more efficient if all pages arranged on the same word line are processed together according to the scrambling pattern. Once the horizontal direction is taken, the temporary buffer is updated with failure data from the other pages that belong to the same word line. The contents of the buffer are processed when the first fault that is no longer on the investigated word line is encountered. The resulting horizontal slices are stored in the corresponding memory set.
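A minimal sketch of this decision flow is given below. The threshold value, the tuple layout `(word_line, page, fault_mask)`, and the function names are hypothetical; the sketch only decides which faults go to vertical coloring and which are buffered for horizontal coloring, without performing the coloring itself.

```python
THRESHOLD = 4  # hypothetical value; the text does not state the threshold used


def popcount(mask: int) -> int:
    """Number of faulty bits reported in a fault mask."""
    return bin(mask).count("1")


def route_faults(faults, threshold: int = THRESHOLD):
    """Group BIST reports into vertical and horizontal coloring jobs.

    `faults` is an iterable of (word_line, page, fault_mask) in test order.
    Returns a list of (direction, word_line, [(page, fault_mask), ...]).
    """
    jobs = []
    buffer, buffered_wl = [], None
    for word_line, page, mask in faults:
        # The first fault outside the buffered word line triggers the flush.
        if buffer and word_line != buffered_wl:
            jobs.append(("horizontal", buffered_wl, buffer))
            buffer, buffered_wl = [], None
        if buffered_wl == word_line or popcount(mask) > threshold:
            # Horizontal guess: keep collecting pages of this word line.
            buffer.append((page, mask))
            buffered_wl = word_line
        else:
            jobs.append(("vertical", word_line, [(page, mask)]))
    if buffer:  # flush what is left at the end of the test
        jobs.append(("horizontal", buffered_wl, buffer))
    return jobs


if __name__ == "__main__":
    reports = [(0, 0, 0b1), (1, 0, 0b111111), (1, 1, 0b1), (2, 0, 0b10)]
    for job in route_faults(reports):
        print(job)
```

Note that once a word line has been guessed to be horizontal, later pages of the same word line are buffered even if their own fault count is below the threshold, which mirrors the buffer update described above.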

Figure 9. Flowchart for deciding vertical/horizontal encoding

IV. Experimental Results

This section presents the results obtained with the proposed algorithm in various case studies. The reference device is the Aurix TC39xB manufactured by Infineon Technologies. For this device, given the size of the test operating system components, the limit for storing bitmap information in RAM is set to 24KB. This on-chip memory constraint is a key factor in evaluating the method.

In the following paragraphs, the proposed compression method is compared, in terms of advantages and disadvantages, with a bit-by-bit coordinate method and with a compression method like that in [5]. The on-chip memory space is divided into 32 sets based on the scrambling parameters. With configuration parameters of a 256-bit fault mask and a 32-bit address, the slices of the proposed method are 6 bytes in size. In contrast, the bit-by-bit method saves fault coordinates directly as 4-byte elements, while [5] uses shared bits between word lines and bit lines.
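The element sizes above imply a simple capacity and break-even calculation, sketched here under the assumption that 1 KB equals 1024 bytes and that the whole 24KB budget is available for bitmap elements. The helper name is hypothetical.

```python
RAM_LIMIT_BYTES = 24 * 1024   # on-chip RAM budget for bitmap information
COORD_BYTES = 4               # one fault coordinate in the bit-by-bit method
SLICE_BYTES = 6               # one slice in the proposed method

max_coordinates = RAM_LIMIT_BYTES // COORD_BYTES   # faults storable bit by bit
max_slices = RAM_LIMIT_BYTES // SLICE_BYTES        # slices storable on-chip


def proposed_is_smaller(n_faults: int, n_slices: int) -> bool:
    """True when encoding into slices needs less memory than raw coordinates."""
    return n_slices * SLICE_BYTES < n_faults * COORD_BYTES


if __name__ == "__main__":
    print(max_coordinates, max_slices)
    # Break-even: a slice must cover more than 6/4 = 1.5 faults on average.
    print(proposed_is_smaller(n_faults=3, n_slices=2))
    print(proposed_is_smaller(n_faults=4, n_slices=2))
```

In other words, the proposed encoding pays off as soon as slices cover more than 1.5 faults on average, while isolated faults (one black slice each) cost 6 bytes instead of 4.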

The experimental results below indicate that the proposed method has lower average memory demands, meaning that it can store more information than the reference bit-by-bit method in the same amount of memory. In other words, the proposed method can fully record more faulty devices than the bit-by-bit method. The method incurs a cost in increased bitmap generation time, which appears sustainable considering the occupancy advantage.

Regarding the comparison with [5], the compression ratio used in the experiments was a factor of 4480. This corresponds to the minimum possible resolution and requires a fixed 20KB of on-chip memory, which makes that method feasible on-chip under the memory constraint.

First, the advantages and costs of the proposed method are reported for four real and typical fault scenarios from production data, in comparison with [8], which retains a complete list of faulty cell coordinates on-chip.

Then, a broader set of about 2000 faulty devices, carefully selected to constitute a large production sample, is considered. This part reveals that the bit-by-bit method is slightly faster but is limited by the available on-chip memory space, a limitation that is alleviated in the proposed method. The average accuracy of the bitmap reconstructed after compression in [5] is also compared, and a correlation index is calculated to assess its loss in accuracy with respect to lossless methods.

A. Detailed Analysis of Some Typical Stages

The following figures are cropped bitmaps showing specific areas of faulty embedded flash memory. Each image has (A) the fault bitmap and (B) the corresponding representation returned by the algorithm.

Figure 10 shows a fault situation in the vertical direction. In this particular case, the eFlash is affected by 388 faults, with an overhead of 21ms compared to the golden execution (with good memory testing). The proposed method shows a relative time overhead of 46% compared to the bit-by-bit method, which requires 14.35ms. Despite the time loss, the proposed algorithm saves 95% of the required RAM space, i.e., approximately 78B, compared to the bit-by-bit method that requires 1.51KB.
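The figures reported for this vertical case can be cross-checked with a few lines of arithmetic. The sketch assumes 4-byte coordinates, 6-byte slices, and 1 KB = 1024 bytes; the slice count is derived from the reported 78 bytes and is not stated in the text.

```python
N_FAULTS = 388
COORD_BYTES, SLICE_BYTES = 4, 6

bit_by_bit_bytes = N_FAULTS * COORD_BYTES        # memory for raw coordinates
proposed_bytes = 78                              # value reported in the text
implied_slices = proposed_bytes // SLICE_BYTES   # slices that fit in 78 bytes

memory_saving = 1 - proposed_bytes / bit_by_bit_bytes
time_overhead = 21 / 14.35 - 1                   # proposed vs bit-by-bit, in ms

if __name__ == "__main__":
    print(f"bit-by-bit: {bit_by_bit_bytes} B = {bit_by_bit_bytes / 1024:.2f} KB")
    print(f"implied slices: {implied_slices}")
    print(f"memory saving: {memory_saving:.1%}")
    print(f"relative time overhead: {time_overhead:.1%}")
```

The results agree with the reported values: about 1.5KB for the bit-by-bit method, a saving of roughly 95%, and a relative time overhead of roughly 46%.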

The vertical case is the one that best suits the proposed algorithm. In fact, the PBIST requires some time to reach each subsequent fault, and the CPU can use this time to execute the algorithm.

Figure 10. Example of a vertical fault line orientation scenario

For horizontally shaped faults, as shown in Figure 11, a larger time overhead is expected: horizontal coloring starts from a fault mask that includes more than one faulty bit, which requires more computation time, and the PBIST encounters faults with each consecutive read.

Figure 11. Example of partially failed word lines

In this case, there are a total of 18229 faults, and the test time increases by 69% compared to the bit-by-bit method, while the RAM savings are approximately 98.68%.

Figure 12. Example of sparse fault scenarios

Table II shows the comparison between the bit-by-bit method and the proposed method when changing the number of faults. It is noteworthy that the proposed method requires slightly more time and memory than the bit-by-bit method, as the sparsity of faults prevents clustering.

Sparse fault clusters represent another important scenario to observe. Because the possibilities for compression are limited, this type of cluster is the most challenging to handle. Figure 12 describes a case of a relatively dense memory matrix with sparse faults. Some of these faults are far apart but aligned on the same bit line or word line, leading to blue slices. Despite the inherent difficulties, the proposed method shows limited losses compared to the bit-by-bit method. The case shown in Figure 13, which is a combination of the previously considered situations, includes 9949 faults; the time to collect diagnostic information is 680ms, while the bit-by-bit method requires 440ms. Conversely, memory usage decreases dramatically from 38.85KB for the bit-by-bit method to 0.1KB for the proposed method.

B. Results Achieved on a Larger Device Basis

The experimental measurements were also extended to a broader set of devices. We considered 1864 faulty devices from frontend wafer testing operations. This collection contains many different fault shapes and was used to further evaluate the advantages and costs of the proposed method.

Figure 13. Fault constellation diagram with working bits at intersections

We compare it with the bit-by-bit coordinate method and the compression method in [5].

Table III reports the number of devices for these methods that can be bitmap-rendered without exceeding the 24KB on-chip RAM limit. From this table, it can be seen how many devices are fully recorded within 24KB using only our method and the bit-by-bit coordinate method.

When the 24KB on-chip RAM is filled, some diagnostic information is lost unless the testing and diagnostic process is interrupted and the current bitmap is dumped to the tester so that the memory test program can resume. Assume instead that the diagnostic collection is paused and new faults are no longer recorded. In this case, the previously encoded faults are preserved, and a partial fault cluster can be reconstructed at the end of the testing process, as shown in Figure 14.

Figure 14. Fault constellation diagram partially reconstructed from a complete 24KB buffer

The selected population shows an average of about 2000 faults, with a variance of about 5000 faults. Across the sample population, the overall time for testing and collecting diagnostic information averages 192ms, with a variance of 279ms. For the bit-by-bit method, these values are an average of 152ms and a variance of 189ms.

Table IV concerns bitmap size; it shows the percentage of cases in the investigated population in which each method produces the smaller bitmap, comparing the proposed method with the reference method. Considering all devices, the proposed method requires less memory in about 60% of cases. If only failure scenarios with over 250 faults are considered, the proposed method shows a size advantage in about 91% of cases.

Regarding the comparison with [5], the Pearson correlation index was calculated to measure how much its reconstruction differs from that of the proposed method. With memory requirements restricted to 20KB, [5] can store any fault cluster, but with relatively low accuracy; on average, the correlation index with respect to the proposed method is 61%. Figure 15 shows a reconstructed fault cluster a) compressed using the method shown in [5] and b) compressed using the proposed compatible bitmap method. This specific case involves approximately 2000 faults, with a correlation index of 83%.
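As an illustration of how such a correlation index can be obtained, the sketch below computes the Pearson correlation between two reconstructed bitmaps of the same size. Representing each bitmap as a list of rows of 0/1 values and comparing them cell by cell are assumptions made here; the text does not detail how the index was computed.

```python
from math import sqrt


def pearson_index(bitmap_a, bitmap_b) -> float:
    """Pearson correlation between two equally sized 0/1 bitmaps."""
    xs = [cell for row in bitmap_a for cell in row]
    ys = [cell for row in bitmap_b for cell in row]
    if len(xs) != len(ys) or not xs:
        raise ValueError("bitmaps must be non-empty and have the same size")
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    dev_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    dev_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if dev_x == 0 or dev_y == 0:
        raise ValueError("correlation is undefined for a constant bitmap")
    return cov / (dev_x * dev_y)


if __name__ == "__main__":
    lossless = [[1, 0], [0, 1]]      # toy stand-in for an exact bitmap
    approximate = [[1, 0], [0, 0]]   # toy stand-in for a lossy reconstruction
    print(f"{pearson_index(lossless, approximate):.3f}")
```

An index of 1 means the approximate reconstruction marks exactly the same cells as the lossless one; lower values indicate that faults were lost or misplaced by the compression.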

Figure 15. Comparison between the bitmap reconstructed using the method shown in [5] (A) and the compatible bitmap compressed using the proposed method (B)

V. Conclusion

This paper presents an innovative algorithm that improves the collection of diagnostic information in eFLASH testing. Results obtained from real data indicate that the proposed method has advantages in terms of memory occupancy, at the cost of a moderate test time overhead. Thanks to the substantial memory savings of the developed algorithm, a complete fault history of a device can be permanently stored in a smaller memory, providing more details about the evolution of the bitmap along the testing steps for the analysis of faulty devices.

References:

[1] A. van de Goor, G. Gaydadjiev and S. Hamdioui, “Memory testing with a RISC microcontroller” in Proc. on Design, Automation and Test in Europe, Dresden, 2010.

[2] “IEC 61508-[1- 16],” Functional safety of electrical / electronic / programmable electronic safety-related systems, 2010.

[3] P. Bernardi et al. “Cumulative embedded memory failure bitmap display & analysis” in IEEE Symposium on Design and Diagnostics of Electronic Circuits and Systems, 2010.

[4] S. Abhas, M. K. Gurram and A. Abhijit, “Controller Architecture for Memory BIST Algorithms” in IEEE International Students’ Conference on Electrical, Electronics and Computer Science (SCEECS), 2020.

[5] J. Chen, J. Khare, K. Walker, S. Shaik, J. Rajsky and W. Maly, “Test response compression and bitmap encoding for embedded memories in manufacturing process monitoring” in Proceedings International Test Conference 2001.

[6] P. Bernardi et al. “An efficient algorithm for the extraction of compressed diagnostic information from embedded memory cores” in 2003 IEEE Conference on Emerging Technologies and Factory Automation.

[7] I. Schanstra et al. “Semiconductor Manufacturing Process Monitoring using Built-In Self-Test for Embedded Memories” in Proceedings International Test Conference 1998.

[8] A. L. Landzberg and R. Van Nostrand, Microelectronics Manufacturing Diagnostics Handbook, New York, USA, 1993.

[9] H. WonGi, C. JungDai and C. Hoon, “A programmable memory BIST for embedded memory” in International SoC Design Conference, 2008.

[10] C.-H. Tsai and C.-W. Wu, “Processor-programmable memory BIST for bus-connected embedded memories” in Proc. of the Design Automation Conference, 2001.

[11] P. Bernardi et al., “A Machine Learning-based Approach to Optimize Repair and Increase Yield of Embedded Flash Memories in Automotive Systems-on-Chip” in European Test Symposium, 2019.
