Understanding FPGA SerDes Interfaces

SerDes (Serializer-Deserializer) has become a standard feature of modern FPGAs. From PCI to PCI Express, from ATA to SATA, from parallel ADC interfaces to JESD204, and from RIO to Serial RIO, interfaces have moved to SerDes to improve performance. SerDes is a complex mixed-signal design, and a device user guide describes only a single tree in the forest; it does not explain how SerDes actually works. How can SerDes work without transmitting a clock signal? What are emphasis and equalization? How does jitter relate to bit error rate? How do the various types of jitter relate to each other? This article tries to explain how SerDes is designed from the perspective of a SerDes user. My knowledge is limited and there are bound to be inaccuracies, but I hope it helps engineers who are just getting started with SerDes.

Contents

1. The Value of SerDes

1.1 Parallel Bus Interfaces

1.2 SerDes Interfaces

1.3 Intermediate Types

2. SerDes Architecture

2.1 Serializer/Deserializer

2.2 Tx Equalizer

2.3 Rx Equalizer

2.4 Clock Data Recovery (CDR)

2.5 Common Phase-Locked Loop (PLL)

2.6 SerDes Codec

2.7 SerDes Transceiver Driver and Differential Interface Conversion

2.8 SerDes Loopback and Debugging

3. Jitter and Signal Integrity (SI)

3.1 Clock Jitter

3.2 Data Jitter

4. Signal Integrity (SI) and Simulation

4.1 Channel

4.2 Chip Package

4.3 SI Simulation

5. Conclusion

1. The Value of SerDes

1.1 Parallel Bus Interfaces

Before SerDes became popular, chips were interconnected through system-synchronous or source-synchronous parallel interfaces. Figure 1.1 shows both types.

Figure 1.1 System-synchronous and source-synchronous parallel interfaces

As interface frequencies increase, several factors narrow the valid data window in system-synchronous interfaces:

• Clock skew between the two chips

• Data skew between the bits of the parallel data

• Skew between the clock and data propagation delays

Although clock skew can be compensated in the destination chip with a PLL, clock skew and data skew do not change by the same amount across PVT (process, voltage, temperature) variations, which narrows the data window further.

In source-synchronous interfaces, the transmitter (Tx) sends the clock along with the data, which limits the impact of clock skew on the valid data window. A source-synchronous interface typically processes the clock and data signals in the same way, so that they travel the same path with the same delay. Under PVT variations, the clock and data delays then increase or decrease together, which is the most favorable case for skew.

Let's make some reasonable assumptions for a 32-bit parallel data bus:

a) Transmitter data skew = 50 ps (a demanding requirement)
b) PCB trace skew = 50 ps (a demanding requirement)
c) Clock period jitter = +/-50 ps, i.e. 100 ps peak-to-peak (a demanding requirement)
d) Receiver sampling window = 250 ps (IO flip-flop of a high-end Xilinx V7 device)

The minimum bit time is then roughly 50 + 50 + 100 + 250 = 450 ps, so the maximum rate of the parallel interface is about 1/450 ps = 2.2 Gbps per pin. This corresponds to a clock of about 2.2 GHz with single-edge (SDR) sampling or 1.1 GHz with double-edge (DDR) sampling.
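The estimate above is just a sum of timing terms. A minimal Python sketch of the same arithmetic follows; the function name and argument names are illustrative, and the budget model (a plain sum of the four terms) is the one assumed in the text, not a full timing analysis.

```python
def max_bit_rate_gbps(tx_skew_ps, pcb_skew_ps, clk_jitter_pp_ps, rx_window_ps):
    """Per-pin bit rate (Gbps) whose bit time just covers the timing budget."""
    min_bit_time_ps = tx_skew_ps + pcb_skew_ps + clk_jitter_pp_ps + rx_window_ps
    return 1000.0 / min_bit_time_ps  # 1 / ps equals 1000 Gbps

# Values from the list above; +/-50 ps jitter is counted as 100 ps peak-to-peak.
rate = max_bit_rate_gbps(50, 50, 100, 250)
print(f"minimum bit time 450 ps -> about {rate:.2f} Gbps per pin")
```

Relaxing any single term, for example allowing 100 ps of PCB skew, lowers the result directly, which is why wide parallel buses become hard to scale.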

Source-synchronous interfaces significantly widen the valid data window, but in practice their clock frequencies still stay below about 1 GHz. For example, the SPI4.2 interface clock can reach 700 MHz DDR with a 16-bit width. DDR memory interfaces are also source-synchronous, and DDR3 can reach a clock of about 800 MHz in an FPGA.

To increase the transmission bandwidth of the interface, there are two ways: one is to increase the clock frequency, and the other is to increase the data width. But can the data width be increased indefinitely? This involves another very important issue – synchronous switching noise (SSN).

We will not discuss the principle of SSN here and simply give the formula: SSN = L * N * di/dt, where L is the chip package inductance, N is the data width, and di/dt is the rate of change of the current. As frequency and data width increase, SSN becomes the main bottleneck for raising transmission bandwidth. Figure 1.2 is an example of DDR3 crosstalk. The theoretical low level in the figure is 0 V, but SSN causes the low level to oscillate, and the peak of the oscillation noise reaches 610 mV. The noise margin is therefore only 1.5 V / 2 - 610 mV = 140 mV.

Figure 1.2 DDR3 Crosstalk Demonstration
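The SSN formula and the noise-margin arithmetic can be checked with a few lines of Python. This is only a sketch: the inductance and di/dt values are hypothetical placeholders chosen for illustration, and only the 1.5 V supply and 610 mV noise peak come from the text.

```python
def ssn_volts(l_pkg_h, n_bits, di_dt_a_per_s):
    """SSN = L * N * di/dt, as given in the text."""
    return l_pkg_h * n_bits * di_dt_a_per_s

def noise_margin_mv(vdd_mv, noise_peak_mv):
    """Margin left below the mid-rail threshold when the low level rings up."""
    return vdd_mv / 2 - noise_peak_mv

margin = noise_margin_mv(1500, 610)  # numbers from the DDR3 example

# Hypothetical values: 1 nH effective inductance, 10 mA/ns per switching output.
di_dt = 10e-3 / 1e-9
ssn_16 = ssn_volts(1e-9, 16, di_dt)
ssn_32 = ssn_volts(1e-9, 32, di_dt)
print(margin, ssn_16, ssn_32)
```

With everything else fixed, doubling the bus width doubles the noise, which is the scaling argument made above.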

Bandwidth therefore cannot be increased indefinitely by widening the data bus. One remedy for SSN is to use differential signaling instead of single-ended signaling, which effectively solves the SSN problem at the cost of more chip pins. Differential signaling still does not solve the data skew problem, however, and wide differential buses combined with strict timing constraints remain a significant challenge for parallel interfaces.

1.2 SerDes Interface

The clock frequency of source-synchronous interfaces has hit a bottleneck. Because the channel is not ideal, raising the frequency further severely degrades the signal, so equalization and data/clock phase detection become necessary. These are exactly the techniques SerDes uses. SerDes is short for Serializer-Deserializer. The serializer is also called the SerDes transmitter (Tx), and the deserializer is also called the receiver (Rx). Figure 1.3 shows the interconnection of N pairs of SerDes transceiver channels, where N is generally less than 4.

Figure 1.3 Interconnection of N pairs of SerDes transceiver channels

As can be seen, SerDes does not transmit clock signals, which is also the most special aspect of SerDes. SerDes integrates CDR (Clock Data Recovery) circuits at the receiving end, extracting the clock from the edge information of the data and finding the optimal sampling position.

SerDes transmits data in a differential manner. Generally, multiple channels of data are placed in a group to share PLL resources, while each channel still operates independently.

SerDes requires a reference clock, which is generally differential to reduce noise. The reference clocks of the receiver (Rx) and transmitter (Tx) may differ in frequency by up to several hundred ppm (a plesiochronous system) or may have the same frequency, and there is no requirement on their phase relationship.

As a simple comparison, one SerDes channel uses 4 pins (Tx+/-, Rx+/-), and current FPGAs reach line rates of up to 28 Gbps. In contrast, a 16-bit DDR3-1600 interface delivers 1.6 Gbps * 16 = 25.6 Gbps and requires about 50 pins. This comparison shows the bandwidth advantage of SerDes.
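The comparison can be expressed as bandwidth per pin. The short sketch below uses the pin counts and rates quoted above; note that 1.6 Gbps times 16 bits is 25.6 Gbps, and that the SerDes figure counts one direction only, which is an assumption made here to keep the comparison conservative.

```python
# Figures quoted in the text; per-pin values are derived, not measured.
serdes_gbps, serdes_pins = 28.0, 4        # one channel: Tx+/-, Rx+/-
ddr3_gbps, ddr3_pins = 1.6 * 16, 50       # 16-bit DDR3-1600

serdes_per_pin = serdes_gbps / serdes_pins
ddr3_per_pin = ddr3_gbps / ddr3_pins

print(f"SerDes: {serdes_per_pin:.2f} Gbps/pin, DDR3: {ddr3_per_pin:.3f} Gbps/pin")
```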

Compared to source-synchronous interfaces, the main features of SerDes include:

• SerDes embeds the clock in the data line, eliminating the need to transmit a clock signal.

• SerDes can achieve high-speed, long-distance transmission, for example across backplanes, through emphasis/equalization techniques.

• SerDes uses fewer chip pins.

1.3 Intermediate Types

There are also some interface types that fall between SerDes and parallel interfaces. Compared to source-synchronous interfaces, these intermediate types also use serializers (Serializer) and deserializers (Deserializer), while also transmitting clock signals for synchronization. These interfaces include video display interfaces such as 7:1 LVDS.

2. SerDes Architecture

The main components of SerDes can be divided into three parts: the PLL module, the transmit module Tx, and the receive module Rx. For convenience in maintenance and testing, it also includes control and status registers, loopback testing, PRBS testing, and other functionalities. See Figure 2.1.

Figure 2.1 Basic Blocks of a typical SerDes

The sub-modules with a blue background in the figure form the PCS layer. It is standard synthesizable CMOS digital logic that can be implemented as hard logic or as FPGA soft logic, and it is relatively easy to understand. The sub-modules with a brown background form the PMA layer. These are mixed-signal CML/CMOS circuits; they are the key to understanding how SerDes differs from parallel interfaces and are the focus of this article.

Signal flow in the transmit direction (Tx): parallel signals from the FPGA soft logic (fabric) pass through the interface FIFO to the 8B/10B encoder or the scrambler, which prevents the data from containing long runs of zeros or ones. They then enter the serializer for parallel-to-serial conversion. The serial data is conditioned by the equalizer and sent out through the driver.

Signal flow in the receive direction (Rx): the external serial signal is conditioned by a linear equalizer or a DFE (Decision Feedback Equalizer) to remove part of the deterministic jitter. The CDR recovers the sampling clock from the data, and the deserializer converts the data into aligned parallel signals. The 8B/10B decoder or descrambler then decodes or descrambles the data. In a plesiochronous system (independent Tx and Rx reference clocks), an elastic FIFO is also needed before the user FIFO to compensate for the frequency difference.

The PLL generates the clock signals required by each SerDes module and manages the phase relationships between them. Taking the 10 Gbps line rate in the figure as an example, the reference clock frequency is 250 MHz, and the serializer/deserializer needs at least a 5 GHz 0-degree clock, a 5 GHz 90-degree clock, and a 1 GHz (10-bit parallel) or 1.25 GHz (8-bit parallel) clock.

A SerDes typically also needs debugging capabilities, such as pseudo-random bit sequence (PRBS) generation and checking, various loopback tests, control and status registers with an access interface, LOS detection, and eye diagram testing.
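The clock frequencies in this example follow directly from the line rate. The sketch below derives them; the function and dictionary key names are illustrative, and the half-rate (DDR) clocking assumption is the one implied by the 5 GHz clocks for a 10 Gbps link.

```python
def clock_plan(line_rate_gbps, refclk_mhz, parallel_width):
    """Derive the main SerDes clock frequencies from the line rate."""
    return {
        # ratio between line rate and reference clock
        "line_rate_to_refclk": line_rate_gbps * 1000 / refclk_mhz,
        # half-rate clock: both edges are used, so 10 Gbps needs 5 GHz
        "half_rate_clock_ghz": line_rate_gbps / 2,
        # word clock on the parallel side
        "parallel_clock_ghz": line_rate_gbps / parallel_width,
    }

plan_10bit = clock_plan(10, 250, 10)
plan_8bit = clock_plan(10, 250, 8)
print(plan_10bit)
print(plan_8bit)
```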

2.1 Serializer/Deserializer

The serializer converts parallel signals into a serial signal, and the deserializer converts the serial signal back into parallel signals. Parallel signals are generally 8/10 bits or 16/20 bits wide, while the serial signal is 1 bit wide. Serialization can also be done in stages, such as 8 bit -> 4 bit -> 2 bit -> equalizer -> 1 bit, to reduce the operating frequency of the equalizer. Protocols that use scrambled data, such as SDH/SONET and SMPTE SDI, use 8/16-bit parallel widths, while protocols such as PCI Express and GbE use 10/20-bit widths.

A 4:1 serializer is shown in the figure; 8:1 and 16:1 serializers are implemented similarly. In a real implementation, to reduce the operating frequency of the equalizer, the serializer first converts the parallel data to 2 bits, passes it through the equalizer, and then performs the final 2:1 serialization. The following sections assume a 1-bit serial signal.


A 1:4 deserializer is shown in Figure 2.3; 1:8 and 1:16 deserializers are implemented similarly. In a real implementation, to reduce the operating frequency of the DFE-based equalizer, the DFE works in DDR mode, so the input to the deserializer is 2 bits or wider. The following sections assume a 1-bit serial signal.

Figure 2.3 A 1:4 deserializer

The serializer/deserializer operates in double-edge (DDR) mode, trading area for speed. This reduces the proportion of high-frequency circuitry and therefore the circuit noise.
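As a behavioral illustration only (real serializers are built from multiplexers and flip-flops, not software loops), the 4:1 and 1:4 conversions can be modeled as follows. LSB-first transmission is an assumption made for this sketch.

```python
def serialize(words, width=4):
    """Parallel-to-serial: emit each word LSB first (assumed bit order)."""
    bits = []
    for w in words:
        bits.extend((w >> i) & 1 for i in range(width))
    return bits

def deserialize(bits, width=4):
    """Serial-to-parallel: regroup every `width` bits into one word."""
    words = []
    for k in range(0, len(bits) - width + 1, width):
        words.append(sum(b << i for i, b in enumerate(bits[k:k + width])))
    return words

tx_words = [0xA, 0x3]
line_bits = serialize(tx_words)
rx_words = deserialize(line_bits)
# Starting one bit late regroups the stream on the wrong word boundary.
rx_shifted = deserialize(line_bits[1:])
print(line_bits, rx_words, rx_shifted)
```

The shifted case shows why a receiver that starts at an arbitrary bit needs the alignment logic described next.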

In addition to the deserializer, the receive side generally also has alignment logic. Unlike the transmitter, the receiver starts working at an arbitrary moment, so the first bit it receives correctly may come from any bit position of the transmitted parallel word. Alignment logic is therefore needed to decide at which bit position a parallel word starts. It determines the starting position of the serial-to-parallel conversion by searching the serial data stream for an alignment code word (alignment code). For example, 8B/10B-based protocols typically use K28.5 as the alignment word (in transmission order abcdeifghj, 10'b0011111010 for negative running disparity and 10'b1100000101 for positive running disparity). Figure 2.4 shows an alignment logic. By sliding a window and comparing bit by bit, it finds the position of the alignment code. After finding the alignment code at the same position several times, the state machine locks that position and selects it to output aligned data.
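The sliding-window search can be sketched in Python. This is a behavioral model, not the hardware state machine: the alignment patterns are passed in as parameters, the two patterns used here are K28.5 written in transmission order (an assumption about bit ordering), and the `hits_needed` threshold and the sample stream are hypothetical.

```python
K28_5_NEG = [0, 0, 1, 1, 1, 1, 1, 0, 1, 0]  # K28.5, negative running disparity
K28_5_POS = [1, 1, 0, 0, 0, 0, 0, 1, 0, 1]  # K28.5, positive running disparity

def find_alignment(bits, patterns, width=10, hits_needed=2):
    """Slide a window over the stream; lock once one offset matches repeatedly."""
    hits = [0] * width
    for i in range(len(bits) - width + 1):
        if bits[i:i + width] in patterns:
            hits[i % width] += 1
            if hits[i % width] >= hits_needed:
                return i % width
    return None  # not locked

def align(bits, offset, width=10):
    """Cut the stream into words starting at the locked offset."""
    return [bits[k:k + width] for k in range(offset, len(bits) - width + 1, width)]

data_word = [1, 0, 1, 0, 1, 0, 1, 0, 1, 0]   # arbitrary payload word
stream = [1, 0, 1] + K28_5_NEG + data_word + K28_5_POS + data_word
offset = find_alignment(stream, [K28_5_NEG, K28_5_POS])
words = align(stream, offset)
print(offset, words)
```

Requiring several hits at the same offset before locking guards against a payload bit pattern that happens to resemble the alignment code once.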

2.2 Tx Equalizer

The path that the SerDes signal travels from the transmitting chip to the receiving chip is called the channel, which includes chip packaging, PCB traces, vias, cables, connectors, and other components. From the frequency domain perspective, the channel can be simplified as a low-pass filter (LPF) model. If the SerDes rate exceeds the cutoff frequency of the channel, the signal will be distorted to some extent. The equalizer’s role is to compensate for the channel’s damage to the signal.

The transmit-side equalizer is an FFE (Feed Forward Equalizer), and transmit equalization is also referred to as emphasis. Emphasis is divided into de-emphasis and pre-emphasis. De-emphasis reduces the swing of the differential signal, while pre-emphasis increases it. Most FPGAs use de-emphasis, so the stronger the emphasis, the smaller the average amplitude of the signal.

The transmitting equalizer is designed as a high-pass filter (HPF), roughly the inverse function of the channel frequency response H(f), and the goal of FFE is to ensure that the signal arriving at the receiving end is clean. There are many implementation methods for FFE, and a typical example is shown in Figure 2.5.

Figure 2.5 A typical FFE implementation

Adjusting the filter coefficients changes the frequency response of the filter to compensate for different channel characteristics; the coefficients can generally be configured dynamically. Taking a 10 Gbps line rate as an example, Figure 2.6 shows the frequency response of the FFE. For the configuration C0=0, C1=1.0, C2=-0.25, the gain at 5 GHz is about 4 dB higher than at low frequencies, which compensates for the channel's attenuation of the high-frequency part of the spectrum.
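The roughly 4 dB figure can be reproduced by evaluating the FIR response of the three taps. The sketch below assumes a UI-spaced FIR filter (one tap per bit time at 10 Gbps), which is the usual model for a Tx FFE; the function name is illustrative.

```python
import cmath
import math

def fir_gain_db(taps, freq_hz, sample_rate_hz):
    """Magnitude of H(f) = sum_k c_k * exp(-j*2*pi*f*k/Fs), in dB."""
    w = 2 * math.pi * freq_hz / sample_rate_hz
    h = sum(c * cmath.exp(-1j * w * k) for k, c in enumerate(taps))
    return 20 * math.log10(abs(h))

taps = [0.0, 1.0, -0.25]          # C0, C1, C2 from the text
fs = 10e9                         # one tap per UI at 10 Gbps

low_freq_db = fir_gain_db(taps, 0.0, fs)   # |1 - 0.25| = 0.75
nyquist_db = fir_gain_db(taps, 5e9, fs)    # |-1 - 0.25| = 1.25
boost_db = nyquist_db - low_freq_db
print(f"boost at 5 GHz relative to DC: {boost_db:.2f} dB")
```

The ideal filter gives about 4.4 dB, consistent with the approximate value read from the figure.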


The sampling clock frequency limits this FFE to compensate only up to Fs/2 (in this example, Fs/2=5GHz). According to the sampling theorem, all the information in the serial data is contained within 5GHz; from this perspective, it is sufficient. If compensation is needed for frequencies above Fs/2, the FFE must operate above Fs, or a continuous-time domain filter (Continuous Time FFE) is required.

Figure 2.7 shows the time-domain filtering effect of the FFE. Taking a 10 Gbps line rate as an example, one UI = 0.1 ns = 100 ps. The serial data stream shown is binary [00000000100001111011110000].
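The time-domain effect on this bit stream can be sketched with the same two non-zero taps. This is an idealized model under stated assumptions: bits are mapped to +1/-1 levels, the taps C1=1.0 and C2=-0.25 are reused from the frequency-response example, and the output is not renormalized to a fixed peak swing as a real de-emphasis driver would be.

```python
def ffe(bits, c_main=1.0, c_post=-0.25):
    """y[n] = c_main * x[n] + c_post * x[n-1], with bits mapped to +/-1."""
    levels = [1.0 if b == "1" else -1.0 for b in bits]
    out = []
    prev = levels[0]              # assume the line idled at the first bit's level
    for x in levels:
        out.append(c_main * x + c_post * prev)
        prev = x
    return out

pattern = "00000000100001111011110000"
wave = ffe(pattern)
for bit, level in zip(pattern, wave):
    print(bit, f"{level:+.2f}")
```

Bits that follow a transition come out at +/-1.25, while repeated bits settle to +/-0.75: the transition (high-frequency) content is emphasized relative to the steady (low-frequency) content.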

2.3 Rx Equalizer

2.3.1 Linear Equalizer

www.blog.sina.com.cn/fpgatalk

The goal of the receive equalizer is the same as that of the transmit equalizer. Low-speed (<5Gbps) SerDes typically use continuous-time linear equalizers, such as peaking amplifiers, which amplify high-frequency components more than low-frequency components. Figure 2.8 shows the frequency-domain characteristics of a linear equalizer. Vendors typically package the equalization characteristics into several levels, such as High/Med/Low, which can be set dynamically to match different channels.

Figure 2.8 Frequency Response of A Peaking Amplifier based Rx Equalizer

2.3.2 DFE Equalizer (Decision Feedback Equalizer)

For high-speed (>5Gbps) SerDes, jitter (such as ISI-related deterministic jitter) may approach or exceed one unit interval (UI), and a linear equalizer alone is no longer suitable. A linear equalizer amplifies noise along with the signal, so it does not improve SNR or BER. High-speed SerDes therefore use a non-linear equalizer, the DFE (Decision Feedback Equalizer). The DFE predicts the sampling threshold of the current bit by tracking the data of several preceding UIs (history bits). Because the DFE amplifies only the signal and not the noise, it effectively improves SNR.

Figure 2.9 shows a typical 5-tap DFE. A slicer decides whether each received serial bit is 0 or 1, the decided data stream is filtered to predict the inter-symbol interference (ISI), and the predicted ISI is subtracted from the original input signal to obtain a clean signal. To keep the DFE operating within its linear range, the serial signal first passes through a VGA that automatically controls the amplitude entering the DFE.

To understand the working principle of DFE, let’s first look at the impulse response of a 10Gbps backplane. This backplane model is based on a measured model provided by MATLAB, exhibiting typical characteristics.

In Figure 2.10, one grid division represents one UI. A pulse one UI wide (0.1 ns = 1/10 GHz) spreads into several adjacent UIs after passing through the backplane and interferes with the data in those UIs. Interference that falls after the sampling point is called post-cursor interference, and interference that falls before it is called pre-cursor interference. The first DFE coefficient h1 (0.175 in this case) corrects the first post-cursor, and the second coefficient h2 (0.075 in this case) corrects the second post-cursor. The more taps the DFE has, the more post-cursors it can correct.

When the code stream 11011 is sent over this backplane, post-cursor and pre-cursor leakage makes the '0' unrecognizable without equalization (see Figure 2.11). With a 2-tap DFE, the amplitude at the '0' bit position has the h2 contribution of the first '1' bit and the h1 contribution of the second '1' bit subtracted from it, giving 0.35 - 0.075 - 0.175 = 0.1, which is low enough to be recognized as '0'.
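The subtraction described above can be written as a small decision-feedback loop. This is a sketch under stated assumptions: only the taps h1 = 0.175 and h2 = 0.075 and the 0.35 amplitude at the '0' position come from the text; the other sample amplitudes and the 0.3 decision threshold are made-up values for illustration.

```python
def dfe_equalize(samples, taps, threshold):
    """Subtract post-cursor ISI predicted from earlier decisions, then slice."""
    decisions, corrected = [], []
    for x in samples:
        # taps[0] (h1) pairs with the most recent decision, taps[1] (h2) with the one before
        isi = sum(h * d for h, d in zip(taps, reversed(decisions)))
        y = x - isi
        corrected.append(y)
        decisions.append(1 if y > threshold else 0)
    return corrected, decisions

taps = [0.175, 0.075]                          # h1, h2 from the text
samples = [0.60, 0.80, 0.35, 0.70, 0.85]       # received 11011; only 0.35 is from the text
threshold = 0.3                                # hypothetical slicer level

raw_decisions = [1 if x > threshold else 0 for x in samples]
corrected, dfe_decisions = dfe_equalize(samples, taps, threshold)
print(raw_decisions)    # the '0' is misread without equalization
print(dfe_decisions)
```

Because the feedback uses decided bits rather than the noisy analog signal, the correction adds no noise, but a wrong decision will propagate into the next few corrections.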

In short, the DFE calculates the post-cursor interference of the history bits and subtracts it from the current bit to obtain a clean signal. Since a DFE can only correct post-cursor ISI, there is usually a linear equalizer (LE) in front of it. As long as the DFE coefficients are close to the channel's pulse response, the result is close to ideal. However, the channel is a time-varying medium: temperature, voltage, and process variations all change its characteristics. The DFE coefficients therefore need an adaptive algorithm that automatically acquires and follows changes in the channel.

DFE coefficient adaptation is a deep academic topic, and each vendor keeps its own algorithm confidential. For NRZ signaling, typical algorithms are sign-error driven. The sign-error is the error between the amplitude of the equalized signal and its expected value, and the algorithm optimizes h1/h2/h3... in turn to minimize the mean square of the sign-error. Because the sign-error and the sampling position are coupled and influence each other, the DFE coefficients can also be estimated from two criteria together: sign-error and eye width. For this reason, SerDes with a DFE structure usually include a built-in eye diagram test circuit, as shown in Figure 2.9. The circuit measures the bit error rate (BER) while shifting the signal amplitude threshold vertically and the sampling position horizontally; the BER at each offset position yields an eye diagram, as shown in Figure 2.12.

Figure 2.12 SerDes Embedded Eye-Diagram Test Function

2.4 Clock Data Recovery (CDR)

The goal of the CDR is to find the best sampling instant, which requires the data to contain plenty of transitions. A CDR has a specification called maximum run length tolerance (Max Run Length, also called Consecutive Identical Digits). If the data does not transition for a long time, the CDR is not trained accurately and its sampling instant drifts, so it may sample more 1s or 0s than were actually sent. Erroneous sampling may also occur when transitions resume. For instance, some CDRs are implemented with a PLL; if the data stops transitioning for a long time, the output frequency of the PLL drifts. In practice, the data transmitted by SerDes is either scrambled or encoded to keep the run length within a certain range.

• 8B/10B encoding guarantees that the run length does not exceed 5 UIs.

• 64B/66B encoding guarantees that the run length does not exceed 66 UIs.

• SONET/SDH scrambling keeps the run length below 80 UIs (BER<10^-12).
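Run length is easy to measure on a captured bit stream. The helper below is a minimal sketch; the function name and the sample streams are hypothetical and serve only to show the measurement.

```python
from itertools import groupby

def max_run_length(bits):
    """Longest run of consecutive identical bits (0 for an empty stream)."""
    return max((len(list(group)) for _, group in groupby(bits)), default=0)

coded_like = "0011111010" + "1100000101"    # sample stream with runs of at most 5
unscrambled = "1" + "0" * 40 + "1"          # raw payload with a long run of zeros

print(max_run_length(coded_like), max_run_length(unscrambled))
```

A CDR that tolerates only a handful of identical bits would be fine with the first stream and would likely drift on the second, which is why raw payload data is never sent unscrambled and unencoded.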

In point-to-point connections, most SerDes protocols use continuous mode, where the data flow on the line is never interrupted. Point-to-multipoint connections such as PON often use burst mode. Burst mode clearly places strict requirements on the locking time of the SerDes.

Continuous-mode protocols such as SONET/SDH must tolerate long runs of consecutive zeros and place strict requirements on the jitter transfer performance of the CDR (because of loop timing).

If the receiver (Rx) and transmitter (Tx) run from asynchronous clocks, or in spread-spectrum clocking (SSC) applications, the CDR needs a wide phase tracking range to follow the frequency difference between Rx and Tx.

Depending on the different needs of application scenarios, there are many architectures for CDR implementation. FPGA SerDes often adopts digital PLL-based CDR and phase interpolator-based CDR. These two types of CDR use digital filters in the loop, which saves area compared to analog charge pumps and analog filter structures.

Figure 2.13 shows a phase interpolator-based CDR. The phase comparator array compares the input serial data with M equally spaced clock phases over several UIs to obtain phase error signals spanning multiple UIs. The phase error signal has a very high rate and a wide bit width, so it is down-sampled and smoothed before being sent to the digital filter. The performance of the digital filter determines the loop's bandwidth, stability, and response speed. The smoothed error signal from the digital filter drives the phase rotators to correct the clock phase. When the loop is locked, the theoretical phase error is zero, and the clock shifted by 90 degrees is used as the recovered clock to sample the serial input.

Figure 2.14 shows a DPLL-based CDR, which is divided into two loops. One is the data phase tracking loop, whose working principle is similar to that of Figure 2.13. The phase comparator array compares the input serial data with M equally spaced clock phases (possibly over several UIs) to obtain the phase error signal, which is sent to the digital filter. The performance of the digital filter determines the loop's bandwidth, stability, and response speed. The smoothed error signal from the digital filter drives the VCO to correct the clock phase. When the loop is locked, the theoretical phase error is zero, and the clock shifted by 90 degrees is used as the recovered clock to sample the serial input.

The DPLL-based CDR adds a second loop, the frequency tracking loop, to shorten the locking time of the CDR and relax the design constraints on the loop filter. The CDR switches to the data phase tracking loop only after the frequency tracking loop has locked. If the phase tracking loop loses lock, the CDR automatically switches back to the frequency tracking loop. The reference clock multiplied by N is nearly equal in frequency to the line rate, so the steady-state VCO control voltages of the two loops are approximately equal, and the frequency tracking loop shortens the acquisition time of the phase tracking loop. Once the phase tracking loop is locked, the frequency tracking loop no longer affects it. As a result, this structure places no strict requirement on the jitter of the reference clock on the receive side.

For a phase interpolator-based CDR, the reference clock can come from a common PLL shared by the transceivers or from an independent PLL per channel. With this structure, the jitter of the reference clock directly affects the jitter of the recovered clock and the receive bit error rate.

• Phase Detector (PD)

The phase detector measures the phase error and reports it with UP or DN signals, where the duration of UP/DN is proportional to the phase error. An example of a bang-bang phase detector is shown in Figure 2.15. The example uses only four phases of the recovered clock.

• Decimator and Filter

The decimator lets the filter operate at a lower frequency. The decimation step size and the smoothing method affect the performance of the loop. The digital filter consists of a proportional branch and an integral branch, which track phase errors and frequency errors respectively. The processing delay of the digital filter must not be too large; otherwise the loop cannot track rapid changes in phase and frequency, which leads to bit errors.
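The roles of the proportional and integral branches can be illustrated with a toy behavioral loop. This is a sketch under strong assumptions, not any vendor's CDR: the phase detector is reduced to an early/late vote on an ideal phase error, decimation and loop latency are ignored, and all gains and offsets (`kp`, `ki`, `freq_offset_ui`) are hypothetical values in units of UI per update.

```python
def simulate_cdr(steps=5000, freq_offset_ui=0.001, kp=0.01, ki=0.0001,
                 initial_error_ui=0.3):
    """Bang-bang phase detector followed by a proportional + integral filter."""
    data_phase = initial_error_ui   # phase of the incoming data edges
    clock_phase = 0.0               # phase of the recovered clock
    integ = 0.0                     # integral branch: learns the frequency offset
    errors = []
    for _ in range(steps):
        err = data_phase - clock_phase
        errors.append(err)
        vote = 1 if err > 0 else -1          # early/late decision only
        integ += ki * vote                   # integral branch
        clock_phase += kp * vote + integ     # proportional branch + integral
        data_phase += freq_offset_ui         # Tx/Rx frequency difference
    return errors, integ

errors, integ = simulate_cdr()
print(f"final phase error {errors[-1]:+.4f} UI, integrator {integ:.5f} UI/update")
```

After the initial 0.3 UI offset is removed, the phase error dithers within about one proportional step of zero, while the integral branch settles near the frequency offset so that the proportional branch no longer has to absorb it.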

CDR structures are not limited to the two types above; there are many other variants, but in general they are all phase-locked loops. Analyzing the tracking performance, stability, and bandwidth/gain of a CDR is a deep academic subject, and many books and papers treat the quantitative performance of such loops. Some characteristics of CDR loops are summarized below:

• Loop Bandwidth

1. Phase jitter below the loop bandwidth passes through the CDR to the recovered clock. In other words, low-frequency jitter is tracked by the CDR and does not cause bit errors. High-frequency jitter components may cause bit errors, depending on their amplitude.

2. The larger the loop bandwidth, the shorter the locking time and the greater the jitter of the recovered clock. Conversely, the smaller the loop bandwidth, the longer the locking time and the smaller the jitter of the recovered clock. For a CDR, a larger loop bandwidth is desirable because it gives greater jitter tolerance, but loop-timed applications such as SONET/SDH limit the jitter of the recovered clock, so the bandwidth cannot be too large.

3. The switching frequency of a switching power supply is generally below the loop bandwidth and can be tracked by the CDR. However, noise coupled from the switching supply into the VCO (or the digital-to-multi-phase converter) cannot be tracked by the loop, and low-cost ring VCOs are particularly sensitive to power supply noise. In addition, harmonics of the switching supply may exceed the loop bandwidth.

Some protocols, such as SDH/SONET, specify CDR gain masks. Compliance with these protocols requires calculating the jitter budget for inputs and outputs.

2.5 Common Phase-Locked Loop (PLL)

SerDes requires an internal clock running at the data baud rate, or at half the data baud rate when working in DDR mode. The reference clock supplied to the SerDes from outside is far lower in frequency than the data baud rate, so a PLL multiplies it to generate the high-frequency internal clocks. The PLLs in FPGA SerDes generally offer 8x, 10x, 16x, 20x, and 40x modes to support common SerDes interface protocols. For example, PCI Express at 5 Gbps requires an external reference clock of 125 MHz in 40x mode or 250 MHz in 20x mode.
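The relation between line rate, multiplier mode, and reference clock is a simple division. The sketch below tabulates it for the 5 Gbps example; the function name is illustrative, and it assumes the multiplier is defined relative to the line rate, as in the PCI Express example in the text.

```python
def refclk_mhz(line_rate_gbps, multiplier):
    """Reference clock needed so that refclk * multiplier equals the line rate."""
    return line_rate_gbps * 1000 / multiplier

modes = [8, 10, 16, 20, 40]
table = {m: refclk_mhz(5.0, m) for m in modes}
for m, f in table.items():
    print(f"{m:>2}x mode -> {f:.1f} MHz reference clock")
```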

A third-order PLL circuit is shown in Figure 2.17. The phase detector compares the phase of the input signal with the phase of the VCO feedback signal, and the charge pump converts the phase error into a voltage or current signal. After smoothing by the loop filter, this becomes the control voltage that corrects the phase of the VCO, ultimately driving the phase error toward zero.

Figure 2.17 A 3-order Type II PLL

The operation of a PLL is divided into an acquisition process and a tracking process. During acquisition, the loop is modeled by a nonlinear differential equation, which can be used to evaluate acquisition time, acquisition bandwidth, and similar metrics. After locking, within the small-signal range, the PLL model is a linear equation with constant coefficients, and the bandwidth, gain, and stability of the PLL can be studied in the Laplace domain. Figure 2.18 is the small-signal mathematical model.

Figure 2.18 Small-signal mathematical model of the PLL

The order of a PLL is given by the number of poles of its transfer function. The VCO integrates phase (Kvco/s), so a loop without a filter is a first-order loop, and a loop with a first-order filter is a second-order loop. First-order and second-order loops are unconditionally stable. Higher-order loops have more poles and zeros, which allow bandwidth, gain, stability, acquisition bandwidth, and acquisition time to be adjusted independently.

The frequency-domain transfer characteristic of a PLL is mainly determined by the loop filter F(s) evaluated at s = jω. A typical PLL transfer curve is shown in Figure 2.19. It has two important features: loop bandwidth and jitter peaking. Excessive peaking amplifies jitter. A large damping factor limits peaking, but it also increases the locking time of the loop and affects the roll-off rate and natural frequency.
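The trade-off between damping factor and jitter peaking can be explored numerically. The sketch below uses the textbook closed-loop transfer function of a second-order Type II loop, H(s) = (2ζωn·s + ωn²) / (s² + 2ζωn·s + ωn²), which is a simplification of the third-order loop in Figure 2.17 and is an assumption made for illustration; the 1 MHz natural frequency is a hypothetical value.

```python
import math

def closed_loop_gain_db(f_hz, fn_hz, zeta):
    """|H(j*2*pi*f)| in dB for a second-order Type II PLL."""
    s = 1j * 2 * math.pi * f_hz
    wn = 2 * math.pi * fn_hz
    h = (2 * zeta * wn * s + wn ** 2) / (s ** 2 + 2 * zeta * wn * s + wn ** 2)
    return 20 * math.log10(abs(h))

def peaking_db(fn_hz, zeta, points=2000):
    """Largest gain over a log sweep from fn/100 to fn*100."""
    freqs = [fn_hz * 10 ** (-2 + 4 * k / (points - 1)) for k in range(points)]
    return max(closed_loop_gain_db(f, fn_hz, zeta) for f in freqs)

fn = 1e6  # hypothetical natural frequency
for zeta in (0.5, 0.707, 1.0, 2.0):
    print(f"zeta = {zeta}: peaking = {peaking_db(fn, zeta):.2f} dB")
```

Low-frequency input jitter passes at about 0 dB, jitter far above the natural frequency is attenuated, and the bump in between shrinks as the damping factor grows.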

• When the loop is locked, there is a fixed steady-state phase error:

θe = Δω / Kdc

Kdc is the DC open-loop gain of the loop, and Δω is the difference between the VCO center frequency and the locked frequency. For a PLL with a charge pump and a passive filter, the phase error is zero.

• When the loop is locked, only a fixed phase difference remains, and the frequencies of the two signals at the phase detector inputs are equal:

fr/M = fo/N

• For input noise, the loop acts as a low-pass filter, suppressing noise and interference above the loop cutoff frequency. For a SerDes PLL, a smaller bandwidth is desirable in order to suppress interference and noise on the reference clock.

• For VCO noise, the loop acts as a high-pass filter: only VCO noise below the loop cutoff frequency is suppressed, so excessive high-frequency VCO noise worsens the clock jitter. Low-speed SerDes (<5Gbps) use ring VCOs for cost reasons; these generate significant noise and are sensitive to the power supply. High-speed SerDes use lower-noise LC VCOs.

3. Jitter and Signal Integrity (SI)

Jitter refers to the phenomenon where the timing of signal transitions deviates from its ideal or expected timing. Noise, non-ideal channels, and non-ideal circuits are all causes of jitter.

3.1 Clock Jitter

Figure 3.1 Clock Jitter

For clock signals, the definition of jitter depends on the application. For example, period jitter matters when calculating timing margins for digital logic. Clock designers prefer phase jitter because they can evaluate it from the spectrum and assess the contribution of a specific interferer to the total phase jitter. Referring to Figure 3.1, several definitions of jitter are introduced below.

• Phase jitter: Jphase(n) = t_n - n*T. Every period T of an ideal clock is equal, so an ideal clock has no jitter. The deviation of the real clock's edges from the ideal clock's edges is called phase jitter.

• Period jitter

Jperiod(n) = (t_n - t_{n-1}) - T. Period jitter is the deviation of the actual clock period from the ideal period. Clearly, Jperiod(n) = Jphase(n) - Jphase(n-1).

• Cycle-to-cycle jitter

Jcycle(n) = (t_n - t_{n-1}) - (t_{n-1} - t_{n-2}). The difference between two adjacent periods is the cycle-to-cycle jitter. Clearly, Jcycle(n) = Jperiod(n) - Jperiod(n-1).

Assuming the maximum phase jitter is +/-Jp, and the jitter frequency fjitter = 0.5fclock = 0.5/T, thus,

At time tn-2, the phase jitter is at maximum +Jp, while at time tn-1, the phase jitter is at minimum -Jp.

At time tn, the phase jitter is at maximum +Jp, while at time tn+1, the phase jitter is at minimum -Jp.

Then, the maximum period jitter will be Jperiod=+/- 2* Jp

Then, the maximum Cycle-Cycle jitter will be Jcycle =+/- 4* Jp
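The three definitions and the worst-case example can be checked with a few lines of Python. This is a minimal sketch: the function names, the 1000ps period, and the 10ps peak phase jitter are hypothetical values picked for illustration, and the edge list simply alternates between +Jp and -Jp as in the example above.

```python
def phase_jitter(edges, period):
    """Jphase(n) = t_n - n*T for edge timestamps t_0, t_1, ..."""
    return [t - n * period for n, t in enumerate(edges)]

def period_jitter(edges, period):
    """Jperiod(n) = (t_n - t_{n-1}) - T."""
    return [(edges[n] - edges[n - 1]) - period for n in range(1, len(edges))]

def cycle_to_cycle_jitter(edges):
    """Jcycle(n) = (t_n - t_{n-1}) - (t_{n-1} - t_{n-2})."""
    return [(edges[n] - edges[n - 1]) - (edges[n - 1] - edges[n - 2])
            for n in range(2, len(edges))]

T = 1000.0   # hypothetical ideal period in ps
JP = 10.0    # hypothetical peak phase jitter in ps

# Jitter at half the clock frequency: phase jitter alternates +Jp, -Jp.
edges = [n * T + (JP if n % 2 == 0 else -JP) for n in range(8)]

j_phase = phase_jitter(edges, T)
j_period = period_jitter(edges, T)
j_cycle = cycle_to_cycle_jitter(edges)

print("peak phase jitter :", max(abs(j) for j in j_phase))
print("peak period jitter:", max(abs(j) for j in j_period))
print("peak c2c jitter   :", max(abs(j) for j in j_cycle))
```

With these inputs the peak period jitter is 2*Jp and the peak cycle-to-cycle jitter is 4*Jp, matching the result derived above.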

3.2 Data Jitter

In the high-speed SerDes field, everyone talks about jitter because it is directly related to bit error rate (BER).

An important requirement for the SerDes transmitter is jitter generation: the jitter the transmitter produces for a specific pattern, rate, and load condition. The channel adds further jitter before the signal reaches the receiver. Different data patterns contain different frequency components, and the channel delays those components by different amounts (non-linear phase), which produces deterministic jitter related to the data pattern. Reflections caused by impedance discontinuities, crosstalk from adjacent signals, and noise all add data jitter as well. An important figure of merit for the SerDes receiver is jitter tolerance: the amount of jitter the receiver can tolerate for a specific pattern and BER requirement (for example, BER<10^-12). Jitter is evaluated with eye diagrams, bathtub curves, jitter distribution histograms (PDF), jitter spectra, and similar tools.

Note that discussions of data jitter in high-speed SerDes (Tj, Rj, Dj, etc.) exclude low-frequency jitter. Low-frequency jitter is treated as wander, which the CDR can track without causing bit errors. When data jitter is measured with an oscilloscope (SDA), the bandwidth of the oscilloscope's embedded CDR loop can be set, so the jitter it reports already has the low-frequency jitter filtered out.

Jitter is usually classified by cause and by probability density function. The classification matters because some types of jitter can be corrected while others cannot. Classically, total jitter (Tj) is divided into deterministic jitter (Dj) and random jitter (Rj). Jitter can be expressed in UI or ps, and as either an RMS value or a peak-to-peak value.

3.2.1 Dj

Dj is further divided into:

• DCD (Duty Cycle Distortion)

Mismatched bias voltages between the positive and negative legs of a differential signal, or unequal rise and fall times, cause duty cycle distortion. Because DCD is related to the data pattern, it is a correctable jitter.

• DDJ (Data Dependent Jitter)

Jitter related to the data pattern, also known as inter-symbol interference (ISI). DDJ is caused by non-ideal channels and can be corrected by equalizers.

• Pj (Periodic Jitter)

Pj is caused by periodic interference sources in the circuit, such as the switching frequency of a switching power supply or crosstalk from clock signals. Although the switching frequency of a power supply is generally within the tracking range of the CDR, its low-order harmonics may fall outside the loop bandwidth or in the jitter-peaking region. Moreover, power-supply harmonic interference on the VCO inside the CDR can be neither suppressed nor tracked, so a CDR based on a ring VCO should be powered from an LDO wherever possible. Pj cannot be corrected by equalizers.

• BUJ (Bounded Uncorrelated Jitter)

BUJ is caused by non-clock interference sources. If the aggressor and the victim are asynchronous, the jitter has a bounded Gaussian probability distribution and is also called CBGJ (Correlated Bounded Gaussian Jitter). BUJ/CBGJ cannot be corrected.

3.2.2 Rj

Rj is caused by the intrinsic noise of the semiconductor devices. Its key characteristic is that its probability density function is Gaussian, unbounded, and independent of the data pattern. It can only be treated as bounded at a specified BER.

3.2.3 Tj

Mathematically, the probability density function of the total jitter can be viewed as the convolution of a Gaussian distribution and a dual-Dirac distribution.

Jitter contributing to the Gaussian distribution includes:

▪ Rj, which is Gaussian distributed

▪ The superposition of a large number of Pj sources, which is also approximately Gaussian

▪ Some BUJ, which is also Gaussian distributed

Jitter contributing to the dual-Dirac distribution includes:

▪ DCD, which is approximated by a dual-Dirac probability distribution

The convolution of the Gaussian and dual-Dirac distributions:
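Written out, the convolution takes the following form. This is a reconstruction under the usual assumption that the deterministic part consists of two equal-weight Dirac impulses at ±W/2 and that the Gaussian part has zero mean; p(x) is a name chosen here for the resulting density.

```latex
\[
p(x) = \frac{1}{2\sqrt{2\pi}\,\delta}
\left[
\exp\!\left(-\frac{\left(x - W/2\right)^{2}}{2\delta^{2}}\right)
+
\exp\!\left(-\frac{\left(x + W/2\right)^{2}}{2\delta^{2}}\right)
\right]
\]
```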

Here W is taken as the peak-to-peak value of the deterministic jitter, and δ is the standard deviation of the Gaussian distribution. As shown in Figure 3.2, as the deterministic jitter W increases, a double peak appears at the top of the probability density curve. In general, the shape of the top of the curve reflects the magnitude of the deterministic jitter.

For a specified BER, this table gives a quick estimate of the relationship between the standard deviation and the peak-to-peak value. For example, if the standard deviation of the Gaussian jitter is 0.05UI and the BER requirement is 10^-12, the table gives Q=7, so the peak-to-peak value of the Gaussian jitter is 0.05UI*7*2 = 0.7UI.

As mentioned above, with W=0.05UI and a Gaussian standard deviation of 0.05UI, the calculated total jitter is Tj=0.746UI, while the peak-to-peak Gaussian jitter estimated from Q is 0.7UI. Estimating Tj = Rj (0.7UI) + Dj (0.05UI) gives 0.75UI, which is essentially consistent; the difference is attributed to quantization error in the plotting program.
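The arithmetic in the two paragraphs above can be reproduced without a lookup table. The sketch below assumes the common convention BER = 0.5*erfc(Q/sqrt(2)) for a Gaussian tail and the linear estimate Tj = 2*Q*δ + W; the function name and the bisection search are illustrative choices, not something taken from a particular tool.

```python
import math

def q_from_ber(ber):
    """Return Q such that 0.5*erfc(Q/sqrt(2)) equals ber (bisection search)."""
    lo, hi = 0.0, 40.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if 0.5 * math.erfc(mid / math.sqrt(2.0)) > ber:
            lo = mid   # tail probability still too large, so Q must grow
        else:
            hi = mid
    return 0.5 * (lo + hi)

SIGMA_UI = 0.05   # standard deviation of the Gaussian jitter, from the text
W_UI = 0.05       # peak-to-peak deterministic jitter, from the text

q = q_from_ber(1e-12)
rj_pp = 2.0 * q * SIGMA_UI   # peak-to-peak Gaussian jitter at this BER
tj = rj_pp + W_UI            # linear estimate of total jitter

print(f"Q     = {q:.3f}")
print(f"Rj pp = {rj_pp:.3f} UI")
print(f"Tj    = {tj:.3f} UI")
```

Under these assumptions Q comes out close to 7 and the estimated Tj close to 0.75UI, in line with the figures quoted above.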

4. Signal Integrity (SI) and Simulation

4.1 Channel

The frequency range of interest for a SerDes channel runs from 0Hz up to twice the signal's fundamental frequency. The fundamental frequency is half the line rate (this is what is usually called the Nyquist frequency of an NRZ signal), so the range of interest extends up to the line rate. Channel impairments include insertion loss, reflection, crosstalk, and so on. They can be captured in an S-parameter model of the channel, and S-parameters can be measured with a vector network analyzer (VNA). The channel is not a purely resistive network; it also contains capacitive and inductive elements, so different frequency components experience different delays, which causes data-pattern-related jitter.

Every discontinuity of impedance on the channel will produce reflections. Depending on the position of the reflection, the reflected signal will add or subtract from the original signal, increasing or decreasing the signal amplitude.
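The size of a reflection at a single impedance step can be estimated with the standard voltage reflection coefficient, Γ = (Z_load - Z0)/(Z_load + Z0). The sketch below is only illustrative: the 100 ohm differential reference impedance and the 85 and 120 ohm discontinuities are hypothetical values, and a real channel needs the full S-parameter model rather than a single step.

```python
def reflection_coefficient(z_load, z0):
    """Voltage reflection coefficient at a step from impedance z0 to z_load."""
    return (z_load - z0) / (z_load + z0)

Z0 = 100.0  # hypothetical differential trace impedance in ohms

for z_load in (85.0, 100.0, 120.0):
    gamma = reflection_coefficient(z_load, Z0)
    print(f"{z_load:6.1f} ohm -> gamma = {gamma:+.3f}")
```

A negative coefficient (step down in impedance) reflects an inverted copy of the incident edge, a positive one a non-inverted copy, and a matched impedance reflects nothing.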

SerDes signals are differential, which gives strong rejection of common-mode interference. Interference that differs between the + and - legs, however, appears as crosstalk. On the PCB it is usually possible to keep SerDes data lines far enough from interference sources, but inside the chip it is difficult to guarantee sufficient isolation, especially when a channel's own transmit signal interferes with its receive signal.

4.2 Chip Package

The package is also part of the channel. The external channel of the chip can be measured through VNA, while the S-parameters of the package are typically provided by the chip manufacturer, allowing for cascading during simulation. Due to the short distance, insertion loss is usually not a major issue for the package; the primary consideration is impedance matching.

4.3 SI Simulation

A signal integrity (SI) simulation platform can be built by cascading the SPICE model of the SerDes transmitter, the S-parameter models of the package and channel, and the SPICE model of the receiver. Simulation tools can then run circuit simulations under different stimuli and test conditions. The eye diagram at the SerDes receiver shows whether the design requirements are met. The actual eye diagram at the receiver can also be measured and checked against the eye mask specified by the protocol. Figure 4.1 shows a measured eye diagram of a 3.125Gbps signal with its mask, together with a bathtub curve and a statistical plot.

Figure 4.1 Rx-end Eye-diagram of A 3.125Gbps SerDes

For high-speed SerDes (>5Gbps), this traditional circuit simulation method is no longer sufficient to meet design needs. First, excessive inter-symbol interference (ISI) can cause the eye diagram at the receiver to completely close, but after equalization inside the chip using DFE, the eye diagram may be very good. Second, circuit simulation (SPICE) is very slow; even if there are ways to incorporate DFE equalization into the simulation, DFE simulation requires a sufficiently long bit stream for training, making the simulation time unacceptable.

Simulation of high-speed SerDes requires the use of statistical analysis methods. Statistical analysis treats the connection of the transmitter-channel-receiver as a linear system, calculating the system impulse response h(t) and adding noise sources to simulate jitter. Then, the excitation is convolved with the impulse response to obtain the signal at the receiver; this method can incorporate vendor proprietary FFE and DFE adaptive algorithms into the simulation.
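The core step of the statistical approach, convolving the excitation with the channel response, can be sketched as follows. This is a minimal illustration, not a vendor flow: the pulse response samples (one per UI) are hypothetical, no noise, jitter, or equalizer is modelled, and the worst-case eye height uses simple peak distortion analysis, which is one common way to bound ISI in a linear system.

```python
def convolve(x, h):
    """Discrete linear convolution of sequence x with response h."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y

def worst_case_eye_height(pulse, cursor):
    """Peak distortion analysis: 2 * (main cursor - sum of |ISI taps|)."""
    isi = sum(abs(v) for k, v in enumerate(pulse) if k != cursor)
    return 2.0 * (pulse[cursor] - isi)

# Hypothetical UI-spaced samples of the end-to-end pulse response.
pulse = [0.05, 0.60, 0.25, 0.10]
CURSOR = 1  # index of the main cursor in the list above

bits = [1, 0, 1, 1, 0, 0, 1, 0]
symbols = [2 * b - 1 for b in bits]   # NRZ mapping: 0 -> -1, 1 -> +1

rx = convolve(symbols, pulse)         # received samples, one per UI
eye = worst_case_eye_height(pulse, CURSOR)

print("rx samples:", [round(v, 2) for v in rx])
print("worst-case eye height:", round(eye, 3))
```

Because the system is treated as linear, the worst-case eye follows directly from the pulse response, without simulating a long bit stream. An eye height of zero or less would indicate a closed eye before equalization.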

Statistical analysis methods cannot simulate the non-linear and time-varying characteristics of circuits, so high-speed SerDes often requires a combination of both methods for SI simulation. More information on statistical analysis methods can be referenced.

5. Conclusion

It has been said that today’s cars are so complex that although each part is understood by someone, no one can fully understand the entire vehicle when combined. In recent years, FPGAs have become increasingly complex, raising the requirements for engineers. To become a qualified FPGA application engineer, one must not only be adept in digital circuit design but also understand high-speed SerDes, signal integrity (SI), DSP algorithms, multi-core CPUs, embedded operating systems, and more. Each technology behind it is a specialized field, and one person cannot be an expert in every field. Just learning a little more than others can highlight your value at critical times. This article mainly introduces the basic structure of SerDes and the knowledge needed to effectively utilize SerDes, hoping to assist you in your work.
