# A Low-Power Receiver with Switched-Capacitor Summation DFE

Azita Emami-Neyestanak<sup>1</sup>, Aida Varzaghani<sup>2</sup>, John Bulzacchelli<sup>1</sup>, Alexander Rylyakov<sup>1</sup>, Chih-Kong Ken Yang<sup>2</sup> and Daniel Friedman<sup>1</sup>

<sup>1</sup>IBM T.J. Watson Research Center, Yorktown Heights, NY, USA and <sup>2</sup>University of California, Los Angeles, CA, USA

### **Abstract**

A low power receiver with a one tap DFE was fabricated in 90nm CMOS technology. The speculative equalization is performed using switched-capacitor-based addition directly at the front-end sample-hold circuit. In order to further reduce the power consumption, an analog multiplexer is used in the speculation technique implementation. A quarter-rate-clocking scheme facilitates the use of low-power front-end circuitry and CMOS clock buffers. At 10Gb/s data rate, the receiver consumes less than 6.0mW from a 1.0V supply.

Keywords: high speed IO, interconnect, receiver, DFE

### Introduction

The increasing demand for high bandwidth interconnection between integrated circuits requires large numbers of inputs and outputs (IOs) per chip as well as high data rates per IO. Key limitations in meeting these requirements in today's systems include channel characteristics and IO power consumption. Even in short interconnects, the channel attenuation at very high data rates is significant, and using equalization techniques can greatly improve the link performance. However, techniques such as decision feedback equalization (DFE) can increase the power consumption significantly. In this work we explore the possibility of building simple but very low-power DFE receivers suitable for short to medium length interconnects.

### **Receiver Design**

### A. Receiver Architecture

The top-level receiver block diagram is shown in Fig. 1. The input, output and clock signals are differential.
This receiver uses a 1:4 de-multiplexing scheme, where four equally-spaced phases of a quarter-rate clock are used to sample the data. Therefore, the clock buffers and the four parallel front-end slicers operate at only one quarter of the data rate. In order to further relieve the speed requirement of the slicers and the DFE adder, the first post-cursor of the inter-symbol interference (ISI) is compensated speculatively for both possible values of the previous bit, one and zero. As shown in Fig. 1, a pair of sampler/adders (S/A) is used for each branch of the front-end to sample the input signal $V_i$, and then to adjust the sampled value positively and negatively. $\alpha V_{ref}$ and $-\alpha V_{ref}$ represent the DFE tap values for one and zero bits, respectively; $\alpha$ can be adjusted for a specific channel, signal amplitude, and data rate. As soon as the previous bit is resolved, an analog multiplexer (MUX) chooses the correct value of the adjusted data sample. With the analog MUX preceding the slicer, only one latch is required per branch, thus saving power. In the next two sections we discuss each stage of the design in more detail.

Fig. 1 Receiver block diagram

### B. Adder

The first ISI post-cursor can be equalized by subtracting the estimated error from the main sample. This operation is commonly done by using an analog summer or by introducing an offset to the slicer [1]. Most summer circuits shown to date are based on a current-mode scheme [2,3]; however, their current levels are high in order to meet the DFE response time requirements. In this work we explore the use of a voltage/charge-mode summation technique, shown in Fig. 2 (for simplicity, only the half-circuit of the differential structure is shown). The standard "sample and hold" front end is extended to a switched-capacitor adder, using the clocking shown in the figure.

Fig. 2 Receiver Sampler/Adder switched-capacitors and clocking

The switches S1, S1d and S1B are turned ON with clock phases Ck1, Ck1d and Ck1B, respectively. During the sampling phase, both S1 and S1d are ON, so the voltage across $C_s$ is $V_i - V_{CM}$ and $V_{out} = V_{CM}$. S1 is turned OFF slightly earlier than S1d to make the charge injection and sampling time signal-independent [4]. During the hold/equalize phase, switch S1B is turned ON, and $V_{out}$ becomes $V_{CM} - V_i + \alpha V_{ref}$. To maximize linearity, $C_s$ is a 20fF lateral capacitor built from four metal levels.

### C. Slicer and Multiplexer

Fig. 3 illustrates the timing of one of the front-end branches, here triggered by Ck2. The input signal is sampled on the falling edge of Ck2, at which point the speculative equalization starts. The next-stage MUX is activated when Ck2 is low, and the final latch is triggered by the next rising edge of Ck2. To save power and reduce delay, the analog MUX is embedded within a CML latch [5], as shown in Fig. 3. The Sel signal for the MUX is the resolved previous bit, which is the output of the adjacent branch, triggered by Ck1. The delay from the rising edge of Ck1 to the Sel signal is shown as "Regeneration" time. The sum of the regeneration delay and the MUX delay must be less than a bit-time. The equalization also must be completed in one bit-time plus "Regeneration" time. These timing requirements and the sampler bandwidth are the key issues that set the maximum receiver data rate.

Fig. 3 Timing of the front-end receiver and MUX/Latch schematic

## **Experimental Results**

The receiver was fabricated in IBM 90nm CMOS technology. The DFE performance was examined over channels with different amounts of ISI. Using very short cables (low ISI), the receiver operates error-free at more than 10Gb/s. The second test (moderate ISI) used a 5" minimum via stub, ~4.0GHz bandwidth PCB trace.
In this case, the receiver achieves a BER < 10<sup>-12</sup> at 9.0Gb/s with PRBS31 data patterns and at 11Gb/s with PRBS7 patterns. These results are possible only when the DFE coefficient is set to the optimum value. The third test (high ISI) used a 16" Tyco channel with high levels of reflections and attenuation. Here a BER < 10<sup>-12</sup> is obtained at 5.0Gb/s and 6.0Gb/s (Fig. 4) with PRBS31 and PRBS7 data patterns, respectively.

Fig. 4 6.0Gb/s data over 16" Tyco, a) input signal, b) recovered data and quarter rate clock (44mV per division)

At 1.0V supply voltage, the receiver and clock buffers consume 6.0mW of power at 10Gb/s data rate and 5.0mW at 6.0Gb/s data rate. TABLE I summarizes the performance.

TABLE I RECEIVER PERFORMANCE SUMMARY

| Technology                           | 90nm CMOS                |
|--------------------------------------|--------------------------|
| Supply voltage                       | 1.0 V                    |
| Data rate: cable, 5" PCB, 16" Tyco   | 10Gb/s, 9.0Gb/s, 6.0Gb/s |
| Power consumption                    | 6.0mW @ 10Gb/s           |
| Sensitivity (no offset compensation) | ± 40 mV differential     |

#### Conclusion

A one-tap DFE receiver with speculation was designed and fabricated in 90nm CMOS technology. This receiver is suitable for channels with low levels of ISI that is mostly due to attenuation rather than reflections. The simple, low-power DFE can significantly enhance the data rate over short/medium-length channels. In this design, high power efficiency (0.6mW/Gbps) is achieved by using switched-capacitor adders, analog multiplexers, and quarter-rate clocking, which allows the use of CMOS clock buffers and relieves the timing of critical circuits.

### Acknowledgment

This work was supported by MPO contract H98230-04-C-0920.

### References

- [1] V. Stojanovic *et al.*, "Adaptive Equalization and Data Recovery in a Dual-Mode (PAM2/4) Serial Link Transceiver," *IEEE Symp. VLSI Circuits Dig. Tech. Papers*, pp. 348-351, June 2004
- [2] T. Beukema *et al.*, "A 6.4-Gb/s CMOS SerDes Core with Feed-Forward and Decision-Feedback Equalization," *IEEE J. Solid-State Circuits*, vol. 40, pp. 2633-45, Dec. 2005
- [3] R. Payne *et al.*, "A 6.25-Gb/s binary transceiver in 0.13-µm CMOS for serial data transmission across high loss legacy backplane channels," *IEEE J. Solid-State Circuits*, pp. 2646-57, Dec. 2005
- [4] G. M. Haller and B. A. Wooley, "A 700-MHz Switched-Capacitor Analog Waveform Sampling Circuit," *IEEE J. Solid-State Circuits*, vol. 29, pp. 500-508, Apr. 1994
- [5] S.-J. Bae, H.-J. Chi, Y.-S. Sohn, and H.-J. Park, "A 2Gb/s 2-tap DFE Receiver for Multi-Drop Single-Ended Signaling Systems with Reduced Noise," *ISSCC Dig. Tech. Papers*, pp. 244-245, Feb. 2004
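
As a behavioral illustration of the speculative one-tap DFE described in Sections A to C, the following Python sketch models the sampler/adder relation $V_{out} = V_{CM} - V_i + \alpha V_{ref}$ in differential form, where $V_{CM}$ cancels, followed by the analog MUX and slicer. It is only a sketch under stated assumptions, not the fabricated circuit: the two-tap channel model, the cursor values `h0` and `h1`, the function names, and the slicer polarity (a negative adder output is read as a one, because the adder inverts the input) are all hypothetical choices made for illustration.

```python
import random


def channel(bits, h0, h1, prev_bit=0):
    """Hypothetical channel: main cursor h0 plus first post-cursor h1 (volts, differential)."""
    samples = []
    prev_sym = 1 if prev_bit else -1
    for b in bits:
        sym = 1 if b else -1
        samples.append(h0 * sym + h1 * prev_sym)
        prev_sym = sym
    return samples


def sample_and_add(v_in, tap):
    """Differential form of V_out = V_CM - V_i + tap; V_CM cancels between half-circuits."""
    return -v_in + tap


def speculative_dfe(samples, alpha_vref, prev_bit=0):
    """Compute both speculative sums, then let the previous bit select one (analog MUX)."""
    bits, margins = [], []
    for v_in in samples:
        v_if_one = sample_and_add(v_in, +alpha_vref)   # assumes previous bit was one
        v_if_zero = sample_and_add(v_in, -alpha_vref)  # assumes previous bit was zero
        selected = v_if_one if prev_bit == 1 else v_if_zero
        # Assumed polarity: the adder inverts the input, so negative means "one".
        bit = 1 if selected < 0 else 0
        bits.append(bit)
        margins.append(abs(selected))
        prev_bit = bit
    return bits, margins


random.seed(0)
sent = [random.randint(0, 1) for _ in range(1000)]
rx = channel(sent, h0=0.10, h1=0.06)            # illustrative cursor values
decoded, margins = speculative_dfe(rx, alpha_vref=0.06)
raw_margin = min(abs(v) for v in rx)            # worst-case margin without the DFE

print("bits recovered correctly:", decoded == sent)
print("worst-case margin without DFE (V):", raw_margin)
print("worst-case margin with DFE (V):", min(margins))
```

In this idealized model, setting `alpha_vref` equal to the post-cursor `h1` removes the ISI term from the selected sample, which mirrors the observation in the experimental results that the DFE coefficient must be set to its optimum value.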