## 22.3 A 4-to-11GHz Injection-Locked Quarter-Rate Clocking for an Adaptive 153fJ/b Optical Receiver in 28nm FDSOI CMOS

Mayank Raj, Saman Saeedi, Azita Emami

### California Institute of Technology, Pasadena, CA

Modern SoC systems impose stringent requirements on on-chip clock generation and distribution. Ring-oscillator (RO) based injection-locked (IL) clocking has been used in the past [1] to provide a low-power, low-area and low-jitter solution. Ring-based injection-locked oscillators (ILOs) can also be used to generate quadrature phases from a reference clock [2] without frequency division, which is desirable for half-rate and quarter-rate CDR. However, an ILO inherently has a small locking range [3], making it less suitable for wideband applications. In addition, drift in the free-running frequency due to PVT variations may lead to poor jitter performance and locking failures [4]. Adding a PLL to an ILO provides frequency tracking. However, PLL-aided techniques have second-order characteristics that lead to jitter peaking. They also add design complexity and power consumption [5]. We present a frequency-tracking method that exploits the dynamics of IL in a quadrature RO to increase the effective locking range. This quadrature locked loop (QLL) is used to generate accurate clock phases for a 4-channel optical receiver using a forwarded clock at quarter-rate (Fig. 22.3.1). The QLL drives an ILO at each channel without any repeaters for local quadrature clock generation. Each local ILO has deskew capability for phase alignment. The receiver maintains per-bit energy consumption across a wide range of data-rates (16 to 32Gb/s) by adaptive body biasing (BB) in a 28nm FDSOI technology.

When an RO with natural frequency $f_0$ (Fig. 22.3.2) is injected with a reference signal $f_{inj}$, IL causes adjacent time-interleaved phases of the oscillator to be unevenly spaced [3]. The phase error between I and Q in the locked state is given by $(\pi/2)(f_{inj}/f_0 - 1)$ (Fig. 22.3.2). In the unlocked state the quadrature phase error beats with a frequency $f_b$ (Fig. 22.3.2). Thus the mean quadrature phase error (MQPE) can be calculated by integrating the transient phase error over one cycle, which gives $(\pi/2)(f_{inj}/f_0 + f_b/f_0 - 1)$. Figure 22.3.2 shows the plot of MQPE versus $f_0$, where MQPE goes to zero asymptotically as $|f_{inj} - f_0|$ increases in the unlocked state. This suggests that MQPE is a measure of the sign of $f_{inj} - f_0$ in both locked and unlocked states. MQPE itself can be controlled by changing the injection strength (Fig. 22.3.2). Thus a quadrature-phase-error detector can be used as a phase-frequency detector (PFD) in an injection-locking environment. Instantaneous quadrature error is measured by using a phase detector (PD) that takes the I and Q phases of the clock from an ILO as inputs. The error is averaged using a charge pump and a loop filter, and fed back to the oscillator control voltage $V_{ctrl}$ (Fig. 22.3.1). The loop tracks the changes in the injected frequency and natural frequency of the oscillator until their difference $|f_{inj} - f_0|$ is minimized, assuring a wide locking range. This technique obviates the need for a PFD and its speed limitations. The wide jitter-tracking bandwidth inherent to IL helps in preserving the correlated low-frequency jitter and suppressing the uncorrelated high-frequency jitter. The system's first-order behavior assures stability without jitter peaking (Fig. 22.3.3). In addition, as the reference clock is not used by the PD, it does not need to be rail to rail. Each ILO consists of a V-to-I converter and a two-stage cross-coupled pseudo-differential current-starved RO. A two-stage RO is chosen to minimize power consumption. The reference clock can be injected either electrically or optically. A TIA-based optical front-end is used in the latter case. The TIA output-voltage amplitude (150mV) is sufficient for the IL architecture because of its high voltage gain [1].
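The tracking idea described above can be illustrated with a small behavioral sketch. It is not the circuit implementation: it assumes the locked-state expression $(\pi/2)(f_{inj}/f_0 - 1)$ can stand in for the averaged error at every frequency offset (the text only states that MQPE keeps the sign of $f_{inj} - f_0$ when unlocked), and it replaces the charge pump and loop filter with an ideal discrete-time integrator. The function names, the loop gain, and the 7.2GHz starting frequency are hypothetical choices made for illustration.

```python
import math


def locked_quadrature_error(f_inj: float, f0: float) -> float:
    """Locked-state I/Q phase error in radians: (pi/2) * (f_inj / f0 - 1)."""
    return (math.pi / 2.0) * (f_inj / f0 - 1.0)


def track_frequency(f_inj: float, f0_start: float,
                    loop_gain: float = 0.05, steps: int = 400) -> float:
    """Toy first-order loop that integrates the quadrature error onto f0.

    Assumption: the error sign matches sign(f_inj - f0) at every offset,
    so the integrator always pushes f0 toward f_inj.
    """
    f0 = f0_start
    for _ in range(steps):
        err = locked_quadrature_error(f_inj, f0)
        # A positive error (f_inj > f0) speeds the oscillator up.
        f0 += loop_gain * err * f0
    return f0


if __name__ == "__main__":
    # Hypothetical free-running frequency, swept over the reported 4-11GHz range.
    for f_inj in (4e9, 8e9, 11e9):
        f0_final = track_frequency(f_inj, f0_start=7.2e9)
        print(f"f_inj = {f_inj / 1e9:.1f} GHz -> f0 settles near {f0_final / 1e9:.3f} GHz")
```

Because the update reduces to a single integrator acting on $f_{inj} - f_0$, the modeled loop is first order and settles without overshoot, which mirrors the stability argument made in the text.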

To demonstrate the increase in locking range, we disable the loop and set $V_{ctrl}$ (Fig. 22.3.1) of the ILO to $V_{DD}/2$. Without the quadrature-phase-error tracking, a locking range of 7 to 7.4GHz is observed at an injection strength (K) of 0.05. With the loop activated the locking range improves to 4 to 11GHz. The quadrature correction loop must run slower than the injection-locked loop to assure stability. Under such a condition, the effective bandwidth of the system (when locked) is dictated by the injection-locking process. To demonstrate this property, the jitter transfer function of the system is measured (Fig. 22.3.3). It has a low-pass characteristic with a bandwidth of 150MHz and a -20dB/dec decay, suggestive of a first-order system. Ring oscillators are susceptible to power-supply variations. Injection locking helps in suppressing low-frequency $V_{DD}$ noise as shown in Fig. 22.3.3. The measurement is made by adding sinusoidal noise on $V_{DD}$ and then measuring the relative frequency sidebands on the output in unlocked and locked cases. Integrated output jitter (100kHz to 1GHz) of 558fs and 577fs is measured at 8GHz (32Gb/s operation) for electrical and optical inputs, respectively. At the highest locking frequency (11GHz) the integrated output jitter is 642fs. Quadrature-error measurements are done by directly measuring the deviation of the I and Q phases (Fig. 22.3.5), and an average of 1.5° is recorded across the locking range. The QLL consumes 2.77mW at 11GHz.
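The jitter and power figures above can be combined into the figure of merit defined in Fig. 22.3.6, $10\log[(\sigma/1\mathrm{s})^2 \cdot (P/1\mathrm{mW}) \cdot (1\mathrm{GHz}/F)]$. The short sketch below evaluates it for the 11GHz operating point; the function name is a hypothetical helper, and the inputs are the measured values quoted in the text.

```python
import math


def jitter_fom_db(sigma_s: float, power_mw: float, freq_ghz: float) -> float:
    """FOM = 10*log10[(sigma / 1 s)^2 * (P / 1 mW) * (1 GHz / F)], in dB."""
    return 10.0 * math.log10((sigma_s ** 2) * power_mw * (1.0 / freq_ghz))


if __name__ == "__main__":
    # QLL at 11GHz: 642fs integrated jitter, 2.77mW.
    fom = jitter_fom_db(sigma_s=642e-15, power_mw=2.77, freq_ghz=11.0)
    print(f"FOM = {fom:.1f} dB")
```

The result rounds to the -250dB listed for this work in the comparison table.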

Figure 22.3.4 shows the top-level architecture of the adaptive receiver with dynamic BB using $V_{ctrl}$ of the QLL. The first stage of the receiver is an ultralow-power TIA with a $3k\Omega$ feedback resistor and a bandwidth much lower than the data-rate [6]. The low-bandwidth (LBW) TIA output is sampled at the end of two consecutive bits $(V_n, V_{n+1})$ and these samples are compared to resolve each bit. Similar to [6], dynamic offset modulation provides a constant voltage at the sense-amplifier input regardless of the bit sequence. A de-multiplexing factor of 4 is achieved immediately after the TIA using quarter-rate clocked samplers. Because the building blocks use bias currents, the energy per bit of the optical receiver degrades at lower data-rates. In an FDSOI CMOS process, the BB effect is significantly enhanced compared to bulk CMOS, and $V_t$ of the transistors can be tuned by 80 to 150mV per 1V modulation of $V_{BB}$, depending on device type. Figure 22.3.4 shows the adaptive $V_{BB}$ generator that controls the $V_t$ of transistors in the LBW TIA and current mirror, and thus the tail current of the amplifier stage in the front-end. The transfer function from $V_{ctrl}$ of the QLL to the $V_{BB}$ generator outputs is such that the receiver building blocks work optimally at any given data-rate. Hence, the gain-bandwidth product of the TIA and the amplifier's gain are adaptively set to be proportional to the data-rate. This is done by fitting the transfer function of the BB generator from $V_{ctrl}$ of the QLL to the BB of the respective blocks.

The receiver is bonded to a photodiode with a responsivity of 0.9A/W, and tested at different data-rates with and without adaptive power reduction (Figs. 22.3.4 and 22.3.5). The optical beam from a 1550nm DFB laser is modulated by a Mach-Zehnder modulator and coupled to the photodiode using a single-mode fiber. The total capacitance at the input node is estimated to be 120fF. The combined optical loss due to the optical coupling and optical connector is measured to be 2.8dB. Receiver sensitivity, measured with PRBS 15 data, is -12.1dBm at 16Gb/s and -8.8dBm at 32Gb/s. The maximum achievable data-rate is limited by the maximum data-rate of the external PRBS generator. Receiver power breakdown and energy per bit are shown in Fig. 22.3.4. Measurements are performed with the adaptive $V_{BB}$ generator on and off (Fig. 22.3.4). When the adaptive $V_{BB}$ generator is active, the per-bit energy consumption improves from 103fJ/b at 32Gb/s to 94fJ/b at 16Gb/s. The prototype is fabricated in 28nm FDSOI CMOS (Fig. 22.3.7) with an area of 0.3×0.06mm<sup>2</sup>. The measured power of each channel, including all clocking circuits, is 4.87mW at 32Gb/s (153fJ/b). Figure 22.3.6 summarizes the system performance and compares it to prior art.
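As a quick cross-check of the headline numbers, the sketch below converts the per-channel power into energy per bit and the reported sensitivities from dBm into microwatts. The helper names are hypothetical. Direct division of 4.87mW by 32Gb/s gives roughly 152fJ/b, which agrees with the reported 153fJ/b (103fJ/b data plus 50fJ/b clock) to within rounding of the published figures.

```python
def energy_per_bit_fj(power_w: float, data_rate_bps: float) -> float:
    """Energy per bit in femtojoules: P / data-rate, scaled by 1e15."""
    return power_w / data_rate_bps * 1e15


def dbm_to_uw(p_dbm: float) -> float:
    """Convert optical power from dBm to microwatts (0 dBm = 1 mW = 1000 uW)."""
    return 10.0 ** (p_dbm / 10.0) * 1e3


if __name__ == "__main__":
    # Per-channel power including clocking, at the top data-rate.
    print(f"{energy_per_bit_fj(4.87e-3, 32e9):.1f} fJ/b at 32Gb/s")

    # Reported sensitivities at the two ends of the data-rate range.
    for rate_gbps, sens_dbm in ((16, -12.1), (32, -8.8)):
        print(f"{rate_gbps} Gb/s: {sens_dbm} dBm = {dbm_to_uw(sens_dbm):.1f} uW")
```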

### Acknowledgements:

The authors thank M. Monge for technical discussions and ST Microelectronics for chip fabrication.

### References:

[1] L. Zhang, *et al.*, "Injection-Locked Clocking: A Low-Power Clock Distribution Scheme for High-Performance Microprocessors," *Dig. Symp. VLSI Circuits*, pp. 1251-1256, Sept. 2008.

[2] P. Kinget, *et al.*, "An injection-locking scheme for precision quadrature generation," *IEEE J. Solid-State Circuits*, vol. 37, no. 7, pp. 845-851, July 2002.

[3] K. Hu, *et al.*, "A 0.6mW/Gb/s, 6.4-7.2 Gb/s serial link receiver using local injection-locked ring oscillators in 90 nm CMOS," *IEEE J. Solid-State Circuits*, vol. 45, no. 4, pp. 899-908, 2010.

[4] W. Deng, *et al.*, "A 0.022mm<sup>2</sup> 970µW dual-loop injection-locked PLL with -243dB FOM using synthesizable all-digital PVT calibration circuits," *ISSCC Dig. Tech. Papers*, pp. 248-249, Feb. 2013.

[5] J. Chien, *et al.*, "A pulse-position-modulation phase-noise-reduction technique for a 2-to-16GHz injection-locked ring oscillator in 20nm CMOS," *ISSCC Dig. Tech. Papers*, pp. 52-53, Feb. 2014.

[6] S. Saeedi and A. Emami, "A 25Gb/s 170µW/Gb/s optical receiver in 28nm CMOS for chip-to-chip optical communication," *IEEE RFIC*, pp. 283-286, 2014.

[7] T. Takemoto, *et al.*, "A 4× 25-to-28Gb/s 4.9mW/Gb/s -9.7dBm High-Sensitivity Optical Receiver Based on 65nm CMOS for Board-to-Board Interconnects," *ISSCC Dig. Tech. Papers*, pp. 118-119, Feb. 2013.

[8] T. Huang, *et al.*, "A 28Gb/s 1pJ/b shared-inductor optical receiver with 56% chip-area reduction in 28nm CMOS," *ISSCC Dig. Tech. Papers*, pp. 144-145, Feb. 2014.











Figure 22.3.5: Measured BER and eye diagram with PRBS 15 optical data (top). Screenshots of the measured quadrature waveforms (bottom).



Figure 22.3.2: Simulated transient quadrature phase error of a two-stage ILO in the unlocked case (top) and MQPE vs.  $f_0$  (bottom).





| | This work | [2] | [3] | [4] | [5] |
|---|---|---|---|---|---|
| Architecture | QLL | ILO | ILO | IL-PLL | PPM IL |
| Oscillator | CMOS Ring | CMOS Ring | CMOS Ring | CMOS Ring | CMOS Ring |
| Technology | 28nm FD SOI | 250nm BiCMOS | 90nm CMOS | 65nm CMOS | 20nm CMOS |
| Locking range | 4GHz - 11GHz | 340MHz | 203MHz | - | - |
| Output integrated jitter (σ) | 558fs - 577fs\* (8GHz)<br>642fs (11GHz)<br>(100kHz-1GHz) | | <1.5ps (RMS jitter at 2.5GHz) | 0.7ps at 1.2GHz (10kHz-40MHz) | 434fs/268fs at 15GHz (100kHz-1GHz) |
| I/Q error | 1.5° | 0.7°\*\* | 4.5° | NA | NA |
| Active area | 0.003mm<sup>2</sup> | 0.09mm<sup>2</sup> | 0.026mm<sup>2</sup> | 0.022mm<sup>2</sup> | 0.044mm<sup>2</sup> |
| Supply | 1V | 3V | 1.2V | - | 1.25/1.1V |
| Power diss. (P) at (F) | 2.77mW at 11GHz | 15mW at 2.7GHz | 1.3mW at 2GHz | 0.97mW at 1.2GHz | 46.2mW at 15GHz |
| Figure of merit (FOM) = $10 \log \left[ \left( \frac{\sigma}{1s} \right)^2 \cdot \frac{P}{1mW} \cdot \frac{1GHz}{F} \right]$ | -250dB | - | -238dB | -244dB | -247dB |

\*Optical clock input. \*\*Not measured directly.

| | This work | [6] | [7] | [8] |
|---|---|---|---|---|
| Technology | 28nm FD SOI | 28nm CMOS | 65nm CMOS | 28nm CMOS |
| Data-rate | 32Gb/s | 25Gb/s | 28Gb/s | 28Gb/s |
| Efficiency | 103fJ/bit data and 50fJ/bit clock | 170fJ/bit | 3.25pJ/bit | 1.03pJ/bit |
| Active area | 0.018mm<sup>2</sup> (4 channel) | 0.0018mm<sup>2</sup> | 3.25mm<sup>2</sup> | 0.318mm<sup>2</sup> |
| Sensitivity (optical) | -8.8dBm at 32Gb/s | -6.8dBm at 25Gb/s | -9.7dBm at 25Gb/s | -6dBm at 10Gb/s |

Figure 22.3.6: Performance summary and comparison with the prior art for QLL (top) and receiver (bottom).

# **ISSCC 2015 PAPER CONTINUATIONS**

Figure 22.3.7: Chip micrograph. Clock distribution is highlighted in the layout capture (white).