# A Wideband Injection Locked Quadrature Clock Generation and Distribution Technique for an Energy-Proportional 16–32 Gb/s Optical Receiver in 28 nm FDSOI CMOS

Mayank Raj, Student Member, IEEE, Saman Saeedi, and Azita Emami, Member, IEEE

Abstract—We present a novel frequency tracking method that exploits the dynamics of injection locking in a quadrature ring oscillator to increase the effective locking range from 5% (7–7.4 GHz) to 90% (4–11 GHz). The quadrature phase error between I and Q phases of an injection locked ring oscillator is derived and shown to contain frequency error information, both inside and outside the locking range. This error is utilized to form a first-order frequency tracking quadrature locked loop (QLL). This loop generates accurate clock phases for a 4-channel parallel optical receiver using a forwarded clock at quarter-rate. The QLL drives an ILO at each channel without any repeaters for local quadrature clock generation. Each local ILO has deskew capability for phase alignment. The receiver maintains a constant energy-per-bit consumption across 16–32 Gb/s by adaptive body biasing in a 28 nm FDSOI technology.

Index Terms—Energy proportional, injection-locked, locking range, optical, quadrature, receiver, voltage controlled oscillator.



Fig. 1. Histogram of change in f<sub>0</sub> in a ring oscillator with process variation.

### I. Introduction

The rise in the aggregate bandwidth of microprocessors has led to an insatiable demand for massively parallel low-power links with high data-rates. This has imposed stringent requirements on on-chip clock generation and distribution. Ring oscillator (RO) based injection-locked clocking has been used in the past [1] to provide a low-power, low-area and low-jitter solution. ROs are easily integrated in a standard CMOS process and occupy a smaller on-chip area than LC tank based oscillators, making them suitable for dense parallel links. Ring based injection-locked oscillators (ILOs) can also be used to generate quadrature phases from a reference clock [2] without frequency division, which is desirable for half-rate and quarter-rate CDR architectures. However, an ILO inherently has a small locking range [3], making it less suitable for wideband applications such as the transceivers embedded in field-programmable gate arrays (FPGAs) [4]. In addition, drift in the free-running frequency (f<sub>o</sub>) due to process, voltage and temperature (PVT) variations may lead to poor jitter performance and locking failures [5]. Fig. 1 shows a simulated histogram of the change in a five-stage ring oscillator's f<sub>o</sub> with process variation in 28 nm technology. The NMOS and PMOS devices in the inverters are sized as 1 µm/28 nm and 2 µm/28 nm, respectively. A 3σ variation of 0.95 GHz is observed around an oscillation frequency of 10 GHz. For robust performance the locking range should be several times larger than the variation in natural frequency, but the maximum locking range in ring based ILOs without any frequency tracking technique is only about 10% [6]. Adding a PLL to an ILO provides frequency tracking. However, PLL aided techniques have second-order characteristics that can lead to jitter peaking [7]. A simple frequency-locked loop (FLL) is not sufficient to compensate for the drift, because the output of an injection-locked oscillator is always fixed at the desired frequency and the FLL only comes into action after the system loses lock [5]. This is also true for envelope detection based frequency tracking techniques, which activate only after the ILO loses lock [8], [9]. Replica delay cell based frequency tracking techniques can provide continuous frequency calibration [10]. However, they are prone to mismatch between the delay cells in the ring oscillator and the replica.

Manuscript received February 16, 2016; revised April 21, 2016 and June 1, 2016; accepted June 20, 2016. Date of publication July 27, 2016; date of current version September 30, 2016. This paper was approved by Associate Editor Pietro Andreani.

The authors are with the California Institute of Technology, Pasadena, CA 91125 USA (e-mail: makk@caltech.edu).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/JSSC.2016.2584643

Generating quadrature phases from a reference clock at low area and power overhead is desirable for quarter-rate forwarded clock architectures. Both ring and LC based dividers have been frequently used for quadrature phase generation. However, because they operate at twice the desired frequency, they tend to be power inefficient. Quadrature phase generation through ring ILOs without frequency division leads to phase inaccuracies [6]. Previous works have tried to tackle this problem with multiphase injection using RC-CR filters, which results in significant additional power consumption in the buffers driving the passive filter. Also, poly-phase filters limit the locking range and only work with pure sinusoidal signals [2].

We present a novel frequency tracking method that exploits the dynamics of the injection locking process in a quadrature ring oscillator to increase the effective locking range. We also show that the resultant system still has first-order characteristics, unlike an injection locked phase locked loop (IL PLL). This quadrature locked loop (QLL) is used to generate accurate clock phases for a 4-channel optical receiver using a forwarded clock at quarter-rate. The QLL drives an ILO at each channel without any repeaters for local quadrature clock generation. Each local ILO has deskew capability for phase alignment. The receiver maintains constant per-bit energy consumption across a wide range of data-rates (16 to 32 Gb/s) by adaptive body biasing (BB) in a 28 nm FDSOI technology. An energy-proportional optical receiver achieves significant power savings by reducing power consumption at lower data-rates or when idle. The prototype measurements indicate a record low energy consumption of 153 fJ/b at 32 Gb/s.

This paper is organized as follows. Section II describes the QLL system architecture. Mathematical analysis and system stability are discussed in Section III. Section IV details an energy-proportional four-channel quarter-rate optical receiver with QLL based clocking. Hardware measurement results are presented in Section V. Finally, Section VI summarizes the work and presents the conclusions.

### II. QLL ARCHITECTURE

When a ring oscillator with natural frequency f<sub>o</sub> is injected with an external signal of frequency f<sub>inj</sub>, the outputs of the ring oscillator incur a phase mismatch error if f<sub>o</sub> is not equal to f<sub>inj</sub> [6]. We prove that the mean of this error, i.e., the mean quadrature phase error (MQPE), contains information about the difference between the natural frequency of the oscillator and the injected frequency (i.e., $|f_{inj}-f_o|$) in both the locked and unlocked states. A phase detector and a low-pass filter are used to measure the MQPE. Their output is used in a negative feedback configuration to set the natural frequency of the ring oscillator, thereby nullifying both $|f_{inj}-f_o|$ and the quadrature phase error. This loop provides frequency tracking, thereby assuring wideband injection locking. We call this technique a quadrature locked loop, or QLL for short. A similar frequency tracking technique is used in an LC ILO based divider in [11]. However, that work makes no assertions about the MQPE of the ILO in the unlocked state.

Fig. 2. Deriving the quadrature phase error expression in a two-stage ring oscillator.

In this section we derive an expression for the MQPE. To do so we first quantify the phase error caused by injection. Fig. 2 shows a two-stage differential ring oscillator with a natural frequency of $f_o$; thus both delay stages have an inherent delay of $1/(4f_o)$. One of the delay stages (A) is injected with a signal at $f_{inj}$. Injection causes the delay of stage A to change to $1/(4f_o) + \Delta$, and the oscillator oscillates at a frequency $f$ (not necessarily a constant) instead of $f_o$. The delay of the other stage (B) stays the same, so the time delay between the I and Q phases is

$$Delay_{IQ}(t) = \frac{1}{4f_o} \tag{1}$$

But as the frequency of oscillation is f, phase delay can be expressed as

$$Delay_{IQ}(phase) = \frac{1}{4f_o} \times 2\pi f = \frac{\pi}{2} \times \frac{f}{f_o} \tag{2}$$

Now from (2) we can calculate the instantaneous quadrature error $\phi_{qe}(t)$ as

$$\phi_{qe}(t) = Delay_{IQ}(phase) - \frac{\pi}{2} = \frac{\pi}{2} \left( \frac{f}{f_o} - 1 \right) = \frac{\pi}{2} \left( \frac{\omega}{\omega_o} - 1 \right) \tag{3}$$

With this result (3) we can move ahead to calculating the MQPE. We do so by separately analyzing the locked and unlocked cases. In the locked state  $f(t) = f_{inj}$  (a constant), hence

$$MQPE = \frac{\pi}{2} \left( \frac{f_{inj}}{f_o} - 1 \right) = \frac{\pi}{2} \left( \frac{\omega_{inj}}{\omega_o} - 1 \right) \tag{4}$$

Another work [12] derives an MQPE expression similar to (4) for a four-stage ring ILO in the locked state. To calculate the variation of the quadrature phase error in the unlocked state, we need the variation of the instantaneous frequency of the ILO in the unlocked state. Similar to [7], we write the instantaneous frequency ($\omega$) of the ILO as $\omega_{inj} + d\theta/dt$. Here $\theta$ is the phase difference between the injected signal and the output of the ILO. An expression for $\omega$ can be obtained by differentiating the solution to Adler's equation [13] for an ILO with a locking range $\omega_L$ [7].

$$\omega = \omega_{inj} + \frac{\omega_b^2}{\omega_o - \omega_{inj}} \times \frac{\sec^2\left(\frac{\omega_b t}{2}\right)}{1 + \left(\frac{\omega_L}{\omega_o - \omega_{inj}} + \frac{\omega_b}{\omega_o - \omega_{inj}} \tan\left(\frac{\omega_b t}{2}\right)\right)^2} \tag{5}$$

$$\omega_b = \sqrt{(\omega_o - \omega_{inj})^2 - \omega_L^2} \tag{6}$$
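As a numerical sanity check on (5) and (6), the sketch below evaluates the instantaneous frequency over one beat period and averages it. The values f<sub>inj</sub> = 7 GHz and f<sub>L</sub> = 175 MHz are borrowed from the simulations in Section III; f<sub>o</sub> = 7.7 GHz, the function names and the midpoint-rule averaging are illustrative assumptions. Equation (5) is evaluated after multiplying its numerator and denominator by cos²(ω<sub>b</sub>t/2), which is algebraically identical but avoids evaluating the tangent at its singularity.

```python
import math


def beat_frequency(w_o, w_inj, w_L):
    """Eq. (6): beat angular frequency of an unlocked ILO (rad/s)."""
    detuning = w_o - w_inj
    if abs(detuning) <= w_L:
        raise ValueError("inside the locking range: the ILO does not beat")
    return math.sqrt(detuning**2 - w_L**2)


def instantaneous_frequency(t, w_o, w_inj, w_L):
    """Eq. (5), with sec^2 and tan rewritten in terms of sin and cos."""
    d = w_o - w_inj
    w_b = beat_frequency(w_o, w_inj, w_L)
    c = math.cos(w_b * t / 2)
    s = math.sin(w_b * t / 2)
    # sec^2(x) / (1 + (a + b*tan(x))^2) == 1 / (cos^2(x) + (a*cos(x) + b*sin(x))^2)
    denom = c * c + ((w_L / d) * c + (w_b / d) * s) ** 2
    return w_inj + (w_b**2 / d) / denom


def mean_frequency(w_o, w_inj, w_L, n=2000):
    """Midpoint-rule average of Eq. (5) over one beat period 2*pi/w_b."""
    w_b = beat_frequency(w_o, w_inj, w_L)
    dt = (2 * math.pi / w_b) / n
    total = sum(
        instantaneous_frequency((i + 0.5) * dt, w_o, w_inj, w_L) for i in range(n)
    )
    return total / n


if __name__ == "__main__":
    two_pi = 2 * math.pi
    w_inj, w_L, w_o = two_pi * 7e9, two_pi * 175e6, two_pi * 7.7e9  # assumed values
    print("beat frequency (MHz):", beat_frequency(w_o, w_inj, w_L) / two_pi / 1e6)
    print("mean frequency (GHz):", mean_frequency(w_o, w_inj, w_L) / two_pi / 1e9)
```

Under these assumptions the average settles at ω<sub>inj</sub> + ω<sub>b</sub> when ω<sub>o</sub> > ω<sub>inj</sub> and at ω<sub>inj</sub> − ω<sub>b</sub> when ω<sub>o</sub> < ω<sub>inj</sub>, i.e., the mean frequency is pulled from ω<sub>o</sub> toward ω<sub>inj</sub> in both cases.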

Equations (5) and (6) show that in the unlocked state the instantaneous frequency ($\omega$) beats with a frequency $\omega_b$. Thus, as suggested by (3), the quadrature phase error also beats with frequency $\omega_b$ (Fig. 3). This periodicity allows us to calculate the MQPE in the unlocked state by averaging (3) over one beat period, from 0 to $2\pi/\omega_b$.

$$MQPE = \frac{1}{\frac{2\pi}{\omega_b}} \times \int_0^{\frac{2\pi}{\omega_b}} \frac{\pi}{2} \times \left(\frac{\omega}{\omega_o} - 1\right) dt \tag{7}$$

Fig. 3. Quadrature error in unlocked case: (a) close to lock, (b) far from lock.

Fig. 4. (a) MQPE vs. f<sub>o</sub> for a fixed f<sub>inj</sub> of 7 GHz; (b) effect of injection strength on MQPE.

Substituting $\omega = \omega_{inj} + d\theta/dt$ [7] in (7) and integrating, we get

$$MQPE = \frac{\pi}{2} \left[ \frac{\omega_{inj}}{\omega_o} - 1 + \frac{\omega_b}{2\pi \omega_o} \left\{ \theta \left( \frac{2\pi}{\omega_b} \right) - \theta (0) \right\} \right] \tag{8}$$

Since $\theta$ varies by $2\pi$ over one beat period [7], we have

$$MQPE = \frac{\pi}{2} \left[ \frac{\omega_{inj}}{\omega_o} + \frac{\omega_b}{\omega_o} - 1 \right] = \frac{\pi}{2} \left[ \frac{f_{inj}}{f_o} + \frac{f_b}{f_o} - 1 \right] \tag{9}$$
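Equations (4) and (9) can be collected into one small helper, sketched here in Python. The function name `mqpe` and the example numbers are illustrative. One point is an assumption on our part rather than a statement from the derivation: (9) is written for θ advancing by +2π per beat period, which corresponds to f<sub>o</sub> > f<sub>inj</sub>; for f<sub>o</sub> < f<sub>inj</sub> the sketch takes θ to change by −2π in (8), which flips the sign of the beat term.

```python
import math


def mqpe(f_inj, f_o, f_L):
    """Mean quadrature phase error in radians.

    f_inj: injected frequency, f_o: natural frequency,
    f_L: one-sided locking range (all in Hz).
    """
    detuning = f_o - f_inj
    if abs(detuning) <= f_L:
        # Locked state, Eq. (4): the ILO runs at f_inj.
        return (math.pi / 2) * (f_inj / f_o - 1)
    # Unlocked state, Eq. (9), with f_b from Eq. (6).
    f_b = math.sqrt(detuning**2 - f_L**2)
    # ASSUMPTION: theta changes by -2*pi per beat when f_o < f_inj,
    # so the beat term changes sign relative to Eq. (9).
    sign = 1.0 if detuning > 0 else -1.0
    return (math.pi / 2) * ((f_inj + sign * f_b) / f_o - 1)


if __name__ == "__main__":
    for f_o in (5e9, 6.9e9, 7e9, 7.1e9, 9e9):
        print(f"f_o = {f_o / 1e9:.1f} GHz -> MQPE = "
              f"{math.degrees(mqpe(7e9, f_o, 175e6)):+.3f} deg")
```

With this sign convention the MQPE is positive when f<sub>o</sub> is below f<sub>inj</sub> and negative when it is above, in both the locked and unlocked regions, and the two expressions meet at the edge of the locking range where f<sub>b</sub> = 0.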

Equations (4) and (9) form the cornerstones of the theory of the QLL. Fig. 4(a) shows the variation of MQPE with change in $f_o$ for a fixed $f_{inj}$ of 7 GHz and an injection strength (k) of 0.05, where k is defined as the ratio of the injection current to the oscillator current [6]. Fig. 4(a) has two distinct regions, locked and unlocked. As expected, the MQPE is 0 for $f_{inj} = f_o$. In the locked state the magnitude of the MQPE increases (almost linearly) as $|f_{inj} - f_o|$ increases. In the unlocked state the MQPE approaches zero asymptotically (never reaching it) as $|f_{inj} - f_o|$ increases.

Fig. 5. Block diagram of the proposed system (QLL).



Fig. 6. Circuit architecture of QLL.

This suggests that the MQPE is a measure of the sign of f<sub>inj</sub> - f<sub>o</sub> in both the locked and unlocked states. This in turn implies that a quadrature phase error detector can be used as a phase frequency detector (PFD) in an injection locking environment. Hence the quadrature error can indeed be used in a feedback system to set the natural frequency (f<sub>o</sub>) of the oscillator such that f<sub>o</sub> = f<sub>inj</sub>, thereby boosting the effective locking range. An interesting feature of this technique is that the MQPE itself can be controlled by changing the injection strength (k). It can be shown by equating the expressions for MQPE in the locked (4) and unlocked (9) states that the width of the linear region in Fig. 4 is given by 2 × f<sub>L</sub>. Increasing k increases the intrinsic locking range (f<sub>L</sub>) of the injection locked oscillator [7], thereby widening the linear region. Thus in Fig. 4(b), comparing k = 0.15 to k = 0.10, a greater MQPE range is observed for k = 0.15. For instance, in Fig. 4(b) the MQPE at an $f_o$ of 6.4 GHz is 3° for k = 0.15 and 1° for k = 0.10. The MQPE for higher k falls off more gradually to 0 as $|f_{inj}-f_o|$ increases in the unlocked region. So the QLL's effective locking range can potentially be increased further by increasing the injection strength.
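The claim that the linear region is 2 × f<sub>L</sub> wide can be illustrated by sweeping f<sub>o</sub> and locating the two extremes of the MQPE curve. This is a minimal sketch under stated assumptions: the locking ranges used (175, 200 and 300 MHz) are illustrative and are not mapped to specific values of k, since the relation between k and f<sub>L</sub> is not given here; the helper repeats (4) and (9) with the beat-term sign chosen by the side of f<sub>inj</sub> on which f<sub>o</sub> lies.

```python
import math


def mqpe(f_inj, f_o, f_L):
    """MQPE in radians from Eq. (4) (locked) and Eq. (9) (unlocked)."""
    detuning = f_o - f_inj
    if abs(detuning) <= f_L:
        return (math.pi / 2) * (f_inj / f_o - 1)
    f_b = math.sqrt(detuning**2 - f_L**2)
    sign = 1.0 if detuning > 0 else -1.0  # assumed sign of the beat term
    return (math.pi / 2) * ((f_inj + sign * f_b) / f_o - 1)


def linear_region_edges(f_inj, f_L, span=1e9, step=1e6):
    """Sweep f_o over f_inj +/- span and return the f_o values where
    the MQPE is most positive and most negative."""
    n = int(round(2 * span / step))
    grid = [f_inj - span + i * step for i in range(n + 1)]
    lower = max(grid, key=lambda f_o: mqpe(f_inj, f_o, f_L))
    upper = min(grid, key=lambda f_o: mqpe(f_inj, f_o, f_L))
    return lower, upper


if __name__ == "__main__":
    lower, upper = linear_region_edges(7e9, 175e6)
    print("linear region width (MHz):", (upper - lower) / 1e6)
    # A larger assumed locking range gives a larger MQPE at the same f_o.
    for f_L in (200e6, 300e6):
        print(f"f_L = {f_L / 1e6:.0f} MHz -> MQPE at 6.4 GHz = "
              f"{math.degrees(mqpe(7e9, 6.4e9, f_L)):.2f} deg")
```

In this model the MQPE peaks exactly at f<sub>inj</sub> ± f<sub>L</sub>, and a wider assumed locking range leaves more residual error signal at a given f<sub>o</sub> outside lock, which is the qualitative trend described for Fig. 4(b).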

Fig. 5 shows the block diagram of the proposed system. It consists of an injection locked two-stage differential ring oscillator. The instantaneous quadrature error is measured using a phase detector (PD), which takes the I and Q phases of the clock from the ILO as inputs. The error is averaged using a charge pump and a loop filter, and fed back to the oscillator's $V_{ctrl}$. The loop tracks the changes in the injected frequency and the natural frequency of the oscillator until their difference $|f_{inj} - f_o|$ is minimized, assuring a wide locking range.

Fig. 6 shows the circuit diagram of the major sections of the QLL. The reference clock can be injected either electrically or optically. A trans-impedance amplifier (TIA) based optical front-end is used in the latter case. The TIA consists of an inverter with a 4 kΩ resistor connected in feedback. The bandwidth of the TIA is more than 10 GHz. The TIA's output voltage amplitude (150 mV) is sufficient for the IL architecture because of its high voltage gain [1]. The electrical input is provided directly by an on-chip 50 Ω transmission line. An analog multiplexer selects between the electrical and optical (from the TIA) inputs. The selected input is fed to the single-ended-to-differential converter, which consists of an NMOS with symmetrical drain and source loads. The differential outputs from the drain and source are 180° apart within an 11 GHz bandwidth. Outputs from the converter are ac coupled to the ILO injection ports.

Each ILO consists of a V/I converter and a two-stage, cross-coupled, pseudo-differential current-starved ring oscillator. A two-stage ring oscillator architecture is chosen and its power consumption is minimized at the cost of worse phase noise. The design relies on the large jitter tracking bandwidth of the QLL to attenuate the phase noise contribution of the noisy but low-power ring oscillator. The bias circuit is designed such that current starvation is achieved in both the PMOS and NMOS devices in the inverters of the ring oscillator for a 50% duty cycle. Current injection is achieved by an NMOS differential pair without resistive loads. Similar to [6], this helps in mitigating the interaction with the DC bias at the injection point.

Fig. 7. Transient locking characteristics of Simulink model of QLL for two different loop filters.

Fig. 8. Step response and transfer function of linearized QLL Simulink model for different loop bandwidths (small signal behavior).

Fig. 9. (a) Transient locking characteristics of QLL. (b) Ring oscillator characteristics.

Fig. 10. Locking transient for two different initial conditions.

A simple XOR-XNOR based phase detector takes the I and Q phases of the clock from the ILO as inputs. It generates Up and Dn signals containing the instantaneous quadrature error information. The Up and Dn signals are filtered by a passive low-pass RC filter to attenuate the high frequency (2f) component. This suppresses the amplitude of the inputs to the next stage, thereby preventing distortion. R and C are chosen to be 1 kΩ and 25 fF, respectively. The filtered Up and Dn signals are further averaged using a simple charge pump and a loop filter consisting of a 1 pF capacitor. The charge pump consists of an amplifier with an NMOS differential pair and diode connected PMOS loads. The differential output of the amplifier is converted to a single ended output by current mirroring. The body biases of the NMOS differential pair in the charge pump are used to externally calibrate out the current mismatch in the charge pump. The bandwidth of the charge pump filter is digitally controllable by altering the load on the differential pair. The output of the charge pump and loop filter is fed back to the oscillator's $V_{ctrl}$, thereby completing the loop.
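The behavior of the XOR-XNOR phase detector described in this section can be sketched with ideal square-wave clocks. The model below is an idealization with assumed names and polarity (Up taken as XOR, Dn as XNOR); it only illustrates that the averaged Up − Dn signal is zero at exactly 90° of I-Q separation and grows linearly with the quadrature error. It also computes the corner frequency of the 1 kΩ, 25 fF RC filter from the standard single-pole formula.

```python
import math


def square(phase):
    """Ideal 50% duty-cycle clock: 1 in the first half cycle, else 0."""
    return 1 if (phase % (2 * math.pi)) < math.pi else 0


def pd_average(iq_phase, samples=3600):
    """Average of (Up - Dn) over one clock period.

    iq_phase: phase by which Q lags I, in radians (0 to pi).
    Up = I XOR Q, Dn = I XNOR Q (polarity is an assumption).
    """
    total = 0
    for n in range(samples):
        phase = 2 * math.pi * (n + 0.5) / samples  # mid-sample points
        i_clk = square(phase)
        q_clk = square(phase - iq_phase)
        up = i_clk ^ q_clk
        dn = 1 - up
        total += up - dn
    return total / samples


def rc_corner_hz(r_ohm, c_farad):
    """-3 dB frequency of a single-pole RC low-pass filter."""
    return 1.0 / (2 * math.pi * r_ohm * c_farad)


if __name__ == "__main__":
    for deg in (81, 90, 99):
        print(f"I-Q separation {deg} deg -> mean(Up - Dn) = "
              f"{pd_average(math.radians(deg)):+.3f}")
    print("RC corner (GHz):", rc_corner_hz(1e3, 25e-15) / 1e9)
```

In this idealization the detector output averages to 2φ/π − 1 for an I-Q separation φ, and the XOR output toggles twice per clock period, which is the 2f component that the RC filter (corner near 6.4 GHz for the stated R and C) attenuates.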

### III. QLL ANALYSIS

In this section we propose a mathematical model of our system. We analyze the effect of the quadrature error correcting loop on the injection locking dynamics and discuss the dynamics of the overall system. We show that the overall system can be designed to have a first-order behavior, and bolster our claims with Simulink based behavior modelling and measured results.

The dynamics of the QLL are similar to those of an ILO, except that the oscillator's natural frequency $\omega_o$ is no longer fixed. It changes continuously with $V_{ctrl}$ (Fig. 5). Thus the QLL dynamics can be described by replacing the fixed $\omega_o$ in Adler's equation [13] with a varying $\omega_o$, which is the sum of a fixed component $\omega_{vco}$ and a time varying component $K_{VCO} \times V_{ctrl}$.

$$\frac{d\theta}{dt} = \omega_{vco} + K_{VCO}V_{ctrl} - \omega_{inj} - \omega_L \sin(\theta) \tag{10}$$

As shown in Fig. 5, $V_{ctrl}$ is generated by low-pass filtering the transient quadrature phase error $\phi_{qe}(t)$. The low-pass filter has a frequency response $H(\omega)$ with bandwidth $\omega_{filter}$ chosen such that $\omega_{filter} \ll \omega_L$. Denoting by $h(t)$ the impulse response of the filter $H(\omega)$, we get $V_{ctrl} = h(t) * \phi_{qe}(t)$. The expression for $\phi_{qe}(t)$ can be further elaborated from (3) by writing $\omega$ as $\omega_{inj} + d\theta/dt$ [7] and $\omega_o$ as $\omega_{vco} + K_{VCO} V_{ctrl}$.

$$\phi_{qe}(t) = \frac{\pi}{2} \left( \frac{\omega_{inj} + \frac{d\theta}{dt}}{\omega_{vco} + K_{VCO}V_{ctrl}} - 1 \right) \tag{11}$$

This can be simplified as follows, relying on (10):

$$\phi_{qe}(t) = \frac{\pi}{2} \left( \frac{-\omega_L \sin(\theta)}{\omega_{vco} + K_{VCO} V_{ctrl}} \right) \tag{12}$$

The control voltage is then given by

$$V_{ctrl} = h(t) * \left(\frac{\pi}{2} \left(\frac{-\omega_L \sin(\theta)}{\omega_{vco} + K_{VCO} V_{ctrl}}\right)\right) \tag{13}$$

At equilibrium $d\theta/dt = 0$ and $\omega_{vco} + K_{VCO}V_{ctrl} = \omega_{inj}$. Substituting these values in (10), we find that in equilibrium $\theta = 0$. The highly non-linear nature of (10) and (13) makes it difficult to obtain a convenient closed form solution. However, we can still gain some insight into how the loop behaves with regard to input noise ($\theta_n$) by linearizing (10) about the equilibrium point (i.e., $\theta = 0$). We replace $\theta$ with $\theta + \theta_n$ and $V_{ctrl}$ with $V_{ctrl} + \Delta V_{ctrl}$ in (10). Here $\theta_n$ is a small perturbation in $\theta$ and $\Delta V_{ctrl}$ is the small perturbation in $V_{ctrl}$ in response to $\theta_n$ (small signal assumption). Using the fact that at equilibrium $\theta = 0$ and $\omega_{vco} + K_{VCO}V_{ctrl} = \omega_{inj}$, we get

$$\frac{d(\theta_n)}{dt} \approx K_{VCO} \Delta V_{ctrl} - \omega_L \theta_n \tag{14}$$

Following the same substitution for (13)

$$\Delta V_{ctrl} \approx h(t) * \left(\frac{\pi}{2} \left(\frac{-\omega_L \theta_n}{\omega_{inj}}\right)\right) \tag{15}$$

Substituting the value of  $\Delta V_{ctrl}$  from (15) in (14)

$$\frac{d(\theta_n)}{dt} \approx K_{VCO}h(t) * \left(\frac{\pi}{2} \left(\frac{-\omega_L \theta_n}{\omega_{inj}}\right)\right) - \omega_L \theta_n \tag{16}$$



Fig. 11. QLL based clock distribution and deskewing architecture for a 4 channel optical receiver.



Fig. 12. Single channel quarter-rate receiver.

If $\theta_n$ varies faster than $\omega_{filter}$, then $h(t) * \left(\frac{\pi}{2}\left(\frac{-\omega_L \theta_n}{\omega_{inj}}\right)\right) \approx 0$ and from (16) we have

$$\frac{d(\theta_n)}{dt} = -\omega_L \theta_n \tag{17}$$

This is similar to a first-order PLL response with bandwidth $\omega_L$, characteristic of an injection locked system [13]. If $\theta_n$ varies slower than $\omega_{filter}$, then $h(t) * \left(\frac{\pi}{2}\left(\frac{-\omega_L \theta_n}{\omega_{inj}}\right)\right) \approx \frac{\pi}{2}\left(\frac{-\omega_L \theta_n}{\omega_{inj}}\right)$ and again from (16) we have

$$\frac{d(\theta_n)}{dt} = -\left(\frac{\pi}{2} \times \frac{K_{VCO}}{\omega_{inj}} + 1\right) \omega_L \theta_n \tag{18}$$

This is also a first-order PLL response with a bandwidth higher than $\omega_L$. The exact bandwidth is not important in this case because the variation in $\theta_n$ is much slower than $\omega_L$. So overall the system allows all variations in $\theta_n$ slower than $\omega_L$ to go through, and attenuates all variations faster than $\omega_L$ with a -20 dB/dec (first-order) slope. This is an important conclusion: allowing the quadrature error correction loop to run much slower than the injection locking loop ensures that the system has a first-order response with the same bandwidth as that of an ILO, i.e., $\omega_L$ [14].
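The role of the ratio between the filter bandwidth and ω<sub>L</sub> can be made concrete by computing the poles of the linearized system (14) and (15) for an assumed filter. The sketch below assumes a single-pole low-pass filter H(s) = A·ω<sub>filter</sub>/(s + ω<sub>filter</sub>) and lumps A·(π/2)·K<sub>VCO</sub>/ω<sub>inj</sub> into one dimensionless `loop_gain`; both the filter form and the value 5 used for the gain are illustrative assumptions, not parameters of the implemented circuit. Under these assumptions the characteristic equation is (s + ω<sub>L</sub>)(s + ω<sub>filter</sub>) + loop_gain·ω<sub>filter</sub>·ω<sub>L</sub> = 0.

```python
import cmath
import math


def qll_poles(w_L, w_filter, loop_gain):
    """Poles of the linearized QLL for an assumed single-pole loop filter.

    Solves s^2 + (w_L + w_filter) s + w_L * w_filter * (1 + loop_gain) = 0.
    loop_gain is a hypothetical lumped DC gain (dimensionless).
    """
    b = w_L + w_filter
    c = w_L * w_filter * (1 + loop_gain)
    root = cmath.sqrt(b * b - 4 * c)
    return (-b + root) / 2, (-b - root) / 2


if __name__ == "__main__":
    w_L = 2 * math.pi * 175e6
    for f_filter in (100e3, 20e6):
        poles = qll_poles(w_L, 2 * math.pi * f_filter, loop_gain=5.0)
        kind = "real" if all(p.imag == 0 for p in poles) else "complex"
        print(f"filter bandwidth {f_filter / 1e6:g} MHz -> {kind} poles:",
              [f"{p.real / w_L:.4f}{p.imag / w_L:+.4f}j (x w_L)" for p in poles])
```

For the assumed gain, a filter bandwidth far below ω<sub>L</sub> yields two real poles, one of them close to −ω<sub>L</sub>, so the fast dynamics are those of the ILO alone. Raising the filter bandwidth toward ω<sub>L</sub> eventually makes the poles complex, which corresponds to ringing and peaking.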

In order to investigate the stability of the system with greater accuracy, a behavioral model was constructed in Simulink. The model was initialized with $f_o$ set to 5 GHz and $f_{inj}$ set to 7 GHz. The ILO's inherent locking range ($f_L$) was set to 175 MHz. Fig. 7 shows the transient response of the QLL Simulink model for two different loop bandwidths: 100 kHz ($\ll f_L$) in the first case and 20 MHz (comparable to $f_L$) in the second. In both cases the system attains the same final locked state, i.e., $\theta = 2n\pi$ and f = 7 GHz. However, there are some important differences. In the first case the transient has a first-order response with no overshoot, whereas the second case has significant ringing in its transient response and is thus closer to instability.
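A rough behavioral model in the same spirit can be written in a few lines of Python by integrating (10) together with (12). This is a sketch under several assumptions that are ours and not taken from the Simulink model: the loop filter is idealized as a pure integrator (a charge pump driving a capacitor), K<sub>VCO</sub> is taken as +1 GHz/V, the loop rate is set to 10 MHz to keep the run short, forward Euler integration is used, and the oscillator starts at 7.7 GHz (the starting point of Fig. 9(a)) rather than 5 GHz.

```python
import math


def simulate_qll(f_inj=7e9, f_o_start=7.7e9, f_L=175e6,
                 loop_bw=10e6, k_vco_hz_per_v=1e9,
                 t_end=0.6e-6, dt=4e-12):
    """Forward-Euler integration of Eq. (10) with an integrating loop filter.

    Returns (final natural frequency in Hz, sin(theta), V_ctrl in V).
    All default values are illustrative assumptions.
    """
    w_inj = 2 * math.pi * f_inj
    w_vco = 2 * math.pi * f_o_start          # natural frequency at V_ctrl = 0
    w_L = 2 * math.pi * f_L
    k_vco = 2 * math.pi * k_vco_hz_per_v
    # Integrator gain chosen so that, once locked, the frequency error
    # decays at a rate of roughly 2*pi*loop_bw.
    gain = 2 * (2 * math.pi * loop_bw) * w_inj / (math.pi * k_vco)

    theta, v_ctrl = 0.0, 0.0
    for _ in range(int(round(t_end / dt))):
        w_o = w_vco + k_vco * v_ctrl
        sin_theta = math.sin(theta)
        phi_qe = (math.pi / 2) * (-w_L * sin_theta / w_o)   # Eq. (12)
        theta += (w_o - w_inj - w_L * sin_theta) * dt        # Eq. (10)
        v_ctrl += gain * phi_qe * dt                          # assumed integrator
    f_o_final = (w_vco + k_vco * v_ctrl) / (2 * math.pi)
    return f_o_final, math.sin(theta), v_ctrl


if __name__ == "__main__":
    for start in (7.7e9, 6.3e9):
        f_o, s, v = simulate_qll(f_o_start=start)
        print(f"start {start / 1e9:.1f} GHz -> f_o = {f_o / 1e9:.4f} GHz, "
              f"sin(theta) = {s:+.1e}, V_ctrl = {v:+.3f} V")
```

In this simplified model the natural frequency is pulled to the injected frequency from either side, starting outside the 175 MHz locking range, and the phase error settles to zero. The settled control voltage simply reflects the assumed K<sub>VCO</sub> and is not comparable to the measured characteristic.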

To further analyze the stability of the QLL model, we linearized the Simulink model around the equilibrium point. The input of this state-space model was the phase of the reference clock and the output was the phase of the QLL output. The transfer function of this model is equivalent to the jitter tracking response from the reference to the output of the QLL. The inherent locking range (f<sub>L</sub>) of the ILO was fixed at 175 MHz and we simulated the linearized model for two different loop bandwidths. In the first case the loop bandwidth was 100 kHz, much smaller than f<sub>L</sub>. A phase step was applied to the input of the QLL model and a first-order response was observed. There was no overshoot in the step response and no peaking in the system transfer function (Fig. 8), which showed a -20 dB/dec decay. In the second case we set the loop bandwidth to 20 MHz, which is much closer to f<sub>L</sub>. We observed ringing in the step response, and the system transfer function had some peaking and a second-order (-40 dB/dec) decay. The model suggests that for the system to be stable the secondary loop needs to run much slower than the bandwidth of the injection locking itself. If this condition is assured, then the bandwidth of the system is the bandwidth of the ILO (f<sub>L</sub>).

Fig. 9(a) shows the transient locking characteristics (frequency and V<sub>ctrl</sub>) of the proposed QLL. For the simulation, the injected frequency was fixed at 7 GHz and the initial frequency of the oscillator was 7.7 GHz, such that the system was outside its locking range. The locking takes place in three different stages. When the system is in the unlocked state, the loop brings the frequency of the oscillator close to the injected frequency. When the frequency of the oscillator comes within the injection locking range of the ILO, frequency lock is achieved. However, the phase still keeps changing. The loop changes the V<sub>ctrl</sub> of the oscillator until the quadrature error is nullified, i.e., when $f_o = f_{inj}$. This negative feedback loop ensures that $f_o = f_{inj}$ and that there is no phase error in the outputs. Fig. 9(b) shows the ring oscillator's frequency vs. control voltage characteristics. In the final locked state the V<sub>ctrl</sub> settles to 0.61 mV such that the natural frequency of the ring oscillator is equal to 7 GHz (Fig. 9(b)). Transient simulations were repeated to show that the QLL has inherent frequency detection in both directions, as shown in Fig. 10. The injected frequency was kept at 7 GHz and the initial frequency was set to 7.75 GHz (>7 GHz) in one case and to 6.65 GHz (<7 GHz) in the other. The system locks to the injected frequency in both cases. The difference in locking times arises from the dependence of the MQPE on f<sub>o</sub> (4).

### IV. CLOCKING FOR AN ENERGY PROPORTIONAL OPTICAL RECEIVER

The clocking structure is shown in Fig. 11. The optical receiver has four optical data inputs and one forwarded clock (electrical/optical) input. The optical clock is converted to an electrical clock using a TIA. The electrical clock is then sent to a global QLL circuit. The QLL generates four quadrature phases. The four phases are distributed without any repeaters and sent to local ring oscillators, which are placed near the clocked optical receivers. The local ring oscillators are injection locked to the global clock, and their natural frequency of oscillation is varied to control the phase of each local ring oscillator's output (deskew). The data receivers have a quarter-rate architecture and hence require accurate quadrature phases. Symmetric injection with four clock phases ensures that quadrature accuracy is maintained even with deskew. This is described in greater detail in Section IV-B.

Fig. 13. Level-shifter circuit used in adaptive body biasing.

Fig. 14. (a) Symmetric injection architecture; (b) simulation based comparison of two phase and symmetric injection.

Fig. 15. Chip micrograph and layout details.

The optical receiver uses a photodiode (PD) to convert an incoming optical signal to an electrical current. If a simple resistor is used to convert the current of a photodiode to a voltage, then for a target signal-to-noise ratio (SNR) and a given photodiode capacitance, the input time constant (RC) severely limits the bandwidth and data-rate of the receiver. In order to increase the RC bandwidth while maintaining the same gain, transimpedance amplifiers (TIAs) are commonly employed. The overall bandwidth of conventional TIAs is chosen to be (RC)<sup>-1</sup>. Such high-bandwidth TIAs are highly analog, power hungry, and do not scale well with technology. A more recent approach uses an integrating front-end and a resistor termination with a time constant that is much larger than the bit interval (RC $\gg$ T<sub>b</sub>) [15]. Dynamic offset modulation is then used to provide a constant voltage at its input regardless of the data sequence.

Fig. 12 shows the top-level architecture of the adaptive receiver (single channel) with dynamic BB using V<sub>ctrl</sub> of the QLL. The first stage of the receiver is a low-power TIA with a 3 kΩ feedback resistor. The TIA's output is sampled at the end of two consecutive bits $(V_n, V_{n+1})$ and these samples are compared to resolve each bit. The TIA provides isolation between the photodiode capacitance and the sampling capacitors, which reduces the charge-sharing effect and enables the use of ultra-low capacitance photodetectors in scaled silicon photonic technologies. Besides, for a given photodiode capacitance, the S/H capacitors can be chosen to be bigger (even comparable to the photodiode capacitance) to relieve kT/C noise. This had been an important bottleneck in double sampling optical receivers in the past [15], [16]. The sampling capacitors are followed by an amplifier, which also provides isolation between the sampling nodes and the sense-amp to minimize kickback [17]. The dynamic offset modulation employed at the output of the amplifier introduces an offset so that the sense-amp differential input is always constant regardless of the previous bit. The sense-amp is followed by an SR-latch to retrieve the NRZ data. Similar to [18] and [19], dynamic offset modulation provides a constant voltage at the sense-amp's input regardless of the bit sequence. A de-multiplexing factor of four is achieved immediately after the TIA using quarter-rate clocked samplers.
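The idea behind double sampling with dynamic offset modulation can be illustrated with a toy model. The sketch below assumes that the front-end behaves as a single-pole low-pass response whose time constant spans several bit periods, and it uses one possible form of history-dependent offset chosen so that the comparator input has the same magnitude for every bit. The function names, the amplitude normalization and the offset formula are all illustrative assumptions and do not describe the implemented circuit.

```python
import math
import random


def front_end_samples(bits, amplitude=1.0, tau_over_tb=5.0, v0=0.0):
    """Voltage at successive bit boundaries for an assumed single-pole
    front-end with time constant tau = tau_over_tb * T_b."""
    a = math.exp(-1.0 / tau_over_tb)
    samples = [v0]
    for b in bits:
        samples.append(a * samples[-1] + (1 - a) * amplitude * b)
    return samples


def resolve_bits(samples, amplitude=1.0, tau_over_tb=5.0):
    """Resolve each bit from two consecutive samples (V_n, V_n+1)."""
    a = math.exp(-1.0 / tau_over_tb)
    decided = []
    for v_n, v_next in zip(samples, samples[1:]):
        delta = v_next - v_n                        # double-sampling difference
        # Assumed dynamic offset: cancels the dependence of delta on V_n,
        # leaving +/- (1 - a) * amplitude / 2 at the comparator input.
        offset = (1 - a) * (v_n - amplitude / 2)
        decided.append(1 if delta + offset > 0 else 0)
    return decided


if __name__ == "__main__":
    random.seed(0)
    tx = [random.randint(0, 1) for _ in range(32)]
    rx = resolve_bits(front_end_samples(tx))
    print("bit errors:", sum(t != r for t, r in zip(tx, rx)))
```

Without the offset term, the raw difference V<sub>n+1</sub> − V<sub>n</sub> depends on the preceding data through V<sub>n</sub>; with it, the decision input in this noiseless model is a fixed positive or negative value set only by the current bit.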

#### A. Adaptive Body Biasing

The optical receiver implementation shown in Fig. 12 has analog building blocks with bias currents. These are biased to provide the maximum bandwidth and gain for operation at the highest data-rates, thus consuming maximum power. For operation at lower data-rates a high bandwidth is not required. However, since the bandwidth of the analog components does not change with data-rate, power is 'wasted'. This degrades the power efficiency (the energy per bit) of the optical receiver at lower data-rates [15], [16], [17]. It is advantageous to bias the circuits adaptively so as to reduce the bias current (and hence power) of the analog components at lower data-rates. This requires information about the data-rate and a method to use this information to change the bias currents of the analog components. The former is provided by the QLL, as it generates the V<sub>ctrl</sub>, which depends on the input clock frequency and hence on the data-rate.

The latter is achieved by taking advantage of the FDSOI (fully depleted silicon on insulator) process. In this process, the channel forms in an ultra-thin (7 nm) layer of intrinsic silicon over a layer of buried oxide (BOX). Given the extreme thinness of the buried oxide layer (25 nm) and the conducting layer under the BOX, the effect of body biasing (BB) is improved compared with a standard CMOS process. By connecting the transistor bodies to a bias network in the circuit layout rather than to power or ground, the $V_{th}$ of the transistors can be tuned by 80 mV per 1 V modulation of $V_{BB}$. This proves crucial in adaptively body biasing the critical devices in the amplifier and the TIA. The $V_{ctrl}$ generated by the QLL follows the ring oscillator's characteristics as shown in Fig. 9(b), i.e., as the reference frequency increases the $V_{ctrl}$ decreases from 1 to 0. The body bias generator, shown in Fig. 13, is a level shifter with an input from the $V_{ctrl}$ of the QLL, two outputs connected to the PMOS and NMOS of the TIA, and an output connected to the tail current of the amplifier block. These signals control the gain-bandwidth of these analog blocks.

Fig. 16. Phase noise and integrated jitter measurements for 8 GHz (electrical and optical) and 11 GHz (electrical).

Fig. 17. Measured phase noise of the locked QLL output across the entire locking range.

Fig. 18. (a) Measured jitter transfer function for 8 GHz reference; (b) response to low frequency (10 MHz) and high frequency (1 GHz) jitter.

Fig. 19. QLL response to supply noise compared to unlocked (no reference) case.

The transfer function of the body bias generator is designed such that the bandwidth of the TIA and amplifier remains proportional to the data-rate. This is achieved by first recognizing that the frequency of the forwarded clock is proportional to the data-rate and deriving the transfer function from clock frequency to the $V_{ctrl}$ of the QLL. Next, the bandwidth of the TIA and of the amplifier is characterized separately as a function of $V_{BB}$, and the result is used to create the transfer function of interest, from the $V_{ctrl}$ of the QLL to the $V_{BB}$ outputs. By lowering the bandwidth of the analog blocks at lower data-rates we lower their energy consumption. Since the energy consumption of the digital blocks is inherently proportional to data-rate, adaptive body biasing helps achieve energy-proportional operation in the optical receiver.
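The quantities involved in energy-proportional operation reduce to simple arithmetic, sketched below. The quarter-rate relation between data-rate and forwarded clock, the 153 fJ/b figure at 32 Gb/s and the 80 mV/V body-bias sensitivity are taken from the text; treating the energy per bit as exactly 153 fJ/b at every data-rate, and the function names, are simplifying assumptions for illustration only.

```python
def forwarded_clock_hz(data_rate_bps):
    """Quarter-rate forwarded clock frequency for a given data-rate."""
    return data_rate_bps / 4


def receiver_power_w(data_rate_bps, energy_per_bit_j=153e-15):
    """Power implied by a fixed energy per bit (assumed constant with rate)."""
    return data_rate_bps * energy_per_bit_j


def vth_shift_v(delta_vbb_v, sensitivity_v_per_v=0.080):
    """Magnitude of threshold-voltage shift for a given body-bias change."""
    return sensitivity_v_per_v * delta_vbb_v


if __name__ == "__main__":
    for rate in (16e9, 24e9, 32e9):
        print(f"{rate / 1e9:.0f} Gb/s: clock = {forwarded_clock_hz(rate) / 1e9:.0f} GHz, "
              f"power = {receiver_power_w(rate) * 1e3:.2f} mW")
    print(f"V_th shift for 0.5 V of V_BB: {vth_shift_v(0.5) * 1e3:.0f} mV")
```

Under the constant energy-per-bit assumption, halving the data-rate from 32 Gb/s to 16 Gb/s halves the power, which is the sense in which the receiver is energy proportional.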

#### B. Symmetric Injection and Deskew

Deskew in an ILO (locked at $f_{inj}$) can be performed by varying the natural frequency of oscillation ($f_0$) of the oscillator. The amount of deskew is given by [6]

$$\text{deskew} = \sin^{-1}\left(\frac{f_0 - f_{inj}}{f_l}\right) \tag{19}$$

where $f_l$ is the locking range of the ILO. If the input clock is injected into only one of the delay stages, the asymmetry between the effective delays of the delay stages leads to a quadrature phase mismatch between the I and Q phases of the oscillator. Combining (19) and (3) we get

$$\text{Quad. Error} = \frac{\pi}{2} \left( \frac{-f_l \sin(\text{deskew})}{f_l \sin(\text{deskew}) + f_{inj}} \right) \tag{20}$$

Equation (20) suggests that as the deskew increases so does the magnitude of the quadrature error. So as $f_0$ is varied to invoke deskew, the I and Q phases of the ILO do not shift by an equal amount. Inaccuracies in the quadrature phases may lead to increased BER in the quarter-rate receiver.
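To make the trade-off concrete, (19) and (20) can be evaluated numerically. The sketch below simply transcribes the two equations. The function names and the example values of $f_{inj}$ and $f_l$ are illustrative assumptions, not measured parameters of the test chip.

```python
import math

def deskew_rad(f_0, f_inj, f_l):
    """Eq. (19): deskew = asin((f_0 - f_inj) / f_l), in radians."""
    ratio = (f_0 - f_inj) / f_l
    if abs(ratio) > 1.0:
        raise ValueError("|f_0 - f_inj| exceeds f_l: outside the locking range")
    return math.asin(ratio)

def quad_error_rad(deskew, f_inj, f_l):
    """Eq. (20): quadrature error when only one delay stage is injected."""
    x = f_l * math.sin(deskew)
    return (math.pi / 2.0) * (-x / (x + f_inj))

# Illustrative (assumed) operating point: 8 GHz injection, 400 MHz locking range.
f_inj, f_l = 8.0e9, 0.4e9
for f_0 in (8.0e9, 8.1e9, 8.2e9, 8.4e9):
    d = deskew_rad(f_0, f_inj, f_l)
    q = quad_error_rad(d, f_inj, f_l)
    print(f"f_0 = {f_0 / 1e9:.1f} GHz: deskew = {math.degrees(d):6.2f} deg, "
          f"quad error = {math.degrees(q):6.2f} deg")
```

For these assumed numbers the quadrature error magnitude grows monotonically with the deskew, which is the behavior that symmetric injection is meant to remove.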

The trade-off between deskew and quadrature error is broken by injecting all four clock phases generated by the QLL into both delay elements of the ILO (Fig. 14(a)). This symmetric injection allows the delay of both delay elements to be varied by an equal amount. Thus, even when the f<sub>0</sub> of the ILO is varied, the inherent symmetry in the delay elements keeps the phase relationship between the I and Q phases constant, resulting in no quadrature error. This is illustrated by the simulation of ILOs with two-phase injection (clock and clock bar) and with symmetric injection, as shown in Fig. 14(b). The V<sub>ctrl</sub> of the two ILOs is varied to change their f<sub>0</sub>. This leads to quadrature error in the former case, whereas in the latter the phase relationship between the I and Q phases remains 90°. Fig. 10 shows the structure of the local ILO. It has the same two-stage pseudo-differential architecture as the ring oscillator used in the QLL (Fig. 6). The V<sub>ctrl</sub> generated by the QLL is also distributed to the local ILOs. This is used to set the natural frequency of the ILO (f<sub>0</sub>) close to the injected frequency (f<sub>inj</sub>). To invoke deskew, the f<sub>0</sub> of the local ILO is varied externally (Fig. 11).

#### V. HARDWARE MEASUREMENT

The chip is fabricated in a 28 nm FDSOI CMOS process. The die micrograph and core detail are presented in Fig. 15. The core area is 300  $\mu$ m  $\times$  60  $\mu$ m. The top metal layers are designed to be compatible with copper-pillar flip-chip bonding as well as bond-wire. The clock output from the QLL is



Fig. 20. Measured quadrature phase error vs. reference frequency and measured quadrature phase waveforms at 5, 8, and 11 GHz.

symmetrically distributed to all four local ILOs with a total trace length of 260 $\mu$m.

In our measurement setup, an external signal generator (Anritsu N5181B) is used to provide the reference clock used for injection. The reference power level was kept at -10 dBm. The frequency of the reference clock was varied and the output waveforms were observed on an Agilent 86100D sampling oscilloscope. To demonstrate the increase in locking range, we disable the loop and set the $V_{ctrl}$ of the ILO at $V_{DD}/2$. Without the quadrature phase error tracking, a locking range of 7–7.4 GHz (5%) is observed at an injection strength (k) of 0.05. With the loop activated, the locking range improves to 4–11 GHz (90%). The achieved locking range is limited by the tuning range of the ring oscillator. To measure the response of the QLL to fast changes in frequency, the frequency of the reference clock was changed in steps of 2 GHz, with each step lasting 1 ms (equipment limited). The large bandwidth of the QLL allows it to sustain 2 GHz step changes in frequency without losing lock. Fig. 16 shows the measured phase noise of the output of the QLL in both locked and unlocked states at 8 GHz. A 40 dB improvement is observed at 1 MHz offset between the locked and unlocked states. Integrated output jitter (100 kHz–1 GHz) of 558 fs and 577 fs is measured at 8 GHz for electrical and optical inputs, respectively. At the highest locking frequency (11 GHz) the integrated output jitter is 642 fs. The first-order (-20 dB/dec) nature of the QLL does not allow it to suppress the flicker noise (-30 dB/dec) of the ring oscillator effectively. This is why the QLL output cannot track the reference phase noise exactly (Fig. 16) for frequency



Fig. 21. Test setup for optical receiver.

offsets less than the jitter tracking bandwidth (JTB). This suppression can be improved by raising the JTB further through a higher injection strength. Fig. 17 shows the measured phase noise (at 10 MHz offset) of the locked QLL across the entire locking range. A phase noise variation of only 6 dB is observed as the frequency is varied from 4 GHz to 11 GHz. Thus, the QLL maintains low phase noise performance across its entire locking range.

Fig. 18(a) shows the measured jitter transfer function of the system for a reference frequency of 8 GHz. It has a low-pass characteristic with a JTB of 150 MHz and a -20 dB/dec



Fig. 22. Measured eye diagram (a) and BER (b) with PRBS 15 optical data at 32 Gb/s.



Fig. 23. (a) BER vs. optical power (receiver sensitivity) at different data-rates; (b) optical sensitivity vs. data-rate.

attenuation, suggestive of a first-order system. High JTB helps in retaining the low frequency jitter while eliminating high frequency jitter as depicted in Fig. 18(b). It is important to retain the low frequency jitter in forwarded clock receivers as low frequency jitter is correlated with the data [20].
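The low-pass behavior described above can be illustrated with an ideal single-pole response whose corner sits at the JTB. This is a sketch under the assumption of a textbook first-order magnitude response. It is not fitted to the measured curve in Fig. 18(a), and `jtf_gain_db` is a hypothetical helper name.

```python
import math

def jtf_gain_db(f_hz, jtb_hz=150e6):
    """Magnitude in dB of an ideal first-order low-pass with corner at the JTB."""
    return -10.0 * math.log10(1.0 + (f_hz / jtb_hz) ** 2)

# Low-frequency jitter passes almost unattenuated; high-frequency jitter is filtered.
for f in (10e6, 150e6, 1e9, 10e9):
    print(f"{f / 1e6:8.0f} MHz: {jtf_gain_db(f):7.2f} dB")
```

Under this idealization, 10 MHz jitter is passed with negligible attenuation while 1 GHz jitter is attenuated by roughly 16.6 dB, and the slope well above the corner approaches -20 dB/dec.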

Ring oscillators are susceptible to power supply variations [21]. Power supply variations directly translate into phase noise and jitter in the ring oscillator's output, as its oscillation frequency is a strong function of V<sub>DD</sub>. Substrate noise also directly affects the total oscillator jitter and is found to be strongly correlated to supply variations [21]. High frequency noise on the supply can be reduced by adding bypass capacitors. However, low frequency V<sub>DD</sub> noise is more difficult to eliminate with bypass capacitors because of the significant area penalty. Injection locking helps in suppressing low frequency V<sub>DD</sub> noise as shown in Fig. 19. V<sub>DD</sub> noise transfer has a high-pass transfer function with a bandwidth of 150 MHz and a -20 dB/dec attenuation. This is complementary to the jitter transfer function measurement (Fig. 18(a)) and characteristic of a first-order injection locked system. The measurement is made by adding sinusoidal noise (10 MHz–1 GHz) on



Fig. 24. Measured deskewed waveform for 32 Gb/s data.

the  $V_{DD}$  using a bias tee and then measuring the relative frequency sidebands on the output in unlocked and locked cases (Fig. 19).
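The complementarity noted above can be written out with normalized single-pole responses sharing one corner frequency $\omega_c = 2\pi \times 150$ MHz. These idealized forms are an assumption consistent with the measured -20 dB/dec slopes, not expressions taken from the analysis in this paper, and the symbols $H_{JTF}$ and $H_{VDD}$ are introduced here only for illustration:

$$H_{JTF}(s) = \frac{1}{1 + s/\omega_c}, \qquad H_{VDD}(s) = \frac{s/\omega_c}{1 + s/\omega_c}, \qquad H_{JTF}(s) + H_{VDD}(s) = 1$$

Under this model, reference jitter below $\omega_c$ is tracked while oscillator disturbances below $\omega_c$, such as low frequency supply noise, are suppressed by the same loop dynamics.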

TABLE I
PERFORMANCE COMPARISON OF THE QLL

|                              | This work                   | [2]                 | [6]                       | [5]                              | [23]                                  | [24]                          | [9]                 |
|------------------------------|-----------------------------|---------------------|---------------------------|----------------------------------|---------------------------------------|-------------------------------|---------------------|
| Architecture                 | QLL                         | ILO                 | ILO                       | IL-PLL                           | PPM IL                                | Dig. DLL                      | QILO<sup>††</sup>   |
| Oscillator                   | CMOS Ring                   | CMOS Ring           | CMOS Ring                 | CMOS Ring                        | CMOS Ring                             | NA                            | LC                  |
| Technology                   | 28nm FDSOI                  | 250nm BiCMOS        | 90nm CMOS                 | 65nm CMOS                        | 20nm CMOS                             | 14nm CMOS                     | 130nm CMOS          |
| Locking range                | 4–11GHz                     | 340MHz              | 203MHz                    |                                  |                                       | 2–7.5GHz<sup>†</sup>          | 26.4–29.7GHz        |
| Output integrated jitter (σ) | 558fs–577fs\* (at 8GHz)     | –                   | <1.5ps (RMS at 2.5GHz)    | 0.7ps at 1.2GHz (10kHz–40MHz)    | 434fs/268fs at 15GHz (100kHz–1GHz)    | 176fs with 200K hits at 7GHz  | –                   |
| I/Q error                    | 1.5°                        | 0.7°\*\*            | 4.5°                      | NA                               | NA                                    | 1                             |                     |
| Active area                  | 0.003mm<sup>2</sup>         | 0.09mm<sup>2</sup>  | 0.026mm<sup>2</sup>       | 0.022mm<sup>2</sup>              | 0.044mm<sup>2</sup>                   | 0.0024mm<sup>2</sup>          | 1mm<sup>2</sup>     |
| Supply                       | 1V                          | 3V                  | 1.2V                      |                                  | 1.25/1.1V                             |                               | 1.3V                |
| Power diss. (P) at (F)       | 2.77mW at 11GHz             | 15mW at 2.7GHz      | 1.3mW at 2GHz             | 0.97mW at 1.2GHz                 | 46.2mW at 15GHz                       | 4.4mW at 7GHz                 | 38.6mW at 26.5GHz   |
| FOM                          | -239.4dB                    |                     | -235.3dB                  | -243.2dB                         | -234.8dB                              | NA                            |                     |

Quadrature phase accuracy between the phases of the QLL outputs is confirmed by measuring their phase difference. The quadrature output phases (I and Q) of the QLL are selected using an on-chip digital multiplexer. Quadrature error is measured in a two-step process. First, the 'I' phase is selected and its phase difference with the input reference is measured. Then the digital bit to the multiplexer is altered to select the 'Q' phase and its phase difference with the input reference is measured. The difference between the two measured values provides the quadrature phase error. This multiplexing allows the I and Q phases to have the same signal paths and hence a more accurate measurement is made. Fig. 20 shows the measured quadrature accuracy across 4-11 GHz and the corresponding  $3\sigma$  error margins. The  $3\sigma$  error margins are obtained by measuring quadrature error across > 100K periods for the same test chip. Based on the absolute mismatch from 90° (Fig. 20) an average quadrature offset of 1.5° is observed across 4-11 GHz.
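The two-step measurement and the $3\sigma$ margin can be expressed as a short post-processing routine. The sketch below assumes the scope reports the I and Q phases relative to the reference in degrees and that Q ideally sits 90° after I. The function names and this sign convention are assumptions for illustration, not part of the described setup.

```python
import statistics

def wrap_deg(angle):
    """Wrap an angle in degrees to the interval [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0

def quadrature_error_deg(phi_i_deg, phi_q_deg):
    """Deviation of the measured I-to-Q phase difference from the ideal 90 degrees."""
    return wrap_deg((phi_q_deg - phi_i_deg) - 90.0)

def summarize(errors_deg):
    """Mean quadrature error and 3-sigma margin over repeated measurements."""
    mean = statistics.fmean(errors_deg)
    sigma = statistics.pstdev(errors_deg)
    return mean, 3.0 * sigma

# Step 1: select 'I' and record its phase against the reference.
# Step 2: flip the multiplexer bit, select 'Q', and record its phase.
print(quadrature_error_deg(10.0, 101.5))
```

Because both phases pass through the same multiplexer and output path, any common delay cancels in the subtraction, which is why only the difference appears in `quadrature_error_deg`.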

The quadrature error of the QLL output is sensitive to mismatch in the XOR-XNOR detectors and the charge pump (Fig. 6). Non-minimum gate-length devices and symmetrical layout techniques are used to minimize the mismatch. The mismatch can be further reduced by using calibration loops. In our test chip the measured mismatch due to the XOR gates was negligible, but in a complete system implementation an offset compensation technique might be necessary. A possible solution involves offset control in the charge pump circuit (Fig. 6) via an external loop. We can first fix the body bias of the charge pump NMOS differential pairs to be 0.5 V each, and observe the quadrature mismatch [22] of the locked QLL outputs. Then alter the body bias of one of the NMOS devices until the quadrature error is minimized.
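The proposed calibration is a one-dimensional search and could be sketched as follows. This is only an illustration of the procedure: `calibrate_body_bias`, the 0–1 V sweep range, the step count, and the linear `fake_measurement` model standing in for the bench measurement are all hypothetical.

```python
def calibrate_body_bias(measure_quad_error_deg, v_fixed=0.5,
                        v_min=0.0, v_max=1.0, steps=101):
    """Sweep one NMOS body bias and return (best_bias_v, abs_error_deg)."""
    # Start from both devices at the same bias, as in the described first step.
    best_v = v_fixed
    best_err = abs(measure_quad_error_deg(v_fixed, v_fixed))
    for k in range(steps):
        v = v_min + (v_max - v_min) * k / (steps - 1)
        err = abs(measure_quad_error_deg(v_fixed, v))
        if err < best_err:
            best_v, best_err = v, err
    return best_v, best_err

def fake_measurement(v_ref, v_tuned):
    """Made-up stand-in for the bench: error in degrees, zero at v_tuned = 0.8 V."""
    return 1.2 - 4.0 * (v_tuned - v_ref)

best_v, best_err = calibrate_body_bias(fake_measurement)
print(f"best body bias = {best_v:.2f} V, residual |error| = {best_err:.3f} deg")
```

A plain sweep is used here for clarity. An on-chip loop would more likely use a sign-based or binary search, but that choice is outside what the text specifies.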

The optical test setup is shown in Fig. 21. For optical testing, the receiver is bonded to a photodiode with a responsivity of 0.9 A/W. The total capacitance at the input node was estimated to be 120 fF. The optical beam from a 1550 nm distributed feedback (DFB) laser is modulated by a high-speed Mach-Zehnder modulator (MZM) and coupled to the photodiode with a single-mode fiber. The optical fiber is placed close to the photodiode aperture using a micro-positioner (butt coupling). As the beam has a Gaussian profile, the gap between the fiber tip and the photodetector causes optical intensity loss. The combined optical loss due to the optical coupling and optical connector is measured to be 2.8 dB. The quarter-rate clock generated by the pattern generator was used as the (electrical) reference for the QLL. The functionality of the receiver is validated using the PRBS-7, 9, and 15 sequences generated by the pattern generator. Each of the four channels is tested separately. Fig. 22(a) shows the recovered quarter-rate data eye diagram for 32 Gb/s optical data, for one of the channels. Fig. 22(b) shows the bathtub curves for 32 Gb/s and 20 Gb/s.

<sup>\*</sup>Optical clock input. <sup>\*\*</sup>Not measured directly. <sup>†</sup>Working range. <sup>††</sup>With envelope detection.



Fig. 25. (a) Power breakdown at 32 Gb/s; (b) energy efficiency per bit across different data-rates.

TABLE II
PERFORMANCE COMPARISON OF THE OPTICAL RECEIVER

|                       | This work [25]                          | [17]                  | [26]                | [27]                 |
|-----------------------|-----------------------------------------|-----------------------|---------------------|----------------------|
| Technology            | 28nm FD SOI                             | 28nm CMOS             | 65nm CMOS           | 28nm CMOS            |
| Data-rate             | 32Gb/s                                  | 25Gb/s                | 28Gb/s              | 28Gb/s               |
| Efficiency            | 103fJ/bit data and 50fJ/bit clock       | 170fJ/bit\*           | 3.25pJ/bit          | 1.03pJ/bit           |
| Active area           | 0.3×0.06mm<sup>2</sup> (4 channel)      | 0.0018mm<sup>2</sup>  | 3.25mm<sup>2</sup>  | 0.318mm<sup>2</sup>  |
| Sensitivity (optical) | -8.8dBm at 32Gb/s                       | -6.8dBm at 25Gb/s     | -9.7dBm at 25Gb/s   | -6dBm at 10Gb/s      |

<sup>\*</sup> Excludes clocking

These were obtained by externally varying the phase of the reference clock to the QLL to cover 1 UI. Error free (BER = $10^{-12}$) operation is shown for 0.16 UI and 0.33 UI for 32 and 16 Gb/s, respectively. The maximum achievable data-rate (32 Gb/s) is limited by the maximum data-rate of the external pseudo random bit sequence (PRBS) generator. Fig. 23(a) shows the measured BER as the optical power is varied for different data-rates. From this information we derive the optical sensitivity as shown in Fig. 23(b). The receiver achieves a sensitivity better than -12 dBm at 16 Gb/s, which degrades to -10 dBm at 28 Gb/s and -8.8 dBm at 32 Gb/s. Sensitivity degradation with increased data-rate is mainly due to reduced bit interval and integration time.
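For a sense of scale, the quoted sensitivities can be converted to average photocurrent using the 0.9 A/W photodiode responsivity stated for the test setup. The sketch below is a unit conversion only. It assumes the sensitivity figures refer to average optical power at the photodiode, which the text does not state explicitly, and it ignores the 2.8 dB coupling loss. The helper names are hypothetical.

```python
def dbm_to_uw(p_dbm):
    """Convert optical power from dBm to microwatts."""
    return 1e3 * 10.0 ** (p_dbm / 10.0)

def photocurrent_ua(p_dbm, responsivity_a_per_w=0.9):
    """Average photocurrent in microamps for a given average optical power."""
    return responsivity_a_per_w * dbm_to_uw(p_dbm)

# Sensitivities quoted in the text; -12 dBm at 16 Gb/s is used as a bound.
for rate_gbps, sens_dbm in ((16, -12.0), (28, -10.0), (32, -8.8)):
    print(f"{rate_gbps} Gb/s: {sens_dbm:6.1f} dBm = {dbm_to_uw(sens_dbm):6.1f} uW "
          f"-> {photocurrent_ua(sens_dbm):6.1f} uA")
```

Under these assumptions the required average photocurrent roughly doubles between 16 Gb/s and 32 Gb/s, in line with the shorter integration time per bit.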

The amount of phase shift allowed by the local ILO is measured by varying the deskew (Fig. 11) at 8 GHz for 32 Gb/s operation. An Agilent 86100D sampling oscilloscope is used to record the ILO waveforms for different values of V<sub>ctrl</sub>. A total deskew range of 137° is measured (Fig. 24). The optical receiver needs a maximum deskew range of 90° because of its quarter-rate architecture, so a measured deskew range greater than 90° proves sufficient.
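The sufficiency argument can be checked by converting the deskew range from degrees of the quarter-rate clock to time and unit intervals. The sketch below uses only the 8 GHz clock, 32 Gb/s data-rate, and 137° range given above. `deskew_in_time` is a hypothetical helper name.

```python
def deskew_in_time(deskew_deg, f_clk_hz, data_rate_bps):
    """Return (time shift in seconds, shift in UI) for a clock phase shift."""
    period_s = 1.0 / f_clk_hz
    shift_s = deskew_deg / 360.0 * period_s
    ui_s = 1.0 / data_rate_bps
    return shift_s, shift_s / ui_s

for deg in (90.0, 137.0):
    shift_s, shift_ui = deskew_in_time(deg, 8e9, 32e9)
    print(f"{deg:5.1f} deg -> {shift_s * 1e12:5.2f} ps = {shift_ui:.3f} UI")
```

At quarter rate one clock period spans 4 UI, so 90° corresponds to exactly 1 UI (31.25 ps), and the measured 137° covers about 1.52 UI.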

A low-power two-stage ring oscillator and the simplicity of injection locking ensure that the QLL circuit consumes only 2–2.8 mW for 4–11 GHz operation. As shown in Fig. 25(a), the power consumption increases with operation frequency. This is due to the digital nature of the ring oscillator. The energy per clock cycle (Fig. 25(a)) decreases as frequency increases, making the QLL suitable for high-speed applications. The receiver's power breakdown and power efficiency (energy per bit) are shown in Fig. 25(b). Total power consumption per channel at the highest data-rate (32 Gb/s) is 4.87 mW. The QLL and local ILOs consume a third of the total power. To show the efficacy of the adaptive body biasing scheme, two sets of measurements are done with the adaptive V<sub>BB</sub> generator on and off (Fig. 25(a)). When the adaptive V<sub>BB</sub> generator is active, the per-bit energy efficiency improves from 103 fJ/b at 32 Gb/s to 94 fJ/b at 16 Gb/s. Without the body bias the per-bit energy efficiency at 16 Gb/s is 160 fJ/b.
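The power figures above translate into energy per bit and energy per clock cycle by simple division. The short sketch below uses only numbers quoted in this paragraph. The function names are hypothetical, and the 2 mW and 2.8 mW values are the rounded endpoints of the stated QLL power range.

```python
def energy_per_bit_fj(power_mw, data_rate_gbps):
    """mW divided by Gb/s gives pJ/b; scale to fJ/b."""
    return power_mw / data_rate_gbps * 1e3

def energy_per_cycle_pj(power_mw, f_ghz):
    """mW divided by GHz gives pJ per clock cycle."""
    return power_mw / f_ghz

print(f"Receiver at 32 Gb/s: {energy_per_bit_fj(4.87, 32.0):.1f} fJ/b per channel")
print(f"QLL at 4 GHz:  {energy_per_cycle_pj(2.0, 4.0):.3f} pJ/cycle")
print(f"QLL at 11 GHz: {energy_per_cycle_pj(2.8, 11.0):.3f} pJ/cycle")
```

The per-channel result of about 152 fJ/b is consistent with the sum of the 103 fJ/bit data and 50 fJ/bit clock entries in Table II, and the QLL energy per cycle roughly halves from 4 GHz to 11 GHz.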

Table I compares the QLL with prior art. The QLL-based frequency tracking technique allows us to achieve the widest locking range and robust I/Q performance compared to other works. The optical receiver is compared with prior art in Table II. Low-power QLL-based clocking and body biasing help achieve the best energy efficiency compared to the state of the art. The receiver functionality is verified up to 32 Gb/s of data-rate. The adaptive body biasing scheme enables a total energy consumption of less than 186 fJ/b over a wide range of data-rates. The sensitivity of the receiver was measured to be -8.8 dBm at 32 Gb/s.

#### VI. CONCLUSIONS

A new frequency tracking technique based on quadrature phase error cancellation in an injection locked ring oscillator was introduced and analyzed. The QLL technique improves the ILO's locking range from 5% (7–7.4 GHz) to 90% (4–11 GHz) without using a phase frequency detector (PFD). The dynamics of the system were derived and were shown to have first-order characteristics. This guarantees stability without peaking, unlike a second-order injection locked PLL. The QLL was used to generate accurate quadrature clock phases, without any frequency division, for a source-synchronous 4-channel optical receiver using a forwarded clock at quarter-rate. The receiver architecture features a double-sampling receiver with dynamic offset modulation and low-bandwidth TIA. The system was implemented in 28 nm FD SOI CMOS and operates up to 32 Gb/s of data-rate. The unique properties of the FD SOI technology were used in conjunction with the QLL and optical receiver to implement adaptive body biasing. This technique is essential in realizing an energy proportional optical receiver that maintains a constant energy-per-bit consumption at different data-rates.

#### REFERENCES

- [1] L. Zhang, A. Carpenter, B. Ciftcioglu, A. Garg, M. Huang, and H. Wu, "Injection-locked clocking: A low-power clock distribution scheme for high-performance microprocessors," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 16, no. 9, pp. 1251–1256, Sep. 2008.
- [2] P. Kinget, R. Melville, D. Long, and V. Gopinathan, "An injection-locking scheme for precision quadrature generation," *IEEE J. Solid-State Circuits*, vol. 37, no. 7, pp. 845–851, Jul. 2002.
- [3] M. Raj and A. Emami, "A wideband injection-locking scheme and quadrature phase generation in 65-nm CMOS," *IEEE Trans. Microw. Theory Techn.*, vol. 62, no. 4, pp. 763–772, Apr. 2014.
- [4] J. Savoj et al., "Design of high-speed wireline transceivers for backplane communications in 28 nm CMOS," in *Proc. CICC*, Sep. 2012, pp. 1–4.
- [5] W. Deng, A. Musa, T. Siriburanon, M. Miyahara, K. Okada, and A. Matsuzawa, "A 0.022 mm<sup>2</sup> 970 μW dual-loop injection-locked PLL with -243 dB FOM using synthesizable all-digital PVT calibration circuits," in *IEEE ISSCC Dig. Tech. Papers*, Feb. 2013, pp. 248–249.
- [6] K. Hu, T. Jiang, J. Wang, F. O'Mahony, and P. Y. Chiang, "A 0.6 mW/Gb/s, 6.4–7.2 Gb/s serial link receiver using local injection-locked ring oscillators in 90 nm CMOS," *IEEE J. Solid-State Circuits*, vol. 45, no. 4, pp. 899–908, Apr. 2010.
- [7] B. Razavi, "A study of injection locking and pulling in oscillators," *IEEE J. Solid-State Circuits*, vol. 39, no. 9, pp. 1415–1424, Sep. 2004.
- [8] G. Mangraviti et al., "A mm-wave 40 nm CMOS subharmonically injection-locked QVCO with lock detection," in *Proc. IEEE Asian Solid-State Circuits Conf.*, Nov. 2013, pp. 421–424.
- [9] D. Shin, S. Raman, and K. J. Koh, "A mixed-mode injection frequency-locked loop for self-calibration of injection locking range and phase noise in 0.13 μm CMOS," in *Proc. IEEE Int. Solid-State Circuits Conf.*, Feb. 2016, pp. 50–51.
- [10] S. Choi, S. Yoo, and J. Choi, "A 185-fs<sub>rms</sub> integrated-jitter and -245-dB FOM PVT-robust ring-VCO-based injection-locked clock multiplier," in *Proc. IEEE Int. Solid-State Circuits Conf.*, Feb. 2016, pp. 194–195.
- [11] A. Ravi, K. Soumyanath, L. R. Carley, and R. Bishop, "An integrated 10/5 GHz injection-locked quadrature LC VCO in a 0.18 μm digital CMOS process," in *Proc. Eur. Solid-State Circuits Conf.*, 2002, pp. 543–546.

- [12] A. Mazzanti, M. B. Vahidfar, M. Sosio, and F. Svelto, "A low phase-noise multi-phase LO generator for wideband demodulators based on reconfigurable sub-harmonic mixers," *IEEE J. Solid-State Circuits*, vol. 45, no. 10, pp. 2104–2115, Oct. 2010.
- [13] R. Adler, "A study of locking phenomena in oscillators," *Proc. IEEE*, vol. 61, no. 10, pp. 1380–1385, Oct. 1973.
- [14] M. Hossain and A. Chan Carusone, "5-10 Gb/s 70 mW burst mode AC coupled receiver in 90-nm CMOS," *IEEE J. Solid-State Circuits*, vol. 45, no. 3, pp. 524–537, Mar. 2010.
- [15] M. H. Nazari and A. Emami-Neyestanak, "A 24-Gb/s double-sampling receiver for ultra-low-power optical communication," *IEEE J. Solid-State Circuits*, vol. 48, no. 2, pp. 344–357, Feb. 2013.
- [16] S. Palermo, A. Emami-Neyestanak, and M. Horowitz, "A 90 nm CMOS 16 Gb/s transceiver for optical interconnects," *IEEE J. Solid-State Circuits*, vol. 43, no. 5, pp. 1235–1246, May 2008.
- [17] S. Saeedi and A. Emami, "A 25 Gb/s 170 μW/Gb/s optical receiver in 28 nm CMOS for chip-to-chip optical communication," in *Proc. IEEE Radio Freq. Integr. Circuits Symp. (RFIC)*, Jun. 2014, pp. 283–286.
- [18] S. Saeedi, S. Menezo, and A. Emami, "A 25 Gbps 3D-integrated CMOS/silicon photonic optical receiver with -15 dBm sensitivity and 0.17 pJ/bit energy efficiency," in *Proc. Opt. Interconnects Conf. (OI)*, 2015, pp. 11-12.
- [19] S. Saeedi, S. Menezo, G. Pares, and A. Emami, "A 25 Gb/s 3D-integrated CMOS/silicon-photonic receiver for low-power high-sensitivity optical communication," *J. Lightw. Technol.*, vol. 34, no. 12, pp. 2924–2933, Jun. 15, 2015.
- [20] M. Hossain and A. Carusone, "A 6.8 mW 7.4 Gb/s clock-forwarded receiver with up to 300 MHz jitter tracking in 65 nm CMOS," in *IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers*, Feb. 2010, pp. 158–159.
- [21] T. H. Lee and A. Hajimiri, "Oscillator phase noise: A tutorial," *IEEE J. Solid-State Circuits*, vol. 35, no. 3, pp. 326–336, Mar. 2000.
- [22] P. Upadhyaya et al., "A 0.5-to-32.75 Gb/s flexible-reach wireline transceiver in 20 nm CMOS," in *IEEE ISSCC Dig. Tech. Papers*, 2015, pp. 1–3.
- [23] J. Chien et al., "A pulse-position-modulation phase-noise-reduction technique for a 2-to-16 GHz injection-locked ring oscillator in 20 nm CMOS," in *IEEE ISSCC Dig. Tech. Papers*, Feb. 2014, pp. 52–53.
- [24] A. Elshazly, A. Balankutty, Y.-Y. Huang, K. Yu, and F. O'Mahony, "A 2 GHz-to-7.5 GHz quadrature clock-generator using digital delay locked loops for multi-standard I/Os in 14 nm CMOS," in *Symp. VLSI Circuits Dig. Tech. Papers*, 2014, pp. 1–2.
- [25] M. Raj, S. Saeedi, and A. Emami, "A 4-to-11 GHz injection-locked quarter-rate clocking for an adaptive 153 fJ/b optical receiver in 28 nm FDSOI CMOS," in *IEEE ISSCC Dig. Tech. Papers*, Feb. 2015, pp. 1–3.
- [26] T. Takemoto, H. Yamashita, T. Yazaki, N. Chujo, L. Yong, and Y. Matsuoka, "A 4× 25-to-28 Gb/s 4.9 mW/Gb/s -9.7 dBm high-sensitivity optical receiver based on 65 nm CMOS for board-to-board interconnects," in *IEEE ISSCC Dig. Tech. Papers*, Feb. 2013, pp. 118–119.
- [27] T.-C. Huang, T.-W. Chung, C.-H. Chern, M.-C. Huang, C.-C. Lin, and F.-L. Hsueh, "A 28 Gb/s 1 pJ/b shared-inductor optical receiver with 56% chip-area reduction in 28 nm CMOS," in *IEEE ISSCC Dig. Tech. Papers*, Feb. 2014, pp. 144–145.



**Mayank Raj** (S'08) was born in Patna, India, in 1987. He received the B.Tech. degree from the Indian Institute of Technology (IIT), Kanpur, India, in 2008, and the M.S. and Ph.D. degrees from the California Institute of Technology (Caltech), Pasadena, CA, USA, in 2009 and 2014, respectively, all in electrical engineering.

In 2014, he joined Xilinx Inc., San Jose, CA, USA, where he works on high-performance mixed-signal integrated circuits for high-speed and low-power interconnects.

Dr. Raj was the recipient of the 2008 California Institute of Technology Atwood Fellowship and the 2015 Intel/IBM/Catalyst Foundation CICC Student Scholarship Award. He holds 6 U.S. patents in the field of mixed-signal integrated circuit design.



**Saman Saeedi** received the double-major B.S. degree in electrical engineering and physics from Sharif University of Technology, Tehran, Iran, in 2010. He received the M.S. and Ph.D. degrees in electrical engineering from the California Institute of Technology, Pasadena, CA, USA, in 2011 and 2015, respectively.

He is currently a member of the VLSI research group at Oracle Labs. The focus of his current research is low-power, high-performance mixed-signal integrated circuits with applications in sensing and communication. During the summer of 2012, he was a Ph.D. intern at Apple Inc. where he worked on display driver chipsets. His work during fall and winter of 2014 at Rockley Photonics Inc. enabled a core technology for CMOS/silicon-photonic optical packet switching in data centers.

Dr. Saeedi is a Gold Medal winner of the National Physics Olympiad and recipient of a four-year undergraduate fellowship from the National Elite Foundation of Iran. He received the Atwood Fellowship in Fall 2010, and is the recipient of the 2014 Intel/Texas Instruments/Catalyst Foundation CICC Student Scholarship Award and a finalist of the 2015 Broadcom Foundation University Research Competition.



**Azita Emami** (S'97–M'04) was born in Naein, Iran. She received the M.S. and Ph.D. degrees in electrical engineering from Stanford University, Stanford, CA, USA, in 1999 and 2004, respectively. She received the B.S. degree with honors from Sharif University of Technology, Tehran, Iran, in 1996.

She is currently a Professor of electrical engineering at the California Institute of Technology, Pasadena, CA, USA. From July 2006 to August 2007, she was with Columbia University, New York, NY, USA, as an Assistant Professor in the Department of Electrical Engineering. She also worked as a Research Staff Member at IBM T. J. Watson Research Center, Yorktown Heights, NY, USA, from 2004 to 2006. Her current research areas are high-performance mixed-signal integrated circuits and VLSI systems, with a focus on high-speed and low-power optical and electrical interconnects, clocking, biomedical implants, and compressed sensing.