# A 25-Gb/s Avalanche Photodetector-Based Burst-Mode Optical Receiver With 2.24-ns Reconfiguration Time in 28-nm CMOS

Kuan-Chang Chen, Student Member, IEEE, and Azita Emami, Senior Member, IEEE

Abstract—This paper describes an avalanche photodetector (APD)-based optical receiver, applicable to the burst-mode operation, in 28-nm CMOS technology. With the aims of benefiting the overall optical link power efficiency and link bandwidth, the optical receiver is designed to have high sensitivity and high reconfiguration speed for burst-mode operation. The sensitivity of the receiver is optimized by adjusting the responsivity of APD via its reverse bias voltage, which leads to the highest signal-to-noise ratio (SNR) at the front end. The two-tap feedforward equalization (FFE), along with two-tap decision feedback equalization, is implemented in a current-integrating fashion to further improve the sensitivity with superior power efficiency to their resistively loaded counterparts. Integrating dc comparator and integrating amplitude comparator are proposed to replace the conventional RC low-pass filters and peak detectors, respectively, in extracting the information of dc offset and signal amplitude within two unit intervals (UIs), empowering significant acceleration of the burst-mode reconfiguration. When the APD is biased at -16 V, its overall responsivity at 1310 nm is 4 A/W, and the optical receiver achieves bit-error-rate (BER) better than $10^{-12}$ at -16-dBm optical modulation amplitude, 2.24-ns reconfiguration time with 5-dB dynamic range, and 1.37-pJ/b energy efficiency at 25 Gb/s.

Index Terms—Avalanche photodetector (APD), burst mode, current integrating, double sampling, equalization, optical, receiver, reconfiguration.

# I. INTRODUCTION

OPTICAL interconnects have wide applications in modern data communication and computing systems, including data center networks.
The roadmaps for optical interconnects in data centers [1] require significant improvements in various metrics. Within the span of a decade, it is proposed that the speed of optical links in data centers increase by a factor of 25, the energy efficiency improve by a factor of 5, and the optical switching time be reduced from 10 ms to 100 ps [1]. To realize the envisioned specifications, efforts have been made not only to advance high-speed optical devices such as modulators and photodetectors but also to innovate the electronic circuit design to offer a superior interface and better energy efficiency, e.g., [2]–[7]. In this paper, an optical receiver, which leverages the advancement of avalanche photodetector (APD) and new electronic circuit topologies for high sensitivity and fast reconfigurations, is presented.

Manuscript received October 8, 2018; revised January 5, 2019 and February 15, 2019; accepted February 21, 2019. Date of publication March 26, 2019; date of current version May 24, 2019. This paper was approved by Associate Editor Hui Pan. (Corresponding author: Kuan-Chang Chen.) The authors are with the California Institute of Technology, Pasadena, CA 91125 USA (e-mail: kcxchen@caltech.edu). Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/JSSC.2019.2902471

Despite the small modulation frequency-dependent loss introduced by the optical fibers, modulation frequency-independent signal attenuation and proportional losses (for multimode fiber, the loss is about 1.5 dB/km for 1300-nm signals; for single-mode fiber, the loss is about 0.5 dB/km for 1310-nm signals) can be considerable in an optical network where long fibers and a large number of connectors, couplers, or splitters are involved. To overcome the attenuation and losses, the laser power needs to be augmented.
With a given level of attenuation along the signal path, improvement in energy efficiency of optical links can be achieved with the availability of high-sensitivity receivers. Designing a high-sensitivity optical receiver using an APD along with the energy-efficient equalization techniques implemented in modern CMOS technology is one of the main goals of this paper.

In a rapidly reconfigurable optical network, different data bursts originating from different transmitters can present distinct dc, amplitude, and phase characteristics, as illustrated in Fig. 1(a). A burst-mode receiver capable of performing reconfigurations to adapt itself to the variability, prior to the real data transmission, is essential. Fig. 1(b) shows a simplified timing diagram of the burst-mode reconfiguration scheme; the receiver needs to cancel the dc offset, control the signal amplitude for linear operations, and also recover the sampling clocks, before the transmission of the data payload. The aforementioned reconfigurations lead to an overhead time whenever a different data burst arrives, and consequently, the link latency and bandwidth can be improved by reducing the overhead, i.e., the overall reconfiguration time, especially for a network where switching events occur frequently.

RC low-pass filter (LPF)-based designs are conventionally applied to extract the dc and amplitude information [4], [8], but the inevitable tradeoff between the tracking time and the settling behavior of the RC LPF forms a bottleneck in reducing the reconfiguration time. Prior art has employed various design techniques to improve the settling time. For instance, the work in [25] uses a feedback-type automatic offset compensation (AOC) loop with switchable bandwidth to remove the input dc offset in less than 75 ns for 10-Gb/s operations. A feed-forward-type AOC is applied in [26], achieving 25.6-ns response time for 10-Gb/s operations with tradeoffs in accuracy and power consumption, as indicated in [25]. A calibration state machine is designed along with an RC LPF in [4], which completes the search for the settings associated with dc component cancellation in 12.5 ns at 25-Gb/s operations.

Fig. 1. (a) Transmission of distinct data bursts originated from different transmitters to a single optical line terminal (OLT), including a burst-mode optical receiver (BMRX). (b) Simplified timing diagram of burst-mode reconfiguration scheme, in which the dc offset cancellation and amplitude control are the focus of this paper.

We propose an integrating dc comparator and an integrating amplitude comparator in this paper to enable fast cancellation of the dc offset and rapid signal amplitude control, respectively. The proposed integrating dc and amplitude comparators eliminate the RC settling time constraints, and as will be shown in Sections V-B and V-C, the minimum comparison time is reduced to two unit intervals (UIs), empowering significant acceleration of the burst-mode reconfiguration and scaling with the data rate. Furthermore, due to the nature of performing integration, the proposed integrating dc and amplitude comparators do not require the clock and data recovery (CDR) circuits to be locked in advance.

This paper is organized as follows. Section II reviews the basics of APD, its advantages, and challenges. Section III presents the overall APD-based receiver architecture. Section IV describes the equalization circuits designed in current-integrating fashion. Section V explains the operation of the burst-mode reconfiguration loops and elaborates the principles and implementations of the proposed integrating dc and amplitude comparators. The experimental results of this burst-mode optical receiver are shown in Section VI, and finally, Section VII summarizes this paper with performance comparisons and conclusions.

### II. AVALANCHE PHOTODETECTOR

Friis' formula for noise figure [9] suggests that a high-gain stage at the front end is favorable in suppressing the noise contribution from succeeding stages to the overall signal-to-noise ratio (SNR). This motivates the use of APD since APD offers gain that increases the photocurrent by a multiplication factor of M as the very first stage of the receiver, and the ongoing advancements in the gain-bandwidth product of APD [10]–[12] have made APD more and more suitable for high-speed data communication. Nevertheless, since the APD gain arises from the generation of secondary electron–hole pairs through the impact ionization process, and these pairs are generated at random times [13], the shot noise of APD is enhanced by the excess noise factor, F, given by

$$F = kM + (1 - k)(2 - 1/M) \tag{1}$$

where $k = \alpha_e/\alpha_h$ if $\alpha_h > \alpha_e$, or $k = \alpha_h/\alpha_e$ if $\alpha_e > \alpha_h$, by definition, while $\alpha_e$ and $\alpha_h$ denote the impact ionization coefficients for electrons and holes, respectively [13]. With the incident optical power represented by $P$, the dark current represented by $I_d$, the magnitude of electron charge represented by $q$, the responsivity of the photodetector represented by $R$, the effective noise bandwidth of the receiver represented by $\Delta f$, and the thermal noise power represented by $N_T$, the shot-noise power, denoted by $N_S$, and the SNR of an APD-based front end can be, respectively, written as [13], [14]

$$N_S = 2qM^2F(I_d + RP)\Delta f \tag{2}$$

$$\text{SNR} = (MRP)^2/(N_S + N_T). \tag{3}$$

A few observations can be made from (1) to (3). First, when M is set to 1, F equals 1 in (1), implying the absence of excess shot noise, and the resulting expressions for (2) and (3) correspond to the case of using a p-i-n photodetector.
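Equations (1)–(3) are simple to evaluate numerically. The following is a minimal Python sketch, not the authors' computation. The values of k and R are those quoted in the Fig. 2 caption, the thermal noise is taken from the 0.68-µA rms input-referred noise current reported for this design, and the optical power is the -16-dBm level used in Fig. 2. The dark current `I_DARK` and noise bandwidth `DELTA_F` are not given in the text and are purely hypothetical placeholders.

```python
Q_E = 1.602176634e-19  # magnitude of the electron charge q, in coulombs


def excess_noise_factor(m, k):
    """Excess noise factor F from (1)."""
    return k * m + (1.0 - k) * (2.0 - 1.0 / m)


def shot_noise_power(m, k, i_dark, resp, p_opt, delta_f):
    """Shot-noise power N_S from (2), in A^2."""
    f = excess_noise_factor(m, k)
    return 2.0 * Q_E * m**2 * f * (i_dark + resp * p_opt) * delta_f


def snr(m, k, i_dark, resp, p_opt, delta_f, n_thermal):
    """SNR from (3), as a linear power ratio."""
    n_s = shot_noise_power(m, k, i_dark, resp, p_opt, delta_f)
    return (m * resp * p_opt) ** 2 / (n_s + n_thermal)


# Values taken from the text and the Fig. 2 caption.
K_RATIO = 0.2                       # ionization coefficient ratio k
RESP = 0.7                          # unity-gain responsivity R, in A/W
P_OPT = 1e-3 * 10 ** (-16 / 10)     # -16 dBm expressed in watts
N_THERMAL = (0.68e-6) ** 2          # thermal noise power N_T, in A^2

# Hypothetical placeholders (not stated in the text).
I_DARK = 10e-9                      # assumed dark current, in A
DELTA_F = 12.5e9                    # assumed noise bandwidth, in Hz

if __name__ == "__main__":
    for m in (1.0, 2.0, 5.7, 10.0, 20.0):
        value = snr(m, K_RATIO, I_DARK, RESP, P_OPT, DELTA_F, N_THERMAL)
        print(f"M = {m:5.1f}  F = {excess_noise_factor(m, K_RATIO):5.2f}  SNR = {value:9.1f}")
```

Setting `m = 1.0` makes `excess_noise_factor` return 1, so `snr` reduces to the p-i-n expression. How much the SNR changes at other gains depends on the assumed `I_DARK` and `DELTA_F`.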
Second, provided that the thermal noise is dominant over the shot noise, i.e., $N_T \gg N_S$, the signal power increases with the gain (M) quadratically, and hence, the improvement in SNR by a factor of approximately $M^2$ can be achieved as long as the gain-independent thermal noise keeps dominating the noise contribution. Therefore, compared to a p-i-n photodetector with similar bandwidth, APD considerably benefits the receiver sensitivity in the thermal-noise-limited regime. On the contrary, in the case of being shot-noise limited, i.e., $N_S \gg N_T$, it can be inferred from (2) and (3) that the SNR can no longer be improved by increasing M, and as a matter of fact, the SNR is degraded by the excess noise factor F, in comparison with a p-i-n photodetector having similar bandwidth. The foregoing suggests that there exists an optimum value of gain M, which gives rise to the maximum SNR; the optimum value of M can be found by solving (2) and (3). Fig. 2 shows the SNR improvements versus M with a given level of optical input power while different amounts of input-referred thermal noise $(I_{\rm NT})$ are present. In this design, the input-referred noise current of the receiver from the simulation is 0.68 $\mu$A<sub>rms</sub>, and the overall responsivity of APD is set to be 4 A/W, corresponding to a multiplication factor or gain of approximately 5.7.

Fig. 2. SNR improvements (in decibels) versus M with a given level of optical input power (-16 dBm). k = 0.2, R = 0.7, and different amounts of input-referred thermal noise are used in the computations.

In addition to the enhanced shot noise, the bandwidth of APD generally decreases with the gain because of the longer avalanche build-up time [12]. As the effective signal power can be compromised by the excess intersymbol interference (ISI) due to the lower bandwidth, equalizer (EQ) circuits are included in this APD-based optical receiver for the purpose of ameliorating speed limitations formed by the APD gain-bandwidth tradeoff and the $R_{\rm in}C_{\rm in}$ time constants as well, where $R_{\rm in}$ denotes the input resistance of the receiver, and $C_{\rm in}$ denotes the total capacitance at the receiver input.

# III. APD-BASED OPTICAL RECEIVER ARCHITECTURE

The architecture of the burst-mode optical receiver is shown in Fig. 3(a). The single-ended photocurrent is converted into differential voltage outputs by the analog front end (AFE), consisting of a variable current source (VCS) to subtract the dc component of the photocurrent, a three-stage inverter-based transimpedance amplifier (TIA), a differential pair-based single-ended-to-differential amplifier (S2D), a two-stage current-steering variable gain amplifier (VGA), and a transconductance-C LPF (gm-C LPF) with 100-kHz bandwidth in a negative feedback loop for residual offset cancellation and combating low-frequency drifts. The circuit schematic of the VCS is shown in Fig. 3(b), where the value of $V_{\rm BIAS}$ and the ON/OFF states of the switches are determined by the 8-bit digital setting (b0:b7). The 8-bit control of the VCS is implemented in a binary-weighted fashion, and its tuning range can be adjusted by varying the tail current source of the V2I shown in Fig. 12(b). The idea of keeping the resolution (LSB) at 2%-4% of the peak-to-peak ac current amplitude, proposed in [4], is adopted in this VCS design. The circuit schematic of the three-stage inverter-based TIA is shown in Fig. 3(c), and the feedback resistors, $R_{F1}$ and $R_{F2}$, are designed to be 1.2 k$\Omega$ and 275 $\Omega$, respectively. Because the value of $R_{\rm F1}$ affects the SNR performance and the EQ specifications, the design considerations of $R_{\rm F1}$ are described together with the EQ in Section IV, while the value of $R_{F2}$ is chosen such that the third inverter stage with feedback resistor acts as an amplifier, and that $R_{F2}$ does not considerably affect the overall AFE bandwidth.
In addition, to better interface with the current-mode logic (CML) used in succeeding stages, the second inverter stage in the TIA is sized such that the common-mode output voltage of the TIA is ~635 mV under a 1-V supply. A conventional differential amplifier is used to implement the S2D, as shown in Fig. 3(d). The S2D is designed to have a voltage gain of 1.5 V/V, an output common-mode voltage of $\sim$730 mV, and a -3-dB bandwidth of 28 GHz when loaded with the VGA in this paper. The circuit schematic of the VGA is shown in Fig. 3(e), in which $V_{\rm B0}$ is a fixed bias voltage, while $V_{\rm B1}$ and $V_{\rm B2}$ are determined by the 5-bit digital setting (b8:b12) such that a fixed amount of current, $I_{CM} = I_G + I_R$, is steered between the branches with and without gain. The purpose of having a fixed value of $I_{CM}$ is to keep the common-mode output voltages the same, independent of the gain setting. With the current-steering tuning mechanism and without adjusting the values of the load resistors, the bandwidth can be kept sufficiently constant among all gain settings for the VGA. The tuning range of the VGA gain per stage is from 0.95 to 1.67 V/V, and the 5-bit control is implemented in thermometer code fashion. Specifically, for the two-stage VGA in this design, when the 5-bit digital setting steps through $(0, 0, 0, 0, 0), (0, 0, 0, 0, 1), (0, 0, 0, 1, 1), \ldots, (1, 1, 1, 1, 1)$, the gain of the two-stage VGA is increased by a factor of 1.25 per step with the -3-dB bandwidth of the two-stage VGA kept at $\sim$20 GHz. From simulations, the 1-, 2-, and 3-dB gain compression points of each VGA stage are 227, 310, and 368 mV, respectively. The enable/disable control scheme for the LPF loop is shown in Fig. 3(f). When EN<sub>LPF</sub> is set to logical low and ENB<sub>LPF</sub> is set to logical high, the LPF loop is disabled by having $V_{\text{NLPF}} \approx V_{\text{PLPF}}$, introducing approximately zero offset to the AFE.
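The thermometer-coded VGA gain steps described above can be tabulated with a few lines of Python. This is an illustrative sketch only. It assumes that the lowest setting corresponds to both stages at the 0.95-V/V end of the per-stage range and that every step multiplies the two-stage gain by exactly 1.25. The function names are hypothetical.

```python
def thermometer_code(level, n_bits=5):
    """Thermometer code with `level` ones filled from the right, e.g. 2 -> (0, 0, 0, 1, 1)."""
    if not 0 <= level <= n_bits:
        raise ValueError("level must be between 0 and n_bits")
    return tuple([0] * (n_bits - level) + [1] * level)


def two_stage_vga_gain(level, min_stage_gain=0.95, step_ratio=1.25):
    """Approximate two-stage gain in V/V after `level` thermometer steps (idealized)."""
    return (min_stage_gain ** 2) * step_ratio ** level


if __name__ == "__main__":
    for level in range(6):
        code = thermometer_code(level)
        print(code, f"{two_stage_vga_gain(level):.3f} V/V")
```

Under these assumptions, five steps of 1.25 give a total ratio of about 3.05, close to the ratio $(1.67/0.95)^2 \approx 3.09$ implied by the quoted per-stage tuning range.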
The output of the AFE is deserialized (1-to-4) by a bank of four sample-and-hold (S/H) switches, clocked by four quarter-rate clock phases. The S/H switch is implemented with a single transistor (pMOS) with a dummy transistor in series to mitigate the effects of charge injection as in [16]. Each deserialized voltage sample is followed by a dedicated EQ and slicer, also clocked by the quarter-rate clock phases, which recover the sample to digital logic level. When a new data burst arrives with a "1010..." preamble pattern, the on-chip searching logic is designed to sequentially determine the optimum digital setting of (b0:b12) with respect to two goals. One is to cancel the dc offset and in the meantime retain the dc bias point by matching the dc component of the photocurrent with the current from the VCS. The other is to control the signal amplitude by adjusting the gain of the VGA, in order that linear operation is maintained and the setting of the EQ circuits does not need to be updated with different data bursts possessing distinct power levels.

## IV. EQUALIZER DESIGN

Increasing the value of the shunt-feedback resistor used in the TIA yields higher gain and lower noise at the receiver front end at the expense of eventually pushing the dominant pole toward low frequency, particularly in the presence of the capacitance from the APD and wire-bond pad. When the frequency of the dominant pole is significantly smaller than the data rate, long-tail post-cursor ISI is induced in the pulse response. The signal and noise analysis of a TIA front end employing an inverter with a shunt-feedback resistor, and the effects of varying the shunt-feedback resistor value on the TIA bandwidth, have been studied in [21]. With the aim of optimizing the receiver sensitivity, in this paper, the shunt-feedback resistor [$R_{\rm F1}$ in Fig. 3(c)] is increased to the extent that the ISI can be effectively cancelled or mitigated by the succeeding EQ.
With $R_{\rm F1}$ designed to be 1.2 k$\Omega$, the three-stage TIA achieves 67.16-dB$\Omega$ dc gain and 7.4-GHz -3-dB bandwidth, and the resultant -3-dB bandwidth of the AFE is 6 GHz from the simulation. The pulse response at the AFE outputs with -16-dBm optical modulation amplitude (OMA) input is simulated to determine the equalization scheme and the EQ coefficients, as shown in Fig. 4(a), where the peak value is $\sim 253$ mV. In this paper, an EQ performing two-tap (including the main cursor) feed-forward equalization (FFE) and two-tap decision feedback equalization (DFE) in current-integrating fashion is designed such that the long-tail ISI can be mostly removed by the two-tap FFE, while the residual first and second post-cursor ISI are cancelled by the two-tap DFE, as illustrated in Fig. 4(b). Although FFE amplifies high-frequency noise, the sensitivity can be improved when the benefit arising from reducing the ISI by FFE surpasses the penalty of the enhanced noise. The pulse responses at the AFE outputs are also simulated with different input power levels in the range from -16 to -11 dBm (OMA), along with their corresponding gain settings of the VGA, to verify that the following inequality is satisfied:

$$V_{\text{Main}} - \sum_k |\text{ISI}_k| > 7 \times V_{\text{Noise}} + 30~\text{mV} \tag{4}$$

where $V_{\text{Main}}$ denotes the main cursor magnitude; $\text{ISI}_k$ denotes the residual ISI that is k UIs apart from the main cursor; the factor, 7, refers to the target bit-error-rate (BER) $<10^{-12}$; and 30 mV is left as the decision margin for the data slicers.

Fig. 3. (a) Architecture of the BMRX. (b) Circuit schematic of the VCS. (c) Circuit schematic of the three-stage inverter-based TIA. $R_{F1}=1.2~\mathrm{k}\Omega$ and $R_{F2}=275~\Omega$ nominally in this design. (d) Circuit schematic of the single-ended-to-differential amplifier (S2D), with load resistors set to 172 $\Omega$ in this design. (e) Circuit schematic of the current-steering VGA, with load resistors set to 172 $\Omega$ in this design. (f) Circuit schematic of the enable/disable control scheme for the LPF loop.

Fig. 4. (a) Pulse responses at the AFE outputs before applying equalizations. (b) Pulse responses at the AFE outputs after applying ideal two-tap FFE.

The double-sampling technique, reported and analyzed in [15] and [16], serves as one form of implementing two-tap FFE in the discrete-time domain. It takes two signal samples spaced by one UI and sums the two samples with appropriate weights. As described in [16], the double-sampling technique is effective in equalizing a channel that well resembles a first-order *RC* low-pass system since the long-tail ISI can be cancelled by having the following satisfied:

$$\beta_{\rm DS} = 1 - \exp(-T_b/T_{\rm RC}) \tag{5}$$

in which $T_b$ is the bit interval, $T_{\rm RC}$ is the RC time constant, and $(\beta_{\rm DS} - 1)$ is the ratio of the summing coefficient of the previous sample to that of the current sample. In addition, the double-sampling technique is energy efficient in comparison to both an infinite impulse response DFE (DFE-IIR) and an analog FFE because it requires neither an output multiplexer after deserialization nor analog delay elements.

In this design, the resistively loaded summer in [16] is replaced with a current-integrating summer to improve the settling time, and another DFE tap (second-tap DFE) is included. The schematic of the EQ is shown in Fig. 5, consisting of a current-integrating summer connected to the two-stage regenerative slicer embedding the first-tap DFE. The clock phases are designed for quarter-rate operations, similar to [23], and such that the $SUM_P[n]$ and $SUM_N[n]$ nodes shown in Fig. 5 are precharged to the supply voltage prior to the current integration over a single UI.
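The tail-cancellation condition in (5) can be checked numerically on an idealized first-order RC channel. The sketch below is an illustration under stated assumptions, not a model of the actual AFE. It assumes a one-UI-wide unit pulse, samples taken at the end of each UI, and a hypothetical time constant derived from a single 6-GHz pole. The function names are invented for this example.

```python
import math


def rc_pulse_response(t_b, t_rc, n_samples):
    """UI-spaced samples of a first-order RC response to a one-UI-wide unit pulse.

    Sample n is taken at t = (n + 1) * t_b, so sample 0 is the main cursor.
    """
    a = math.exp(-t_b / t_rc)
    return [(1.0 - a) * a**n for n in range(n_samples)]


def double_sample(samples, beta_ds):
    """Two-tap FFE: y[n] = x[n] + (beta_ds - 1) * x[n - 1], with x[-1] = 0."""
    out = []
    prev = 0.0
    for x in samples:
        out.append(x + (beta_ds - 1.0) * prev)
        prev = x
    return out


if __name__ == "__main__":
    t_b = 40e-12                           # bit interval at 25 Gb/s
    t_rc = 1.0 / (2.0 * math.pi * 6e9)     # hypothetical single-pole time constant
    beta_ds = 1.0 - math.exp(-t_b / t_rc)  # condition (5)

    before = rc_pulse_response(t_b, t_rc, 6)
    after = double_sample(before, beta_ds)
    print("before:", [f"{v:.4f}" for v in before])
    print("after: ", [f"{v:.4f}" for v in after])
```

For this idealized channel, every post-cursor sample after equalization is zero up to rounding error, while the main cursor is left at $\beta_{\rm DS}$. A real AFE is not a pure first-order system, which is one reason residual post-cursor ISI is left for the DFE taps.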
At the end of the integration phase, the differential output voltage $(SUM_P[n] - SUM_N[n])$ is the weighted sum or the equalized value, as the result of performing two-tap FFE together with the second-tap DFE. Specifically

$$\begin{aligned} SUM_{P}[n] - SUM_{N}[n] &= \alpha \times (V_{P}[n] - V_{N}[n]) \\ &\quad + \beta \times (V_{P}[n-1] - V_{N}[n-1]) \\ &\quad + \gamma \times (D_{P}[n-2] - D_{N}[n-2]) \end{aligned} \tag{6}$$

where $V_P[n]$ and $V_N[n]$ are the differential S/H outputs of the current sample; $V_P[n-1]$ and $V_N[n-1]$ are the differential S/H outputs of the previous sample, one UI earlier; $D_P[n-2]$ and $D_N[n-2]$ are the recovered complementary digital data bits two UIs earlier; $\alpha$ and $\beta$ are the FFE coefficients; and $\gamma$ is the coefficient for the second-tap DFE. The FFE and second-tap DFE coefficients, $\alpha$, $\beta$, and $\gamma$, are adjusted by varying the gate voltages of the cascoding transistors $V_{\rm DSM}$, $V_{\rm DSS}$, and $V_{\rm DFE2}$, respectively, in Fig. 5, as in [24]. Similarly, $D_P[n-1]$ and $D_N[n-1]$ are the recovered complementary digital data bits one UI earlier, and the first-tap DFE coefficient, $\delta$, is adjustable by varying $V_{\text{DFE}}$. The gate voltages are set by voltage digital-to-analog converters (VDACs), and the resultant tap weight ranges of the FFE and DFE (i.e., $\beta/\alpha$, $\gamma/\alpha$, and $\delta/\alpha$) can be set from 0 to 0.8, with 0.025 resolution. The nonlinearity of the integrating summer increases with the differential input signal level. From simulations, the error is increased to $\sim 10\%$ of the ideal sum when the differential input levels (i.e., $V_P[n] - V_N[n]$ and $V_P[n-1] - V_N[n-1]$) are increased to 330 mV. When the input levels are further increased to 400, 450, and 500 mV, the error is increased to 15.5%, 19.2%, and 23.4%, respectively.
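The ideal weighted sum in (6) and the quoted tap-weight grid (0 to 0.8 in steps of 0.025, i.e., 33 settings) can be sketched as follows. This is a linear idealization that ignores the summer nonlinearity described above. The sign convention of the coefficients, the normalization of the digital feedback to ±1, and all function names are assumptions made for illustration.

```python
def quantize_tap_ratio(target, step=0.025, max_ratio=0.8):
    """Return (code, ratio) for the grid point nearest to `target` on 0..max_ratio."""
    clipped = min(max(target, 0.0), max_ratio)
    code = round(clipped / step)
    return code, code * step


def summer_output(v_now, v_prev, d_prev2, alpha, beta, gamma):
    """Ideal differential summer output of (6).

    v_now   : V_P[n] - V_N[n]
    v_prev  : V_P[n-1] - V_N[n-1]
    d_prev2 : D_P[n-2] - D_N[n-2], normalized here to +1 or -1 (assumption)
    """
    return alpha * v_now + beta * v_prev + gamma * d_prev2


if __name__ == "__main__":
    # Hypothetical target: magnitude of the previous-sample weight relative to alpha.
    code, ratio = quantize_tap_ratio(0.221)
    print("code:", code, "ratio:", round(ratio, 3))

    # Signed coefficients are an assumption; the text quotes only the ratio range.
    print(summer_output(v_now=0.2, v_prev=0.1, d_prev2=1.0,
                        alpha=1.0, beta=-ratio, gamma=-0.05))
```

Quantizing the ratios before evaluating the sum gives a quick feel for how much residual ISI the 0.025 resolution leaves uncancelled.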
The limited accuracy of the integrating summer does degrade the precision of the equalization; however, the employed EQ design allows the SNR target shown as (4) to be fulfilled within the target dynamic range.

As the first-tap DFE is embedded in the two-stage regenerative slicer, the cancellation of the first post-cursor ISI is carried out at the internal nodes of the slicer, $V_{\rm EOP}$ and $V_{\rm EON}$, labeled in Fig. 5. In this design, the direct feedbacks used in [17] are employed. The settled outputs of one regenerative latch are directly fed as inputs to two other EQs for two-tap DFE operation, and loop-unrolling DFEs are not required, by exploiting the overlaps of the evaluation phases of the two adjacent slicers.

Fig. 5. Schematic of the EQ performing double-sampling and two-tap DFE.

# V. BURST-MODE RECONFIGURATION LOOPS

The block diagram of the burst-mode reconfiguration loops is shown in Fig. 6. During the preamble phase, the reconfiguration is started with an external pulse signal (PUL\_IN) and is finished in 14 reconfiguration clock (RCK) cycles. The on-chip search algorithm applies successive approximation register (SAR) logic, with each clock cycle dedicated to the sequential decision of 1 bit of digital setting, and one additional cycle inserted between those devoted to b7 and b8. The inserted cycle allows reliable dc offset cancellation before the search for the gain setting since the gm-C LPF is enabled to cancel the residual offset at the completion of setting b7. With the enable/disable control scheme shown in Fig. 3(f), the capacitors in effect retain nothing related to the results of the VCS loop, as $V_{\rm NLPF} \approx V_{\rm PLPF}$ for as long as the LPF is disabled. Accordingly, as soon as the LPF is enabled, it starts to help with cancelling the residual offset.

Fig. 6. Block diagram of the burst-mode reconfiguration loops.
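The cycle budget and the bit-serial search described above can be sketched in Python. This is a generic textbook SAR loop with an ideal comparator, not the on-chip logic. It assumes that the first bit decided is the most significant one, and it uses the relation stated in Section V-B that the two-UI integration phase is half of the RCK period, so one RCK cycle spans four UIs. The function names are hypothetical.

```python
def sar_search(keep_bit, n_bits=8):
    """Generic MSB-first SAR search.

    `keep_bit(trial_code)` plays the role of the comparator plus slicer: it
    returns True when the trial code should be kept (the bit stays at 1).
    """
    code = 0
    for bit in reversed(range(n_bits)):
        trial = code | (1 << bit)
        if keep_bit(trial):
            code = trial
    return code


def reconfiguration_time_ns(n_cycles=14, ui_per_rck=4, data_rate_gbps=25.0):
    """Total search time in ns: cycles x (UIs per RCK cycle) x UI duration."""
    ui_ns = 1.0 / data_rate_gbps
    return n_cycles * ui_per_rck * ui_ns


if __name__ == "__main__":
    # Ideal comparator: keep a bit while the VCS code does not exceed the
    # (hypothetical) dc photocurrent expressed in LSBs.
    dc_in_lsb = 137.3
    print("VCS code:", sar_search(lambda code: code <= dc_in_lsb))

    # 8 VCS bits + 1 inserted cycle + 5 VGA bits = 14 RCK cycles.
    print("reconfiguration time (ns):", round(reconfiguration_time_ns(), 2))
```

With 14 RCK cycles of four UIs each at 25 Gb/s, the total comes to 2.24 ns, consistent with the reconfiguration time quoted in the abstract.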
Similar to other applications of the SAR algorithm, e.g., the SAR analog-to-digital converter (ADC), the SAR algorithm applied in this paper relies on comparators to resolve 1 bit of digital setting, and the maximum speed at which the SAR algorithm can run depends on the delay within the loop. Therefore, an integrating dc comparator and an integrating amplitude comparator are proposed to reduce the minimum comparison time to two UIs, such that the loop delay is no longer limited by the RC settling time of conventional RC LPF-based designs. When the preamble data stream is present, the integrating dc comparator compares the dc levels of the AFE outputs, whereas the integrating amplitude comparator compares the signal amplitude with a reference amplitude. The results are amplified to a digital logic level by the slicers following the comparators, and the VCS or VGA is accordingly adjusted, depending on which reconfiguration loop is on duty. The slicer follows the topology of the double-tail latch-type voltage sense amplifier, proposed in [22], and the slicers in the reconfiguration loops are designed with the specifications as follows. The input-referred noise is 0.33 mV<sub>rms</sub>; the sensitivity at 6.25-GHz operation is better than 100 $\mu$V for input common-mode voltages varying from 0.4 to 0.7 V; and the offset is 5 mV from Monte Carlo simulations and can be effectively calibrated by introducing an offset into the preceding integrating dc or amplitude comparator. Sections V-A–V-C first describe a customized state machine as part of the SAR search algorithm and then elaborate the functions and advantages of the proposed integrating dc and amplitude comparators, which contribute critically to reducing the reconfiguration loop delays and, hence, to improving the link bandwidth as well as latency in burst-mode operations.

## A. Pulse-Triggered State Machine

The pulse-triggered state machine is designed for high-speed operation with the goal that each bit of the digital setting (b0:b12) does not react to the slicers in the reconfiguration loops until the corresponding pulse arrives. An additional enable/disable function is implemented, offering the option to use either the predefined setting set by an external field-programmable gate array or the setting determined by the reconfiguration loops. Fig. 7 shows the block diagram of the pulse-triggered state machine. Setting the enable signal (REN) to logical low disables the reconfiguration loops, and the predefined digital setting will be used throughout. Setting REN to logical high enables the reconfiguration loops, and a chain of nonoverlapping pulses spaced by one RCK cycle $(T_{RCK})$ is generated, selecting the bit to be overwritten by the slicer, one after another. In other words, when REN is set to logical high, the predefined values of the digital setting are sequentially overwritten. For instance, with REN set to high, b0 keeps its predefined value while its corresponding digital control signal, PUL0, is initially low. When PUL0 rises to high, the register of b0 starts to take in the slicer output. Before PUL0 goes back to low, the regenerative slicer settles and overwrites the original predefined value of b0. This value written by the slicer is held afterward, unless the predefined value is reloaded by setting REN to low. By design, the rising edges of the pulses are aligned with those of the RCK, and the misalignment induced by process variations can be compensated with an on-chip digitally controlled delay line.

Fig. 7. Block diagram of the pulse-triggered state machine.

## B. Integrating DC Comparator

Fig. 8. (a) Conventional *RC* LPF-based dc comparator. (b) Simulation results showing the tradeoff between tracking time and settling behavior.

Conventional first-order RC LPFs are commonly applied to extract dc information. As shown in Fig.
8(a), the slicer directly compares the LPF voltage levels and amplifies the difference to digital logic level. The result is then taken as 1 bit of the digital setting for the VCS during the reconfiguration process. Nonetheless, as illustrated in Fig. 8(b), there is an inevitable tradeoff between the tracking time and settling behavior. With the RC time constant set to 0.1 ns, as shown in blue, considerable ripples are introduced, which make the comparison result less reliable. In contrast, with the RC time constant set to 1 ns, as shown in red, the filter fails to track the dc component in 1.5 ns. This RC settling time constraint presents a bottleneck in speeding up the SAR logic, and thus the burst-mode reconfiguration, since the unsettled voltage levels do not accurately reflect the effect of the last adjustment of the VCS. As a consequence, comparing the unsettled voltage levels can lead to a nonoptimal setting of the VCS at the end of the reconfiguration process.

Fig. 9. (a) Circuit schematic of the proposed integrating dc comparator. (b) Simulation results showing the operation of the proposed integrating dc comparator, where the dc level of $V_{\rm IN}$ is lower than that of $V_{\rm IP}$ by 20 mV. (c) Integrating dc comparator differential output voltage versus different clock duty cycles with four distinct dc-level differences.

The integrating dc comparator, as shown in Fig. 9(a), is proposed to replace the RC LPF. The pMOS pair charges the outputs to the supply voltage when the RCK is low, resetting the differential output voltage to approximately zero. When RCK becomes high, the integration of the respective input voltage is effectively performed as the summation of the discharging current on the load capacitance ($C_{\rm LOAD}$), i.e., the voltage drop at the output. Since the input waveform is programmed to have "1010..."
preamble pattern, the voltage drop at the output contains the information of the input dc level with the integration period set to even numbers of UIs. The simulation result, as shown in Fig. 9(b), illustrates the principle of operation. With the integration period (half of the RCK period in this design) set to two UIs and proper common-mode design, the polarity of the differential output voltage indicates which input has the higher dc level at the end of the integration period. In addition, it is insensitive to the timing alignment between the RCK and the preamble data stream due to the nature of performing integration, and therefore, the locking of the CDR in advance is unnecessary. The slicer following the integrating dc comparator further amplifies the differential output voltage to digital levels, overwriting 1 bit of digital setting to adjust the current of the VCS. The proposed integrating dc comparator eliminates the RC settling time constraint, and the minimum integration time, namely, the minimum comparison time, can be set to two UIs by integrating only one pair of 1 and 0. To make the fast dc offset cancellation loop more precise, the offset from the integrating dc comparator itself can be calibrated by adjusting the gate voltages of the cascoding transistors ($V_{\rm OSP}$ and $V_{\rm OSN}$) with VDACs. As in other current-integrating designs, the common-mode integration could cause problems if the common-mode voltage drops at the outputs are so large that the transconductance (gm) of the input pairs becomes significantly smaller as the integration proceeds. To avoid the aforementioned issue, the common-mode output voltages are designed such that a 150-mV margin is left before the input pairs leave the saturation region. In addition, the tail bias current can be varied by adjusting its gate voltage $V_{\rm BIAS}$. Finally, the effects of non-50% duty cycle clocks on the integration results are simulated, as shown in Fig.
9(c), suggesting that $\pm 10\%$ of duty cycle distortion does not have a significant impact on the calibration accuracy, because the polarity (sign) of the integration results is unchanged.

## C. Integrating Amplitude Comparator

An automatic gain control (AGC) loop needs the information of signal amplitude in order to adjust the gain along the signal path. This is conventionally obtained with RC LPF-based peak detectors, e.g., [8], which measure the value or the level-shifted value of the peak amplitude. As with the first-order RC LPF described previously, the inevitable tradeoff between tracking time and settling behavior limits the reconfiguration speed, as the next adjustment of the gain setting may not be correctly resolved if the peak detectors have not settled. In this paper, the integrating amplitude comparator is proposed to replace the conventional peak detectors in the AGC loop and to enable rapid signal amplitude control along with the SAR search algorithm.

The circuit schematic of the building block in the proposed integrating amplitude comparator is shown in Fig. 10(a), while its principle of operation is illustrated in Fig. 10(b). With the same RCK used in the integrating dc comparator, the outputs are precharged to the supply voltage when RCK is low, and the differential output voltage is thus reset to approximately zero prior to the rise of RCK. During the integration phase, i.e., when RCK is high, $V_{\rm OP}$ and $V_{\rm ON}$ are both discharged, by nearly equal or very different amounts depending on the differential input amplitude.

Fig. 10. (a) Building block of the proposed integrating amplitude comparator. (b) Simulation results showing the operation of the building block in the proposed integrating amplitude comparator.

In Fig.
10(b), provided that the mismatches introduced by process variations are negligible or calibrated, $I_1 = I_2 = I_3 = I_4$ when $V_{\rm IP}=V_{\rm IN}$ by symmetry, and consequently, as shown in blue, zero differential input amplitude leads to zero differential output voltage $(V_{\rm OP} - V_{\rm ON} \approx 0)$ at the end of integration. By contrast, when the differential input amplitude is large, as shown in red in Fig. 10(b), $I_1$ conducts most of the tail bias current during the half preamble period (one UI) when $V_{\rm IP} > V_{\rm IN}$, while $I_3$ conducts most of the tail bias current during the other half preamble period when $V_{\rm IP} < V_{\rm IN}$. Since both $I_1$ and $I_3$ discharge the same node $V_{\rm ON}$, a relatively large differential output voltage $(V_{\rm OP}-V_{\rm ON})$ is expected after the integration over one full preamble period (two UIs), due to the significantly larger voltage drop at $V_{\rm ON}$. The biasing and the sizes of the differential pairs are further optimized so that the value of $(V_{\rm OP} - V_{\rm ON})$ at the end of the integration phase increases with the differential amplitude of the input, regardless of the timing alignment between the input preamble waveform and RCK.

Fig. 11. (a) Circuit schematics of the proposed integrating amplitude comparator. (b) Integrating amplitude comparator differential output voltage versus different clock duty cycles with four distinct amplitude differences.

As shown in Fig. 11(a), a replica stage is connected to the outputs with opposite polarity, converting the differential amplitude of its input into the value of $(V_{\rm ON} - V_{\rm OP})$ instead at the end of the integration phase.
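The timing-alignment insensitivity of the two opposed stages can be illustrated with a purely behavioral sketch. It assumes an idealized stage whose output contribution is proportional to the integral of the rectified differential input over one full preamble period (two UIs), and it approximates the "1010..." preamble by a sine wave with a period of two UIs. Neither assumption is taken from the transistor-level circuit in Fig. 10(a) or Fig. 11(a), and the names `integrated_amplitude` and `larger_input` are hypothetical.

```python
import math

def integrated_amplitude(amplitude, phase, ui=40e-12, n_ui=2, steps=4000):
    """Midpoint-rule integral of |v(t)| over n_ui unit intervals.

    v(t) is a sine with a period of two UIs, used here as a stand-in for
    the "1010..." preamble; `phase` models the unknown timing alignment
    between the preamble and RCK. All values are illustrative.
    """
    period = 2 * ui
    dt = n_ui * ui / steps
    total = 0.0
    for k in range(steps):
        t = (k + 0.5) * dt
        total += abs(amplitude * math.sin(2 * math.pi * t / period + phase)) * dt
    return total

def larger_input(amp_in, amp_ref, phase_in=0.0, phase_ref=0.0):
    """Sign of (stage output - replica output) in the idealized model."""
    diff = (integrated_amplitude(amp_in, phase_in)
            - integrated_amplitude(amp_ref, phase_ref))
    return "input" if diff > 0 else "reference"

# The integral over a full preamble period does not depend on the phase,
# so the decision depends only on the two amplitudes.
for phase in (0.0, 0.3, 1.0, 2.0):
    print(f"phase={phase:.1f} rad -> {integrated_amplitude(0.1, phase):.4e} V*s")
print(larger_input(0.12, 0.10, phase_in=0.7, phase_ref=2.1))
```

In this model, only the sign of the difference is used, which mirrors the way the slicer resolves the polarity of the differential output rather than its magnitude.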
Therefore, one stage competes with the other during the integration phase in deciding the sign of $(V_{\rm OP} - V_{\rm ON})$, and the result directly indicates which stage sees the input signal with the larger differential amplitude, given that the common-mode voltages of the inputs are identical and the offsets are negligible. A reference preamble waveform with a "1010..." pattern is derived from the rail-to-rail clock signals, and its amplitude is programmable but fixed during the reconfiguration process. By comparing the preamble waveform from the AFE outputs with the reference preamble waveform, the proposed integrating amplitude comparator removes the need for peak detectors, offering much faster updates to the VGA gain setting. The rapid signal amplitude control is achieved by combining the proposed integrating amplitude comparator with the SAR search algorithm such that the amplitude of the AFE outputs converges toward the reference amplitude in a designed number of RCK cycles. In this paper, differential pair-based buffers are included at the inputs of the proposed integrating amplitude comparator to implement common-mode rejection, with the main benefit that the proposed gain reconfiguration loop is insensitive to the residual dc offset. Finally, similar to the case of the integrating dc comparator, the effects of non-50% duty cycle clocks on the integration results are also simulated, as shown in Fig. 11(b), again suggesting that $\pm 10\%$ of duty cycle distortion does not have a significant impact on the calibration accuracy, because the polarity (sign) of the integration results is unchanged.

## D. Analog Settling Time Reduction

Although the bottleneck formed by the RC settling time constraint in speeding up the reconfiguration loop is eliminated by the proposed integrating dc and amplitude comparators, the analog settling time still takes part in determining the maximum speed at which the SAR logic can operate.
The analog settling time is unavoidable, as the effects of updating the digital setting of VCS or VGA cannot settle immediately and be ready for the next point in the SAR search process. Even though the analog settling occurs concurrently, it is informative to divide the analog settling time into two parts. The first part resides in the AFE and strongly depends on the bandwidth of the AFE. One possible way to reduce the analog settling time of the AFE, which is not implemented in this paper, is to add switches that decrease the load resistance at each stage or at selected stages during the reconfiguration process. For instance, the resistance of the shunt-feedback resistor in the TIA or of the load resistors in the VGA can be effectively reduced by turning on parallel switches while the reconfiguration is in progress. The drawback of this method is that increasing the bandwidth by decreasing the load resistance generally implies a reduction in gain, and hence, the dc offset and signal amplitude are both expected to be smaller than in the case without the parallel switches.

The other part of the analog settling time is associated with the settling of the bias currents in VCS and in the current-steering VGA. Fig. 12(a) shows the schematic of a conventional current-mirror-based current digital-to-analog converter (DAC), where the digital inputs steer the currents into or out of the current mirror at the output. This topology is suitable for high-speed operation, i.e., with short settling time, only if the current mirror conducts a relatively high current such that the diode-connected transistor acts as a resistor with relatively low resistance. Accordingly, the DAC with a current mirror can be used in reconfiguring the current-steering VGA because the currents flowing through the current mirrors are expected to be within 1–3 mA.
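The dependence of the mirror settling time on its current can be made concrete with a first-order estimate. The sketch below assumes a fixed-size, square-law diode-connected transistor, so that $g_m = \sqrt{2\beta I}$ and the node resistance is roughly $1/g_m$, together with a single-pole node. The device parameter `beta`, the node capacitance `c_node`, and the 100-µA comparison point are hypothetical values chosen for illustration only, so the output indicates a trend rather than the behavior of the fabricated circuit.

```python
import math

def diode_mirror_settling(i_bias, beta=0.05, c_node=50e-15, fraction=0.95):
    """Time for a single-pole node to reach `fraction` of its final value.

    Assumes a square-law diode-connected transistor of fixed size:
    gm = sqrt(2 * beta * i_bias), node resistance ~ 1 / gm.
    beta (A/V^2) and c_node (F) are hypothetical, not from the paper.
    """
    gm = math.sqrt(2.0 * beta * i_bias)
    tau = c_node / gm                      # RC time constant of the node
    return -tau * math.log(1.0 - fraction)  # about 3 * tau for 95%

t_vga = diode_mirror_settling(2e-3)    # mirror current inside the 1-3 mA range
t_low = diode_mirror_settling(100e-6)  # a much smaller mirror current
print(f"slow-down at low current: {t_low / t_vga:.2f}x")  # prints 4.47x
```

Under these assumptions the settling time scales as $1/\sqrt{I}$, so a mirror carrying a small current settles several times more slowly than one carrying a few milliamperes.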
By contrast, the DAC using a current mirror should not be directly used in reconfiguring the VCS, because the target dc component of the photocurrent, which is to be cancelled by the current flowing through the VCS, is on the order of 100 $\mu$A. The schematic of the proposed solution is shown in Fig. 12(b), where the DAC is loaded with low-resistance resistors, and the differential output voltage is then taken as the differential input of a voltage-to-current (V2I) converter. The resistively loaded DAC has a lower and invariable RC time constant, in contrast to the DAC loaded with current mirrors, benefiting the settling time whenever a new digital setting is applied. A mirroring ratio of 7:2 is used at the output current mirror of the V2I converter to further avoid a relatively small bias current flowing into the node $V_{\rm OUT}$ labeled in Fig. 12(b). The 95% settling time at $V_{\rm OUT}$ is 36.64 ps in simulation. The V2I converter not only isolates the output node from the bank of switches used to steer the currents but also allows the VCS to operate in different dynamic ranges by simply varying its tail bias current. Although the V2I converter can introduce nonlinearity, this is not an issue in this design, where the resolution for the target dynamic range is sufficient and the convergence to the level closest to the ideal one is accomplished by the feedback loop of the SAR search algorithm.

Fig. 12. (a) Circuit schematic of the conventional DAC loaded with diode-connected transistors for current mirroring. (b) Circuit schematic of the DAC loaded with low-resistance resistors. The differential output voltage is taken as the differential input of a voltage-to-current (V2I) converter.

## E. Simulation Result

With the foregoing designs and optimizations, the RCK period for 25-Gb/s operation can be set to four UIs, in which two UIs are dedicated to the integration phase, i.e., the comparison time, while the other two UIs are devoted to resetting the integrating comparators and to the settling time after an update to the digital setting is applied. A typical simulation result is shown in Fig. 13 for illustration. The outputs of the AFE are initially far away from each other because of the large dc offset. As the dc offset is cancelled, they become closer to each other. Afterward, the amplitude grows and then remains at the desired level. The whole burst-mode reconfiguration process takes a fixed number of RCK cycles, 14 cycles in this design, and thus finishes in $14 \times 4$ UIs. Specifically, with the pulse-triggered state machine described in Section V-A, the digital settings can only be sequentially overwritten or reconfigured within 14 clock cycles. After the 14 clock cycles, no setting can be changed further, since the digital values are held and stored by latches until the next reconfiguration process. When a quarter-rate (6.25 GHz) clock is used for 25-Gb/s operation, the reconfiguration takes place within the time span of $14 \times 160$ ps = 2.24 ns. Similarly, if a 3.125-GHz (1/8 of the data rate) clock is used for 25-Gb/s operation, the reconfiguration is completed in $14 \times 320$ ps = 4.48 ns.

Fig. 13. Simulated AFE outputs in burst-mode reconfiguration.

Fig. 14. Block diagram of the experiment setup.

# VI. EXPERIMENTAL RESULTS

The chip is fabricated in 28-nm CMOS technology. Fig. 18(a) shows the die micrograph of the core circuitry. The experiment setup is shown in Fig. 14. The receiver chip is wire-bonded to an APD die whose gain (*M*) at 1310 nm is adjustable via the reverse-bias voltage.
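For reference when reading the measurements that follow, the reconfiguration times quoted in Section V-E for the two RCK rates can be reproduced with a few lines of arithmetic. The function name `reconfiguration_time` and its parameters are illustrative; only the 14-cycle count, the 25-Gb/s data rate, and the four- and eight-UI RCK periods come from the text.

```python
def reconfiguration_time(data_rate_bps, ui_per_rck, n_cycles=14):
    """Reconfiguration time in seconds for a fixed number of RCK cycles."""
    ui = 1.0 / data_rate_bps          # one unit interval: 40 ps at 25 Gb/s
    return n_cycles * ui_per_rck * ui

for ui_per_rck in (4, 8):             # quarter-rate and one-eighth-rate RCK
    rck_ghz = 25e9 / ui_per_rck / 1e9
    t_ns = reconfiguration_time(25e9, ui_per_rck) * 1e9
    print(f"RCK = {rck_ghz:.3f} GHz -> {t_ns:.2f} ns")
# RCK = 6.250 GHz -> 2.24 ns
# RCK = 3.125 GHz -> 4.48 ns
```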
A continuous-wave laser is modulated by a high-speed Mach–Zehnder modulator with a PRBS-7 data pattern and coupled to the APD through a single-mode fiber. An oscilloscope is set up for monitoring the input optical data signal and the output electrical data signal, and an external BER tester is used to measure the BER. The best sensitivity, -16-dBm (OMA), is achieved at 25 Gb/s with the PRBS-7 input pattern when the reverse-bias voltage of the APD is set to 16 V, under which the overall responsivity of the APD, including the multiplication factor, is 4 A/W, while the -3-dB bandwidth of the APD optical response, excluding the input resistance and capacitance of the electronic chip, is $\sim$20 GHz. Off-chip decoupling capacitors are included on the printed circuit board (PCB) to minimize the variation of the APD bias voltage. The measured bathtub curve with -16-dBm (OMA) at 25 Gb/s with the PRBS-7 input pattern is shown in Fig. 15, showing a 0.2-UI horizontal opening for BER less than $10^{-12}$.

Fig. 15. Bathtub curve measured with -16-dBm OMA at 25 Gb/s.

Fig. 16. Waterfall plot with fixed EQ setting at 25 Gb/s, with PRBS-7.

Fig. 17. Waterfall plot with fixed EQ setting at 25 Gb/s, with PRBS-31.

Fig. 18. (a) Micrograph of the core circuitry, including the pad wire-bonded to the APD (APDIN), AFE, quarter-rate EQ (EQ), integrating dc comparator (Int. dc Comp.), integrating amplitude comparator (Int. Amp. Comp.). (b) Power consumption breakdown of the receiver data path at 25 Gb/s.

TABLE I
PERFORMANCE SUMMARY AND COMPARISON

| | This work | JSSC'2015 [4] | RFIC'2014 [18] | ISSCC'2017 [19] | VLSI'2017 [20] (25 Gb/s) | VLSI'2017 [20] (32 Gb/s) |
|---|---|---|---|---|---|---|
| Technology | 28 nm | 32 nm SOI | 28 nm | 14 nm FinFET | 14 nm FinFET | 14 nm FinFET |
| Data Rate (Gb/s) | 25 | 25 | 25 | 32–64 | 25 | 32 |
| Efficiency (pJ/bit) | 1.37 | 4\* | 0.17 | 1.4 @ 64 Gb/s | 1.59 | 1.41 |
| PD Capacitance (fF) | 55 | 100 | 8 | 69 | 69 | 69 |
| PD Responsivity (A/W) | 4 | 0.5 | 0.8 | 0.52 | 0.52 | 0.52 |
| Reconfiguration Time (ns) | 2.24 | 12.5\*\* | N/A | N/A | N/A | N/A |
| Sensitivity (dBm) | -16 | -10.9 | -12.8\*\*\* | -13 @ 32 Gb/s | -13.8 | -12.4 |

\*Including clock and data recovery circuitry.
\*\*The 12.5 ns is fully dedicated to cancelling the dc component.
\*\*\*Calculated with 6-dB optical coupling loss.

In order to verify the function of the proposed integrating dc and amplitude comparators together with the reconfiguration loops, the waterfall plot for PRBS-7 input with the fixed EQ setting found with -16-dBm input is shown in Fig. 16, and a dynamic range of 5 dB is achieved. Outside the dynamic range, the BER improves when the RCK frequency is reduced from quarter rate to one-eighth of the data rate. As in SAR ADC designs, a single decision error during the SAR search process can lead to deviation from the optimum convergence point. The extra time granted by reducing the RCK frequency primarily helps the AFE settle more completely, which reduces the chance of a decision error caused by unsettled inputs. The limiting factor of the dynamic range in this paper lies in the current-steering VGA, since each stage of the VGA is designed to have only 2.45-dB dynamic range of gain with its -3-dB bandwidth kept approximately constant.
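The effect of a single decision error on a SAR search, noted above, can be illustrated with a minimal behavioral sketch. It models a generic MSB-first binary search with an ideal 1-bit comparator and an optional inverted decision at one step. The 6-bit code width, the target value, and the function name `sar_search` are hypothetical and do not describe the actual bit allocation between the VCS and VGA loops.

```python
def sar_search(target, n_bits, flip_at=None):
    """MSB-first binary search for the largest code not exceeding `target`.

    Each step stands in for one RCK cycle: a trial bit is set and a 1-bit
    comparator decides whether to keep it. If `flip_at` is given, the
    decision at that step (0 = MSB) is inverted to emulate one error,
    for example from comparing inputs that have not settled.
    """
    code = 0
    for step in range(n_bits):
        trial = code | (1 << (n_bits - 1 - step))
        keep = trial <= target          # ideal comparator decision
        if step == flip_at:
            keep = not keep             # injected decision error
        if keep:
            code = trial
    return code

target = 37
print("no error      :", sar_search(target, 6))             # 37
print("error at MSB  :", sar_search(target, 6, flip_at=0))  # 31
print("error at bit 1:", sar_search(target, 6, flip_at=1))  # 48
print("error at LSB  :", sar_search(target, 6, flip_at=5))  # 36
```

Because every bit is decided only once, the remaining steps cannot fully undo an early wrong decision, and the residual error is larger the earlier the error occurs in this model.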
When tested with the PRBS-31 input pattern, the best sensitivity measured at 25 Gb/s is degraded to -15.3-dBm (OMA), and the waterfall plot for PRBS-31 with the fixed EQ setting found with -15.3-dBm input is shown in Fig. 17. Finally, the power consumption and its breakdown at 25 Gb/s are shown in Fig. 18(b). The AFE consumes 12.2 mW, including 1.2 mW by the APD; the EQ consumes 4.3 mW, and the clock and data buffers consume 17.7 mW. In total, 34.2 mW is consumed by the receiver data path, and 1.37-pJ/b energy efficiency is achieved.

# VII. CONCLUSION

The APD-based burst-mode optical receiver applies current-integrating equalization and achieves -16-dBm (OMA) sensitivity at 25 Gb/s with 1.37-pJ/b energy efficiency. The proposed integrating dc comparator and integrating amplitude comparator significantly relax the settling time constraints, enabling 2.24-ns reconfiguration time at 25 Gb/s. The performance and comparisons with the state of the art are summarized in Table I.

# ACKNOWLEDGMENT

The authors would like to thank D. A. Nelson, K. Muth, and A. Zilkie of Rockley Photonics for their help and support; Caltech MICS Lab members and alumni A. Agarwal, M. Monge, M. Raj, and S. Saeedi for technical discussions; and Caltech CHIC Lab for sharing testing resources.

# REFERENCES

- [1] C. DeCusatis, "Optical interconnect networks for data communications," *J. Lightw. Technol.*, vol. 32, no. 4, pp. 544–552, Feb. 15, 2014.
- [2] J. C. Campbell, "Recent advances in avalanche photodiodes," *J. Lightw. Technol.*, vol. 34, no. 2, pp. 278–285, Jan. 15, 2016.
- [3] X. Chen et al., "The emergence of silicon photonics as a flexible technology platform," *Proc. IEEE*, vol. 106, no. 12, pp. 2101–2116, Dec. 2018.
- [4] A. Rylyakov et al., "A 25 Gb/s burst-mode receiver for low latency photonic switch networks," *IEEE J. Solid-State Circuits*, vol. 50, no. 12, pp. 3120–3132, Dec. 2015.
- [5] M. G. Ahmed et al., "A 12-Gb/s -16.8-dBm OMA sensitivity 23-mW optical receiver in 65-nm CMOS," *IEEE J. Solid-State Circuits*, vol. 53, no. 2, pp. 445–457, Feb. 2018.
- [6] M. Raj, M. Monge, and A. Emami, "A modelling and nonlinear equalization technique for a 20 Gb/s 0.77 pJ/b VCSEL transmitter in 32 nm SOI CMOS," *IEEE J. Solid-State Circuits*, vol. 51, no. 8, pp. 1734–1743, Aug. 2016.
- [7] A. Tyagi et al., "A 50 Gb/s PAM-4 VCSEL transmitter with 2.5-tap nonlinear equalization in 65-nm CMOS," *IEEE Photon. Technol. Lett.*, vol. 30, no. 13, pp. 1246–1249, Jul. 1, 2018.
- [8] C.-F. Liao and S.-I. Liu, "40 Gb/s transimpedance-AGC amplifier and CDR circuit for broadband data receivers in 90 nm CMOS," *IEEE J. Solid-State Circuits*, vol. 43, no. 3, pp. 642–655, Mar. 2008.
- [9] T. H. Lee, *The Design of CMOS Radio-Frequency Integrated Circuits*. Cambridge, U.K.: Cambridge Univ. Press, 1998.
- [10] Y. Kang et al., "Monolithic germanium/silicon avalanche photodiodes with 340 GHz gain-bandwidth product," *Nature Photon.*, vol. 3, pp. 59–63, Jan. 2009.
- [11] M. Huang et al., "25Gb/s normal incident Ge/Si avalanche photodiode," in *Proc. Eur. Conf. Opt. Commun. (ECOC)*, Cannes, France, Sep. 2014, pp. 1–3.
- [12] M. Nada, Y. Yamada, and H. Matsuzaki, "Responsivity-bandwidth limit of avalanche photodiodes: Toward future ethernet systems," *IEEE J. Sel. Topics Quantum Electron.*, vol. 24, no. 2, Mar./Apr. 2018, Art. no. 3800811.
- [13] G. P. Agrawal, *Lightwave Technology: Telecommunication Systems*. New York, NY, USA: Wiley, 2005.
- [14] R. J. McIntyre, "Multiplication noise in uniform avalanche diodes," *IEEE Trans. Electron Devices*, vol. ED-13, no. 1, pp. 164–168, Jan. 1966.
- [15] M. H. Nazari and A. Emami-Neyestanak, "An 18.6 Gb/s double-sampling receiver in 65 nm CMOS for ultra-low-power optical communication," in *Proc. IEEE Int. Solid-State Circuits Conf.*, San Francisco, CA, USA, Feb. 2012, pp. 130–131.
- [16] M. H. Nazari and A. Emami-Neyestanak, "A 24-Gb/s double-sampling receiver for ultra-low-power optical communication," *IEEE J. Solid-State Circuits*, vol. 48, no. 2, pp. 344–357, Feb. 2013.
- [17] S. Son, H. Kim, M.-J. Park, K. H. Kim, and J. Kim, "A 2.3-mW, 5-Gb/s decision-feedback equalizing receiver front-end with static-power-free signal summation and CDR-based precursor ISI reduction," in *Proc. IEEE Asian Solid State Circuits Conf. (A-SSCC)*, Kobe, Japan, Nov. 2012, pp. 133–136.
- [18] S. Saeedi and A. Emami, "A 25 Gb/s 170 μW/Gb/s optical receiver in 28 nm CMOS for chip-to-chip optical communication," in *Proc. IEEE Radio Freq. Integr. Circuits Symp.*, Tampa, FL, USA, Jun. 2014, pp. 283–286.
- [19] A. Cevrero et al., "29.1 A 64 Gb/s 1.4 pJ/b NRZ optical-receiver data-path in 14 nm CMOS FinFET," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, San Francisco, CA, USA, Feb. 2017, pp. 482–483.
- [20] J. Proesel et al., "A 32 Gb/s, 4.7 pJ/bit optical link with -11.7 dBm sensitivity in 14-nm FinFET CMOS," in *Proc. Symp. VLSI Circuits*, Kyoto, Japan, 2017, pp. C318–C319.
- [21] I. Ozkaya et al., "A 64-Gb/s 1.4-pJ/b NRZ optical receiver data-path in 14-nm CMOS FinFET," *IEEE J. Solid-State Circuits*, vol. 52, no. 12, pp. 3458–3473, Dec. 2017.
- [22] D. Schinkel, E. Mensink, E. Klumperink, E. van Tuijl, and B. Nauta, "A double-tail latch-type voltage sense amplifier with 18ps setup+hold time," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, San Francisco, CA, USA, Feb. 2007, pp. 314–605.
- [23] A. Roshan-Zamir, O. Elhadidy, H.-W. Yang, and S. Palermo, "A reconfigurable 16/32 Gb/s dual-mode NRZ/PAM4 SerDes in 65-nm CMOS," *IEEE J. Solid-State Circuits*, vol. 52, no. 9, pp. 2430–2447, Sep. 2017.
- [24] J. Han, Y. Lu, N. Sutardja, K. Jung, and E. Alon, "Design techniques for a 60 Gb/s 173 mW wireline receiver frontend in 65 nm CMOS technology," *IEEE J. Solid-State Circuits*, vol. 51, no. 4, pp. 871–880, Apr. 2016.
- [25] X. Yin et al., "A 10 Gb/s burst-mode TIA with on-chip reset/lock CM signaling detection and limiting amplifier with a 75ns settling time," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, San Francisco, CA, USA, Feb. 2012, pp. 416–418.
- [26] T. D. Ridder et al., "10 Gbit/s burst-mode post-amplifier with automatic reset," *Electron. Lett.*, vol. 44, no. 23, pp. 1371–1373, Nov. 2008.

**Kuan-Chang Chen** (S'15) received the B.S. degree in electrical engineering from National Taiwan University (NTU), Taipei, Taiwan, in 2011, and the M.S. degree in electrical engineering from Stanford University, Stanford, CA, USA, in 2014. He is currently pursuing the Ph.D. degree in electrical engineering with the California Institute of Technology (Caltech), Pasadena, CA, USA, with special emphasis on analog and mixed-signal circuits and systems. Mr. Chen was a recipient of the 2015 Henry Ford II Scholar Award at Caltech.

**Azita Emami** (M'05–SM'17) received the B.S. degree from the Sharif University of Technology, Tehran, Iran, in 1996, and the M.S. and Ph.D. degrees in electrical engineering from Stanford University, Stanford, CA, USA, in 1999 and 2004, respectively. From 2004 to 2006, she was with the IBM Thomas J. Watson Research Center, Yorktown Heights, NY, USA. In 2007, she joined the California Institute of Technology (Caltech), Pasadena, CA, USA, where she is currently the Andrew and Peggy Cherng Professor of electrical engineering and medical engineering, and an Investigator with the Heritage Medical Research Institute, Bakersfield, CA, USA. She also serves as the Executive Officer (Department Head) for electrical engineering with Caltech. Her current research interests include integrated circuits and systems, integrated photonics, wearable and implantable devices for neural recording, neural stimulation, sensing, and drug delivery. She was an IEEE SSCS Distinguished Lecturer from 2017 to 2018. She serves as an Associate Editor for the IEEE JOURNAL OF SOLID STATE CIRCUITS (JSSC).