Azita Emami-Neyestanak, Dean Liu, Gordon Keeler, Noah Helman and Mark Horowitz

Computer Systems Laboratory, Stanford University, Stanford, CA 94305

### Abstract

A 1.6Gb/s receiver for optical communication has been designed and fabricated in a 0.25-μm CMOS process. The receiver has no transimpedance amplifier. Instead, it uses the parasitic capacitance of the flip-chip bonded photodetector as an integrating element and resolves the data with a double-sampling technique. A simple feedback loop adjusts a bias current to match the average optical signal, which essentially "AC couples" the input. The resulting receiver resolves an 11μA input, dissipates 3mW of power, occupies 80μm x 50μm of area and operates at over 1.6Gb/s.

### Introduction

Using optics to interconnect integrated circuits has recently attracted considerable interest [1]. A potential design platform uses hybrid integration of arrays of optical multiple quantum well (MQW) modulators and detectors with commercial electronic circuits [2]-[5]. However, a dense array of optical detectors requires very low-power, sensitive, and compact optical receivers [6].

Various designs for the input receiver have been used in smart pixel test systems, including simple FET inputs [7], diode-clamped receivers [8] and transimpedance amplifiers [9]. These designs rely on an analog front-end amplifier that provides voltage gain, current gain or current-to-voltage conversion. However, these amplifiers often dissipate a large amount of quiescent power to achieve high bandwidth and low noise.

This paper describes the design and implementation of a novel CMOS receiver suitable for arrays of optoelectronic switching nodes comprised of flip-chip-bonded MQW modulators and detectors on silicon. The receiver eliminates the need for linear amplification. Instead, it integrates the input current on the parasitic capacitance of the detector and uses double-sampling to create the voltage difference for a clocked comparator to resolve.
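The integrate-and-compare idea described above can be illustrated with a short behavioral model. This is a minimal sketch and not the authors' circuit: it assumes ideal sampling, a constant photocurrent during each bit, and a bias current fixed at the midpoint of the "0" and "1" currents. The function name `double_sample_receive` and the individual current levels `I0` and `I1` are hypothetical choices for illustration. The 420 fF capacitance, the 1.6 Gb/s rate and the 11 μA current difference are the values reported in this paper.

```python
def double_sample_receive(bits, i0, i1, c_total, bit_time, v_start=0.0):
    """Behavioral model: integrate (I_bit - I_dc) on c_total for one bit
    time, then decide each bit by comparing the new sample with the old one."""
    i_dc = (i0 + i1) / 2.0  # assumption: bias current sits at the midpoint
    v_prev = v_start
    decoded, voltages = [], []
    for b in bits:
        i_bit = i1 if b else i0
        # Charge integrated during one bit changes the input-node voltage.
        v_new = v_prev + (i_bit - i_dc) * bit_time / c_total
        # Double sampling: only the sign of the change is used.
        decoded.append(1 if v_new > v_prev else 0)
        voltages.append(v_new)
        v_prev = v_new
    return decoded, voltages


BIT_TIME = 1.0 / 1.6e9  # 625 ps bit period at 1.6 Gb/s
C_TOTAL = 420e-15       # total input capacitance reported in the paper
I0 = 5e-6               # hypothetical "0" photocurrent
I1 = 16e-6              # hypothetical "1" photocurrent, so I1 - I0 = 11 uA

bits = [1, 0, 0, 1, 1, 1, 0, 1, 0, 0]
decoded, voltages = double_sample_receive(bits, I0, I1, C_TOTAL, BIT_TIME)

# Voltage step per bit for an average current of (I1 - I0) / 2.
step_mv = (I1 - I0) / 2.0 * BIT_TIME / C_TOTAL * 1e3

print(decoded == bits)
print(f"per-bit step: {step_mv:.1f} mV")
```

Because the example pattern contains equal numbers of ones and zeros, the modeled input voltage returns to its starting value, which is the behavior the bias-current feedback is meant to maintain for DC-balanced data.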
# Receiver Design

One significant parameter of a hybrid flip-chip bonded MQW detector is its capacitance. The diode capacitance and the flip-chip bump-plus-pad capacitance are the two primary components of the detector capacitor, $C_p$. This capacitor can integrate the optically generated current of the detector over time. If the input of the front-end receiver is also capacitive, with a capacitance $C_{in}$, the voltage of the input node at each signal time, $V_n$, is always the sum of the voltage change produced by the incoming signal and the voltage of the input node just before that signal, $V_{n-1}$:

$$V_n = V_{n-1} + \frac{I_{op} \cdot T}{C_p + C_{in}}$$

where $I_{op}$ is the optically generated current and $T$ is the bit period. Therefore, if we compare $V_n$ and $V_{n-1}$, we have enough information about the input signal at time $t_n$ to determine whether it was a one or a zero.

Implementing a receiver based on this idea requires solving four main issues: sampling and storing an analog voltage, fast comparison, subtracting the average input current, and generating the clock signals for this system. Each of these issues is described in more detail in the following sections of the paper.

Fig. 1 Receiver block diagram

Fig. 1 illustrates the block diagram of the designed receiver. The input signal from the photodetector is a single-ended, positive current. The injected charge is higher if the bit value is "1", but it is not necessarily zero when the bit value is "0". Therefore, to obtain a bipolar voltage change at the input of the receiver, we need to subtract a constant charge from the input capacitor for every bit. This is done by subtracting an adjustable current from the input. The DC current is adjusted by a feedback loop that monitors the DC value of the input-node voltage. The feedback loop not only adjusts the DC current but also sets the average voltage of the input node. The bipolar voltage change at the input allows us to decide the input value by comparing two adjacent samples of the input voltage.
If the new sample is higher, the input signal is "1"; otherwise, it is "0". Fig. 2 illustrates how $V_{in}$ varies with time when $I_{DC}$ is set to the correct value, assuming constant currents during each bit period.

Fig. 2 Voltage of input node when $I_{DC}$ is correct

## A. Sampler:

The analog sampler is illustrated in Fig. 3. It uses two non-overlapping phases, $\phi_1$ and $\phi_2$, derived from a single 50% duty cycle clock. Sampling is therefore done at both the rising and falling edges of the reference clock. Generation and shaping of these phases will be discussed later.

When $\phi_1$ is high, $V_1$ is the new sample and $V_2$ is the old sample of $V_{in}$. After $\phi_1$ falls and before $\phi_2$ rises, $V_1$ and $V_2$ are compared. Comparing the samples only when both clocks are low balances the clock feedthrough noise, so any charge injection through the switching transistors is common mode. During the next phase, when $\phi_2$ is high, $V_1$ is held unchanged and $V_2$ is updated. Now $V_2$ is the new sample and $V_1$ is the old one, and they are compared as soon as $\phi_2$ goes low.

The RC delay of the samplers puts a lower limit on the duty cycle of $\phi_1$ and $\phi_2$. For $1\mu m$ wide sample devices driving $10 \, \mathrm{fF}$ loads, we need about $200 \, \mathrm{ps}$, or 35% of a bit time at $1.6 \, \mathrm{Gb/s}$. The two hold capacitors are much smaller than the parasitic capacitor of the detector. Therefore, charge sharing does not attenuate the signal significantly.

## B. Comparator:

Comparison is done by two StrongArm [10] regenerative sense amplifiers. Each of them is triggered immediately after one of the two phases, $\phi_1$ or $\phi_2$, falls and before the other one rises. The sizing of the transistors in the sense amp is critical, since these structures dissipate most of the power in the interface. These circuits use offset compensation to break the dependence of offset voltage on transistor size, allowing 5µm wide input devices.
Offset compensation is done by digitally adjusting the number of small capacitors added to the internal nodes A and B [11] in Fig. 4. Process mismatches between the two branches of the sense amp, mismatches between the two branches of the sampler, and mismatches between $\phi_1$ and $\phi_2$ can all be compensated at this stage. Simulation results show that the offset can be corrected in steps of about 6mV.

Fig. 4 Offset compensated Sense Amp.

This design is inherently robust against kick-back and charge injection from the sense amps to the high-impedance input nodes. The reason is that there are two similar sense amps whose inputs are connected to the same nodes of one sampler unit. Soon after one sense amp injects some charge into these nodes during its evaluation phase, the other sense amp is reset and injects the opposite charge into the same nodes. The total injected charge is zero after one bit period, and the sample is valid for the next comparison. The key point is that the voltage waveforms of the precharged nodes A and B are always very similar, and therefore kick-back is not significantly data dependent.

## C. Filter and Current Feedback:

Assuming that the stream of incoming data is DC balanced, the DC voltage of the input node remains constant if $I_{DC}$ in Fig. 1 is equal to $(I_0 + I_1)/2$, where $I_0$ is the average optically generated current during a "0" bit and $I_1$ is the average optically generated current during a "1" bit. ($I_0$ and $I_1$ can vary with the optical input power and the characteristics of the photodetector.) If $I_{DC}$ has any other value, the DC value of $V_{in}$ will increase or decrease even after equal numbers of "0"s and "1"s. Therefore, a feedback loop can be used to adjust $I_{DC}$ by monitoring $V_{in}$. The key is to use a low-pass filtered version of $V_{in}$ to ensure that the current does not fluctuate in response to the high-frequency changes of $V_{in}$ caused by the incoming data.
For instance, if we assume that the data is DC balanced within 20 bits, $I_{DC}$ should stay fairly constant even if we receive a run of 10 consecutive ones or 10 consecutive zeros. The filter should also have a relatively high DC gain so that it can handle a wide range of $I_0$ and $I_1$ values while keeping $V_{in}$ relatively constant, at the best operating point for the sampler and the comparators.

The simplest way to build the needed low-pass filter is a single-pole RC circuit. However, because of the parasitic capacitor of the detector, the open-loop transfer function of this simple system has two poles, which causes a stability problem in the feedback loop (Fig. 5).

Fig. 5 Feedback loop for current adjustment

One way to make the loop stable and increase the phase margin is to add a zero to the loop transfer function. Capacitor $C_z$ in Fig. 5 is added to the circuit for this reason. The open-loop transfer function is:

$$G(s) = K \cdot \frac{R \cdot C_z \cdot s + 1}{C_p \cdot s \cdot (R \cdot (C + C_z) \cdot s + 1)}$$

Fig. 6 illustrates the transistor-level schematic of the buffer, filter and current source. Resistor $R$ is implemented by a switched capacitor, $R = 1/(f \cdot C_r)$, where $f$ is the frequency of the non-overlapping clocks clk and clk\_b.

Fig. 6 Loop filter schematic

The DC value of $V_{in}$ can be set externally by $V_{set}$: $V_{in} \cong V_{set} + V_{GS(nmos)}$. The quiescent current of the differential pair should be large enough to cover a wide range of $I_{DC}$, and it can be chosen by Bias1. Finally, the input signal is buffered by a source follower.

## D. Phase Generator:

$\phi_1$ and $\phi_2$ are two non-overlapping phases with the same frequency as the reference clock and are used for sampling. $L_1$ and $L_2$ are the control phases of Sense Amp1 and Sense Amp2. Fig. 7 illustrates how $\phi_1$, $\phi_2$, $L_1$ and $L_2$ are generated from the reference clock and the inverted reference clock.
Clk\_b is generated carefully, with the same rising and falling rates as Clk and with low skew.

Fig. 7 Phase generator

$\phi_1$ and $\phi_2$ are in fact chopped versions of clk and clk\_b, and their duty cycle can be adjusted with digitally controlled capacitors, $C_{adj}$. The rising edge of $L_1$ is delayed by one inverter; therefore, right after sampling is done by $\phi_1$, Sense Amp1 starts to evaluate its inputs. The evaluation should be finished before the rising edge of $\phi_2$. This condition is met because the duty cycle of $\phi_2$ is less than 50%. As mentioned before, the minimum width of $\phi_1$ or $\phi_2$ is set by the acquisition time of the sample switch. The non-overlapping region in this design is about 100 ps for a 1.6 Gb/s data rate. With $L_1$ and $L_2$ placed near the middle of this region, the receiver can tolerate a skew of about 50 ps between Clk and Clk\_b.

### **Support and Test Circuits**

To avoid hysteresis and to increase the sensitivity and speed of comparison, a small latch follows each of the first-stage sense amps. The outputs of the latches are negative-true pulses, which are converted into levels using a dynamic SR latch. For this test chip, the reference clock is generated by an integrated dual-loop Delay Locked Loop [12]. The test chip does not contain clock recovery circuitry, so the multiplexer and interpolator are digitally controlled by programming the chip externally to correct the phase.

## **Experiment Results**

This design was fabricated in a 0.25 µm CMOS process and tested with a 2.5V power supply. The arrays of GaAlAs p-i-n diodes were connected on top of the silicon chip with the flip-chip bonding technique (Fig. 8). The total input capacitance after adding the photo devices was measured by sending a periodic pattern of long sequences of ones and zeros. A small sampler at the input node gives the voltage values at the beginning and end of each sequence.
If the laser's driving current is adjusted to give zero optical output for a zero bit, then by reading $I_{DC}$ we can calculate $C_{in}$. Our measurement gives a total capacitance of 420fF.

In the next step we measured the sensitivity of the receiver at the maximum possible bit rate. The bit rate is limited by the non-overlapping margin needed between $\phi_1$ and $\phi_2$. More relaxed timing and/or higher performance can be achieved by using four samplers and a 1-to-4 input demultiplexing scheme.

Fig. 8 Receiver micrograph after bonding arrays of photo diodes

If the data is DC balanced over every N bits, increasing N causes a small reduction in the sensitivity of the receiver. This is because of the changes in $I_{DC}$ due to the limited cut-off frequency of the low-pass filter and changes in the common-mode range of the input. The worst-case pattern for measuring the sensitivity is N/2 zeros followed by N/2 ones. For N=16 the receiver required an average current of $\Delta I_{ave} = (I_1 - I_0)/2 = 5.5 \mu A$ at a 1.6 Gb/s data rate, which corresponds to a voltage swing of about 8mV per bit. This current increases to 9 $\mu$A when the sequence is extended to N=32. For N=16 no errors were found at the minimum power level for more than $10^8$ bits. Our optical test setup did not allow us to measure the BER for pseudo-random data.

Fig. 9 Input Voltage and Recovered Data

The responsivity of the MQW detector is about 0.5 A/W; therefore, the system can detect an optical switching energy as low as 14fJ. The total power dissipation of the whole receiver circuitry is less than 3mW at 1.6Gb/s and is mostly due to the clocking and dynamic dissipation of the sense amps. The area of the receiver is $80\mu m \times 50\mu m$ in our $0.25\mu m$ CMOS process. The performance of this receiver is summarized in Table I.

## Conclusion

We demonstrate that one does not need a transimpedance amplifier in a high-speed optical link. One can get good sensitivity using a double-sampled approach.
The receiver, designed for a 0.25-µm CMOS process and arrays of hybrid flip-chip bonded MQW detectors, provides high sensitivity and bandwidth while requiring small amounts of power and area. The sensitivity of this receiver is more than adequate for short-haul optical communications (and should improve once the input capacitance is reduced), and the required area and power will allow 1000 receivers to consume only 3W and occupy only 4mm<sup>2</sup>.

Table I: Receiver Summary

| Parameter              | Value                  |
|------------------------|------------------------|
| Supply Voltage         | 2.5 V                  |
| Technology             | National 0.25µm CMOS   |
| Capacitance            | 420 fF                 |
| Sensitivity @ 1.6 Gb/s | 11 μA (switch current) |
| Input Data Rate        | 1.6 Gb/s               |
| Power Dissipation      | 3 mW                   |
| Area                   | 80μm x 50μm            |

### Acknowledgments

The authors would like to thank David A. B. Miller, Diwakar Agarwal, Samuel Palermo, Timothy J. Drabik, Jaeseo Lee, Vladimir Stojanovic and Henrik Johansson for technical discussions, and National Semiconductor for fabricating the test chip.

#### References

- [1] David A. B. Miller, "Physical reasons for optical interconnection", *International Journal of Optoelectronics*, vol. 11, no. 3, pp. 155-168, 1997
- [2] K. W. Goossen, J. A. Walker, L. A. D'Asaro, S. P. Hui, B. Tseng, R. Leibenguth, D. Kossives, D. D. Bacon, D. Dahringer, L. M. F. Chirovsky, A. L. Lentine, and D. A. B. Miller, "GaAs MQW modulators integrated with silicon CMOS", *IEEE Photon. Technol. Lett.*, vol. 7, no. 4, pp. 360-362, Apr. 1995
- [3] A. L. Lentine and D. A. B. Miller, "Evolution of the SEED technology: Bistable logic gates to optoelectronic smart pixels", *IEEE J. Quantum Electronics*, vol. 29, pp. 655-669, Feb. 1993
- [4] Ashok V. Krishnamoorthy, David A. B. Miller, "Scaling Optoelectronic-VLSI Circuits into 21st Century: A Technology Roadmap", *IEEE Journal of Selected Topics in Quantum Electronics*, vol. 2, no. 1, pp. 55-76, Apr. 1996
- [5] A. L. Lentine, et al., "Arrays of Optoelectronic Switching Nodes Comprised of Flip-Chip-Bonded MQW Modulators and Detectors on Silicon CMOS Circuitry", *IEEE Photonics Technology Letters*, vol. 8, no. 2, pp. 221-223, Feb. 1996
- [6] Ted K. Woodward, Ashok V. Krishnamoorthy, A. L. Lentine, L. M. F. Chirovsky, "Optical Receivers for Optoelectronic VLSI", *IEEE Journal of Selected Topics in Quantum Electronics*, vol. 2, no. 1, pp. 106-115, Apr. 1996
- [7] D. A. B. Miller, M. D. Feuer, T. Y. Chang, S. C. Shunk, J. E. Henry, D. J. Burrows, and D. S. Chemla, "Field-effect transistor self-electrooptic effect device: Integrated photodiode, quantum well modulator and transistor", *IEEE Photonics Technology Letters*, vol. 1, no. 3, pp. 62-64, 1989
- [8] A. L. Lentine, L. M. F. Chirovsky, M. W. Focht, M. D. Feuer, G. D. Guth, R. Leibenguth, G. J. Przybylek, and L. E. Smith, "Diode-clamped symmetric self-electro-optic effect devices with subpicojoule switching energies", *Appl. Phys. Lett.*, vol. 60, pp. 1809-1811, 1992
- [9] A. V. Krishnamoorthy, A. L. Lentine, K. W. Goossen, J. A. Walker, T. K. Woodward, J. E. Ford, G. F. Aplin, L. A. D'Asaro, S. P. Hui, B. Tseng, R. Leibenguth, D. Kossives, D. Dahringer, L. M. F. Chirovsky, D. A. B. Miller, "3-D integration of MQW modulators over active submicron CMOS circuits: 375 Mb/s transimpedance receiver-transmitter circuit", *IEEE Photonics Technology Letters*, vol. 7, no. 11, pp. 1288-1290, 1995
- [10] J. Montanaro et al., "A 160-MHz, 32-b, 0.5-W CMOS RISC microprocessor", *IEEE Journal of Solid State Circuits*, vol. 31, no. 11, pp. 1703-1714, Nov. 1996
- [11] M. E. Lee, W. J. Dally, P. Chiang, "Low-Power Area-Efficient High-Speed I/O Circuit Techniques", *IEEE Journal of Solid State Circuits*, vol. 35, no. 11, pp. 1591-1599, Nov. 2000
- [12] S. Sidiropoulos, D. Liu, J. Kim, G. Wei, and M. Horowitz, "Adaptive Bandwidth DLLs and PLLs using Regulated Supply CMOS Buffers", *2000 Symposium on VLSI Circuits Digest of Technical Papers*, pp. 124-127, Jun. 2000