# Capacitive Proximity Communication With Distributed Alignment Sensing for Origami Biomedical Implants

Matthew Loh and Azita Emami-Neyestanak, Member, IEEE

Abstract: Origami implant design is a 3D integration technique that addresses size and cost constraints in biomedical implants. This paper presents a capacitive proximity interconnect scheme that enables chip-to-chip communication across folds in the Origami implant, allowing increased flexibility of chip placement and orientation. The capacitive plate array senses link quality and chip-to-chip alignment, and adapts the data rate at each plate accordingly, shutting down poorly-coupled links to save power. Instead of using separate plate arrays for alignment sensing and communication, this interconnect embeds the alignment sensor and transceiver arrays within the same set of plates, so that link quality can be measured at the communications plates directly, thus simplifying their adaptation to alignment. In order to save power and area, the sensor circuitry is distributed across the array and shares functional blocks with the transceiver. Data rates of 10–60 Mbps are achieved over 4–12 μm of parylene-C, with efficiencies up to 0.180 pJ/bit.

Index Terms: Adaptive, alignment sensor, biomedical implants, Origami, parylene, proximity communication, 3D integration.

# I. INTRODUCTION

DESIGNERS of medical implants face three primary challenges: size, cost and power consumption. At the same time, there is a desire to increase the capability of these implants, both by expanding the scale of current functionality (such as increasing the number of electrodes in neural recording or retinal prosthesis implants) and by adding new functionality. Size and power considerations have driven the use of specialized, highly-integrated system-on-chip designs (e.g., [1]–[4]), the development of which can be cost-prohibitive for the low-volume applications typical in the biomedical market.
Additionally, increasing the scale of existing functionality can pose a challenge even for highly-integrated designs; for example, if the design in [5] is scaled up to 1024 electrodes, approximately $8 \times 8 \text{ mm}^2$ is required for the stimulator array alone, excluding power delivery, data telemetry and digital control. This approaches (or reaches) the limits of implant size in small and delicate organs such as the eye. Given the low leakage power and high voltage drive ($\sim$5–10 V is typically used to drive the desired currents in neuro-stimulator applications) generally required of biomedical implants, it seems unlikely that CMOS scaling alone can provide the size reduction required to meet future demands for increased capability; even if it can, rising development complexity and mask costs will certainly exacerbate already steep development costs and long times-to-market. In order to address concerns of size and cost, large systems can be split into multiple chips and connected using 3D integration techniques.

Manuscript received May 10, 2014; revised September 19, 2014, January 16, 2015, and January 24, 2015; accepted February 03, 2015. Date of publication March 12, 2015; date of current version April 30, 2015. This paper was approved by Associate Editor Michiel Pertijs. This work was supported by the National Science Foundation and STMicroelectronics. M. Loh was with the California Institute of Technology (Caltech), Pasadena, CA 91125 USA. He is now with Broadcom Corporation, Santa Clara, CA 95054 USA (e-mail: mloh@broadcom.com). A. Emami-Neyestanak is with the California Institute of Technology (Caltech), Pasadena, CA 91125 USA. Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/JSSC.2015.2404335
Previous research has demonstrated the viability of the polymer parylene-C as a biocompatible substrate to encapsulate ICs, break out connectivity and integrate discrete components such as capacitors [6], and its flexibility and robustness permit the construction of foldable, squashable structures such as an inductive coil [7]. It therefore provides an appealing foundation for the design of implants that can be folded compactly for implantation and then deployed into operating configuration once inside the body, minimizing the invasiveness of the necessary surgery. Additionally, this Origami folding technique can be used to realize mechanically useful shapes, such as conforming a retinal prosthesis to the back of the eye (Fig. 1) [8], which can improve electrode contact and make stimulation more effective, reducing the required drive voltage and allowing smaller and more efficient electronics. This concept can be extended to address the high cost of developing custom SoC designs for each new implant; the desired electronics can be partitioned into commonly-used functional blocks, mass-produced as ICs that are embedded into parylene library modules with standardized electrical and mechanical interfaces. Custom implants can be assembled from these modules, reducing development cost and time-to-market.

Fig. 1. Origami implant used as retinal prosthesis, showing (a) position in the eye and (b) a Parylene-C prototype before (bottom) and after (top) folding [8].

Proximity communication [9] provides a compelling way to achieve chip-to-chip communication in the context of Origami implants. Fig. 2 shows a conceptual example with external wireless and module-to-module power delivery to provide context; this paper, however, focuses on the communication aspects of the interface. Wires are of limited utility for communication in Origami implants, since they break easily under the stress of folding and are impossible to run across breaks in the parylene, such as between modules; any wires that are used (such as for power delivery coils) are confined to planar areas of the implant, such as in Fig. 2. Proximity communication takes advantage of the fact that the folds or module-to-module connections will place many of the ICs in such an implant face-to-face, and enables communication without the use of thin, high-density wires that break easily. The cost of integrating proximity communication is also fairly low: the plates or inductors forming the coupled link can be realized using the existing back-end metal stack, and power efficiencies similar to those of traditional wired links are possible [10], [11]. However, existing approaches to proximity communication have targeted multi-Gbps links in high-performance computing [12] and memory-stacking [11] applications, and so have been designed under a different set of constraints than the relatively low data rates and ultra-low power consumption required by implants.

Fig. 2. Example application of proximity communication to an Origami implant, with (a) 2D cross-section showing chips embedded in folded parylene, with proximity interfaces across folds in the same module or between module 1 and 2, and (b) inset detail of a module-to-module interface. Vertical distances have been exaggerated for clarity, and external wireless/module-to-module power delivery is shown only to provide context.

A further concern for proximity communication in Origami implants is mechanical alignment. Due to the strong desire to keep assembly costs low, many of the processing techniques that ensure good alignment in high-performance proximity interconnect are uneconomical. In addition, these implants face environments more hostile to alignment than the typical workstation or server: vibration, tissue growth and other movement mean that alignment between chips can change considerably over time.
Although mechanical design of the implant can limit this effect, it is nevertheless the case that the ability to sense changes in alignment and adapt to them is an even more important consideration than in the high-performance space, where the primary concern is thermal expansion (e.g., [13], where misalignment is compensated for only up to $\pm 25 \ \mu m$ ). The twin demands for energy efficiency and misalignment tolerance can be contradictory; for instance, the need for misalignment tolerance suggests the use of inductively-coupled proximity links, since they have longer range. However, they also tend to have high power consumption, requiring either a constant current drive [11] or imposing tight timing requirements (and therefore complex and power-hungry timing recovery) at the receiver to capture current pulse inputs [14]. To compound matters, inductively-coupled links do not lend themselves well to use as alignment sensors. The size of inductors required (e.g., 60 $\mu$ m and 79 $\mu$ m diameters at 110 $\mu$ m pitch in [10]) is typically much larger than that for capacitive plates (e.g., $30 \times 30 \mu m$ at 36 $\mu m$ pitch in [12]), which reduces achievable alignment-sensor resolution. The intrinsic high-pass transimpedance of an inductive link is also a drawback—the short voltage pulses produced at the receiver are inherently more difficult to quantize than the square waves produced by the relatively benign transfer function of a properly-designed capacitive link, thus complicating measurements of mutual inductance. This design uses capacitively-coupled proximity interconnect, which poses much less of a power and timing problem and is much more amenable to alignment-sensing operations [15]. 
However, existing approaches use specialized alignment sensors running on a separate set of plates [16], [17], and therefore require more area of the two chips to be aligned (both the alignment sensor and the communication array need to be in alignment), limiting flexibility. They also require link quality to be inferred from alignment data, instead of directly measuring it, a relatively costly operation within the context of low-power devices having limited memory and processor resources. Other work [18] has been done with the alignment sensor sharing the same array as the communication circuitry, but the application targeted (a single <1 Mbps link, with power delivery handled by the same array) and the type of alignment sensor used (a ring oscillator and counter under each plate in the array) mean that its data rate density and power efficiency are not appropriate for use in Origami implants. This paper proposes a system where the alignment sensors share plates with the communication array, and so avoids the need for link quality inference calculation and limits the area of alignment between the two chips to the communication array only. Additionally, the alignment-sensing circuitry is not dedicated to each plate, but instead distributed across the communication array, and, where possible, shares functional blocks already present to perform communications tasks, increasing data rate density and improving power efficiency.

Fig. 3. Top-down view of a target plate overlaid on top of the sensor array, in (a) best-case (maximum overlap) and (b) worst-case (minimum overlap) alignment. Active sensor plates are shaded, and the associated target plate is outlined.

# II. PLATE AND ARRAY DESIGN

The proximity interconnect is formed by capacitive coupling between plates in the pad-level metal layer of the two chips. In order to maximize flexibility, both sides of the link implement full receiver and transmitter functionality.
However, only one side of the link needs to be able to sense alignment, so two different array types, sensor and target, are used. This distinction between array types confers important advantages: it allows the design of the sensor array to be separately optimized from the target array to increase alignment-sensing resolution, and saves power and area by eliminating unneeded sensing circuitry from the target array. No significant design time is added, since the target array uses a subset of the sensor array blocks.

The sensor array is split up into smaller constituent plates, with $n^2$ plates corresponding in size to one target plate (Fig. 3). These smaller plates are joined together in groups, which form the basic units of alignment sensing and communication, and are analogous to pixels in an image sensor. This compound-plate configuration enhances alignment-sensing resolution by reducing the effective sensing-unit ('pixel') step-size to the dimensions of the smaller plates. The smaller step-size is also beneficial during communication, since it increases the overlap area between sensor and target in the event the two are misaligned.

There are limits, however, to how small the constituent plates can be made. As these plates are made smaller, more of them need to be connected together to equal the size of the target plate, increasing switch and routing capacitance and reducing link gain. Additionally, the gap between adjacent sensor plates is governed by the minimum metal spacing allowed in the design rules; smaller constituent plates result in more area lost to these gaps, reducing parallel-plate coupling capacitance, an effect only partially mitigated by the corresponding increase in fringing capacitance.

Assuming the resistance of the switches used to connect the plates is small, and taking the sensor as the receiver and the target as the transmitter, the capacitively-coupled link can be abstracted as a capacitive voltage divider with transmitter $(R_{tx})$ and receiver biasing $(R_{\text{bias}})$ resistances (Fig. 4). These resistances create a band-pass filter, whose passband exists between

$$\omega_l \approx \frac{1}{R_{\text{bias}}(C_{G2} + C_{rx})}, \quad \omega_h \approx \frac{1}{R_{tx}(C_{G1} + C_{tx})} \tag{1}$$

where $C_{G2}$ is the parasitic capacitance of the sensor plate to ground, $C_{rx}$ is the input capacitance of the receiver, $C_{G1}$ is the parasitic capacitance of the target plate to ground and $C_{tx}$ is the output capacitance of the transmitter.

Fig. 4. Capacitive link, showing transmit driver resistance and receiver bias resistance.

This suggests that $R_{tx}$ should be kept small, a relatively simple matter for the low data rates required in biomedical applications; for example, if a simple inverter is used as the transmitter, $R_{tx}$ (in a typical 65 nm CMOS process) can easily be made $<500~\Omega$. Plate sizes range from $20\times20~\mu\mathrm{m}$ [19] to $35\times35~\mu\mathrm{m}$ [9], yielding parasitic capacitances $<50~\mathrm{fF}$, so $f_h>6.4~\mathrm{GHz}$. More challenging is the need for a large $R_{\mathrm{bias}}$, which generally indicates the use of a leakage path to define the input bias of the receiver [12], the approach taken here. If these resistances are sized appropriately, the remaining transfer function is a simple capacitive voltage divider:

$$\frac{V_{rx}}{V_{tx}} = \frac{C_C}{C_C + C_{G2} + C_{\rm sw} + C_w + C_{rx}} \tag{2}$$

where $C_C$ is the coupling capacitance between the two plates (Fig. 4), $C_{\rm sw}$ is the parasitic switch capacitance and $C_w$ is the wiring parasitic capacitance. Switch capacitance can be estimated by:

$$C_{\rm sw} \approx C_{\rm on} \cdot {\rm sw}_{\rm on} + C_{\rm off} \cdot {\rm sw}_{\rm off} \tag{3}$$

where $C_{\rm on}$ is the parasitic capacitance of an 'on' switch and $C_{\rm off}$ is that of an 'off' switch. ${\rm sw}_{\rm on}$ and ${\rm sw}_{\rm off}$ are the number of switches in the 'on' and 'off' states, respectively, connected to the active group of plates (Fig. 5). Wiring parasitic capacitance can be estimated as:

$$C_w \approx ({\rm sw}_{\rm on} + {\rm sw}_{\rm off}) \cdot \frac{C_{w,\mathrm{unit}}}{n} \tag{4}$$

where $C_{w,\mathrm{unit}}$ is the capacitance of a wire running the whole length of the group of plates. For simplicity, all plate-to-plate wires are assumed to present the same parasitic load.

Fig. 5. Plate and switch configurations for (a) n=2, (b) n=3 and (c) n=4. The highlighted group of active sensor plates indicates the worst-case loading condition, where the largest number of inactive switches are connected to the active plates.

The optimal value of n can be selected by running different configurations through a 3D EM field solver to extract $C_C$ and $C_{G2}$. Link gain $(V_{rx}/V_{tx})$ across different n, for the dielectric and metal configuration in Fig. 8 with a 12 $\mu$m parylene dielectric interposed, $C_{\rm on} = 2C_{\rm off} = 1$ fF, $C_{w,\rm unit} = 10$ fF and $C_{rx} = 5$ fF, is shown in Figs. 6 and 7. Under best-case alignment, increasing n does not cause an increase in overlap between the target plate and the sensor plate group, so the extra parasitics and reduction in coupling capacitance result in a direct loss of gain. However, under worst-case alignment conditions, better gain is realized when n=2, due to the increase in overlap between target and sensor. No further benefit in either alignment condition is realized by increasing n to 3; in fact, the extra parasitic loading results in a decrease in link gain under best-case alignment. In order to provide reasonable margins for noise and input slicer offset, a link gain >0.03 (corresponding to a received signal amplitude >30 mV<sub>PP</sub> for an input amplitude of 1 V<sub>PP</sub>) across a 12 $\mu$m parylene dielectric (the thickest tested) was targeted.
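As a rough numerical companion to (1)–(4), the following Python sketch evaluates the corner frequencies and the link gain. The values of $C_{\rm on}$, $C_{\rm off}$, $C_{w,\rm unit}$ and $C_{rx}$, and the 500 Ω / 50 fF bound, are taken from the text. The coupling capacitance, plate-to-ground capacitance, bias resistance, switch counts and the 45 fF / 5 fF split are placeholder assumptions, since the real values come from the field solver and the layout.

```python
import math


def corner_frequencies_hz(r_bias, c_g2, c_rx, r_tx, c_g1, c_tx):
    """Band-pass corners of (1), converted from rad/s to Hz."""
    f_low = 1.0 / (2.0 * math.pi * r_bias * (c_g2 + c_rx))
    f_high = 1.0 / (2.0 * math.pi * r_tx * (c_g1 + c_tx))
    return f_low, f_high


def switch_capacitance(c_on, c_off, sw_on, sw_off):
    """Parasitic switch capacitance, per (3)."""
    return c_on * sw_on + c_off * sw_off


def wiring_capacitance(c_w_unit, n, sw_on, sw_off):
    """Parasitic wiring capacitance, per (4)."""
    return (sw_on + sw_off) * c_w_unit / n


def link_gain(c_c, c_g2, c_rx, c_sw, c_w):
    """Capacitive-divider gain V_rx / V_tx, per (2)."""
    return c_c / (c_c + c_g2 + c_sw + c_w + c_rx)


fF = 1e-15

# From the text: R_tx = 500 ohm and 50 fF of total plate parasitics.
# Assumed: the 45 fF / 5 fF split and R_bias = 1 Gohm.
f_low, f_high = corner_frequencies_hz(
    r_bias=1e9, c_g2=20 * fF, c_rx=5 * fF,
    r_tx=500.0, c_g1=45 * fF, c_tx=5 * fF,
)

# From the text: C_on = 1 fF, C_off = 0.5 fF, C_w,unit = 10 fF, C_rx = 5 fF.
# Assumed: n = 2 with 4 'on' and 8 'off' switches, C_C = 2 fF, C_G2 = 20 fF.
n, sw_on, sw_off = 2, 4, 8
c_sw = switch_capacitance(1 * fF, 0.5 * fF, sw_on, sw_off)
c_w = wiring_capacitance(10 * fF, n, sw_on, sw_off)
gain = link_gain(c_c=2 * fF, c_g2=20 * fF, c_rx=5 * fF, c_sw=c_sw, c_w=c_w)

print(f"f_low  = {f_low / 1e3:.1f} kHz")
print(f"f_high = {f_high / 1e9:.2f} GHz")
print(f"gain   = {gain:.4f}")
```

Sweeping `n` with solver-extracted `c_c` and `c_g2` for each configuration would be one way to compare candidates against the 0.03 gain target.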
Therefore, n=2 and a target plate dimension of $60 \times 60~\mu$m were chosen for this design.

While selecting n=2 does improve link gain in the case where the target and sensor plates are poorly aligned, a decrease in gain still results, which affects the noise margins, power efficiency and maximum achievable data rate across the link. If, due to in-plane (x- and/or y-axis) misalignment, all the target plates are simultaneously poorly aligned to the sensor array, the array adaptation scheme will have no choice but to select poorly-coupled, less efficient links. This situation can be avoided if the target plates are spaced at a non-integer multiple of the sensor plate pitch. Although the target plates can then never all be simultaneously perfectly aligned with the sensor, neither can they all be in worst-case alignment, providing some flexibility for the adaptation scheme to pick the best-coupled set of plates. For this design, a multiple of 3 1/3 is chosen (Fig. 8).

Fig. 6. Link gain $(V_{RX}/V_{TX})$ for different values of n, when plates are in best-case alignment.

Fig. 7. Link gain $(V_{RX}/V_{TX})$ for different values of n, when plates are in worst-case alignment.

Fig. 8. Dielectric and metal layers used to form plate structure, and sensor/target array arrangement (target chip outline not shown for clarity).

Fig. 9. Architecture of the sensor and target cells, with key functional blocks indicated.

Crosstalk between adjacent target plates is also a concern; although this is considerably mitigated by the choice of capacitive, rather than inductive, proximity communication, it can become significant if the plates are dense enough. Differential signaling and careful arrangement of the transmitting plates can help reduce the effects of crosstalk [12]. However, differential signaling has the disadvantage of requiring two sets of plates, instead of one, to be well-coupled, complicating the alignment problem.
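The non-integer pitch argument made for Fig. 8 can be illustrated with a small Python sketch. It computes where each target plate falls within one sensor-plate pitch; the function name, the choice of four target plates along one axis, and the integer multiple of 3 used for comparison are assumptions made for illustration only.

```python
from fractions import Fraction


def grid_offsets(pitch_multiple, num_targets, shift=Fraction(0)):
    """Position of each target plate within one sensor-plate pitch.

    Positions are in units of the sensor-plate pitch and reduced to [0, 1),
    so equal values mean equally (mis)aligned target plates.
    """
    return [(shift + k * pitch_multiple) % 1 for k in range(num_targets)]


# Four target plates along one axis (assumed for illustration).
integer_pitch = grid_offsets(Fraction(3), 4)
chosen_pitch = grid_offsets(Fraction(10, 3), 4)  # the 3 1/3 multiple

print("integer multiple:", [str(x) for x in integer_pitch])
print("3 1/3 multiple:  ", [str(x) for x in chosen_pitch])
```

With an integer multiple every plate lands at the same position within the sensor pitch, so a single unlucky shift degrades all links together. With the 3 1/3 multiple the positions cycle through three distinct values, whatever `shift` is applied.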
The additional energy required for differential signaling is a further concern, given the tight power constraints imposed in biomedical implants. Instead, the spacing between target plates is made relatively large, which a field solver simulation suggests will reduce crosstalk to a negligible level.

# III. TRANSCEIVER ARRAY WITH DISTRIBUTED ALIGNMENT SENSING

The structure of the target array is relatively straightforward: each target plate is uniquely associated with a single target cell, which contains the transmitter and receiver for that plate. On the other hand, the sensor array is more complicated. Each sensor cell uses the same transmitter and receiver design as the target cell, but adds alignment sensing circuitry and switches to connect to one of the four possible groups of four sensor plates it is associated with (Fig. 9); these plates are shared with neighboring sensor cells, and control logic ensures that no plate is connected to more than one cell at a time.

No explicit provision is made for transmitting a clock through the same set of plates. Instead, it is proposed that a common clock is provided to both chips via the power delivery circuitry [4]; the low data rates (tens of Mb/s) mean that very little to no explicit synchronization circuitry should be required.

## A. Alignment Sensing

The dielectric between the two plates is composed of the passivation as well as one or two parylene sheets, each 4–6 $\mu$m thick, depending on the exact structure of the parylene module. Since the distance between the two plates can be relatively large compared to the distance between each plate and its corresponding ground plane (Fig. 8), it is difficult to distinguish the capacitance between the sensor and target plate from that between the sensor and the ground plane under the target plate. This ground plane could be moved to a lower level metal under the target plate, but the restriction this would place on routing density makes this option unacceptable.
This prevents the use of techniques that attempt to sense an undriven capacitance in order to determine alignment [18], which are applied when power delivery to the target chip is not possible. Since power can be delivered to the target chip in this case, it makes sense to use the existing transmitter circuitry in the target cell to drive a stimulus (e.g., clock) onto the target plate to electrically distinguish it from the ground plane. The amplitude of the signal received at the sensor is proportional to the amount of coupling between the plates, and can be used to determine link quality and alignment. In order for this coupled amplitude to provide useful information for an array adaptation scheme, it will have to be digitized. Placing a full ADC under each group of sensor plates is unappealing from a power and area standpoint, especially if all the sensor cell circuitry (switches, transmitter, receiver, alignment sensor, digital control) is to fit under a single set of sensor plates (a roughly $60 \times 60~\mu m$ area). Instead, a distributed approach needs to be taken, where elements or stages of the ADC are split up and spread throughout the sensor array. Although this restricts the number of groups of sensor plates that can be sensed simultaneously, determining alignment is not a time-sensitive operation and the tradeoff to save power and area is an acceptable one. Thanks to its inherently segmented nature, good resolution characteristics and low power consumption, a time-to-digital converter (TDC) is an appealing candidate for use as the digitizing element in the alignment sensor. To enhance resolution and reduce the effects of process variation, a vernier TDC is used. Each stage of the TDC (set of delay elements and arbiter) is distributed through the sensor array, and connected together during alignment sensing to form a complete TDC; in this design, a 7-stage TDC is used for 3 bit output (Fig. 10). 
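To make the 7-stage, 3-bit relationship concrete, here is a minimal behavioral sketch of a generic vernier TDC in Python. It is not a model of the implemented circuit: the two-delay-line formulation, the stage delays (in picoseconds) and the function name are assumptions chosen only to show how seven arbiter decisions form a thermometer code with eight levels.

```python
def vernier_tdc(delta_t_ps, tau_slow_ps, tau_fast_ps, stages=7):
    """Behavioral vernier TDC.

    The start edge leads the stop edge by delta_t_ps. Each stage delays
    the start edge by tau_slow_ps and the stop edge by tau_fast_ps, so the
    lead shrinks by (tau_slow_ps - tau_fast_ps) per stage. The arbiter in
    each stage outputs 1 while the start edge still arrives first.
    """
    step = tau_slow_ps - tau_fast_ps
    if step <= 0:
        raise ValueError("tau_slow_ps must exceed tau_fast_ps")
    thermometer = []
    lead = delta_t_ps
    for _ in range(stages):
        lead -= step
        thermometer.append(1 if lead > 0 else 0)
    # Seven stages give codes 0..7, which fit in 3 bits.
    return thermometer, sum(thermometer)


# Hypothetical delays: 60 ps and 50 ps per stage, i.e. 10 ps resolution.
bits, code = vernier_tdc(delta_t_ps=35, tau_slow_ps=60, tau_fast_ps=50)
print(bits, code)
```

The effective resolution is the difference between the two stage delays rather than either delay on its own, which is the usual motivation for a vernier arrangement.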
To convert the coupled signal amplitude into a DC level, the receiver's input slicer is used to drive a rectifier (Fig. 11). When the output of the slicer transitions, the rectifier generates pulses to control switches, shunting the high or low levels of the received signal ('in') onto the appropriate storage capacitor. To ensure that the capacitors capture the correct values, the delay from a transition in the received signal to the pulse, and the length of the pulse itself, must be short enough so that the pulse ends before a further transition in the received signal:

$$t_{\text{slicer}} + t_{pg} + t_{pw} < t_{\text{bit}} \tag{5}$$

To increase timing margin, the target cell transmitter is set to output a low-frequency (quarter-rate) alternating sequence during alignment sensing.

Fig. 10. Sensor array structure, showing TDC path for alignment sensing at indicated plates.

Fig. 11. Rectifier and associated timing diagram.

Fig. 12. Differential voltage-controlled delay line.

The rectified voltages are used to bias a differential voltage-controlled delay line (VCDL; Fig. 12). The bias voltage adjusts the pull-down strength of an inverter in each unit cell; higher voltages result in stronger pull-down and, because the TDC stimulus is a rising edge, less delay. The TDC converts the delay generated by the VCDL into a digital code, and link quality is assessed by comparing this code against a look-up table of supported data rates; similarly, alignment is determined by comparing it against results from the field solver simulation. As a result, the linearity of the ADC formed by the VCDL and TDC (whether INL or DNL) is of secondary concern compared to its total error. Simulations of the implemented design suggest that offset is the most significant effect contributing to total error, so this is corrected via variable-threshold buffers at the output of the VCDL. The threshold of these buffers is varied by digitally adjusting their pull-up strength.
Higher pull-up strength results in a higher threshold voltage, delaying the corresponding VCDL output edge to compensate for offset in either the VCDL itself or the following TDC. The use of a VCDL & TDC combination to realize the ADC helps to mitigate the effects of supply variation, an important consideration in biomedical devices, where this effect may be considerable depending on the quality of the power delivery and regulation. If both TDC & VCDL are realized using similar delay elements, as is the case here, any change in their shared supply will tend to affect the delay of both in a similar way. Residual error can be calibrated out before each run of an alignment sense operation, since alignment sensing is completed quickly enough that the supply should not vary significantly during it. Vertical (z-axis) separation between the sensor and target chips can be measured directly from the corresponding sensor cell's TDC output. The amount of coupling capacitance is inversely proportional to the distance between the sensor and target plates, so the conversion from physical separation to coupling amplitude (therefore, TDC output word) is non-linear. As a result, alignment sensor resolution is better when the two chips are close to each other, and degrades as they get further apart. Determining in-plane (x- and y-axis) alignment is somewhat more involved. The output of a single group of plates is insufficient to determine the in-plane alignment of the associated target cell. Instead, readings from two adjacent groups of sensor plates are used (Fig. 13). 
Ignoring fringing fields, the voltage seen at each group of plates is:

$$\frac{V_1}{V_{tx}} = \frac{\eta \cdot C_C \cdot m}{\eta \cdot C_C \cdot m + C_{G2}}, \qquad \frac{V_2}{V_{tx}} = \frac{\eta \cdot C_C (1 - m)}{\eta \cdot C_C (1 - m) + C_{G2}} \tag{6}$$

where $C_C$ is the amount of coupling capacitance between the target plate and a group of sensor plates in perfect alignment (no air gap, target plate directly on top of sensor plates) and $\eta$ is a de-rating factor to account for vertical separation between the plates.

Fig. 13. Two adjacent groups of sensor plates used for x-axis alignment sensing. $0 \le m \le 1$; when $m = 0$, the target plate is all the way to the left (completely over $V_1$). Likewise, when $m = 1$, it is completely over $V_2$.

Taking the ratio of the two expressions yields:

$$\frac{V_1}{V_2} = \frac{\eta \cdot C_C \cdot m}{\eta \cdot C_C (1 - m)} \cdot \frac{\eta \cdot C_C (1 - m) + C_{G2}}{\eta \cdot C_C \cdot m + C_{G2}} \tag{7}$$

Assuming $C_{G2} \gg \eta \cdot C_C$ (reasonable, since link gains are typically $\ll 1$), this simplifies to:

$$\frac{V_1}{V_2} = \frac{m}{1 - m} \tag{8}$$

Rearranging to find m:

$$m = \frac{V_1}{V_1 + V_2} \tag{9}$$

This can be converted into an offset from the midpoint between the two groups of plates:

$$d = k \cdot d_{\text{target}} \cdot m \tag{10}$$

where $d_{\rm target}$ is the side-length of the target plate and k is a scaling factor applied to correct for the effect of fringing fields, which are ignored in the derivation above. Although fringing fields have, in general, a non-linear effect on d, a linear correction is sufficient over the small distances being measured by this technique.

In addition to planar alignment errors (in the x, y, and z-axes), it is also possible for the arrays to be tilted (in the $\theta_x$ or $\theta_y$ axes) against each other.
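The in-plane estimate of (6)–(10) can be checked numerically with a short Python sketch. It generates the two group voltages from the exact expressions in (6), following their convention that $V_1$ grows with m, and then recovers m with the approximation in (9). The capacitance values, the de-rating factor and k = 1 are placeholder assumptions; only the 60 μm target side-length comes from the text.

```python
def group_gain(eta, c_c, c_g2, overlap):
    """V / V_tx for a plate group with fractional overlap, per (6)."""
    coupled = eta * c_c * overlap
    return coupled / (coupled + c_g2)


def estimate_m(v1, v2):
    """Fractional position m from the two group voltages, per (9)."""
    total = v1 + v2
    if total <= 0:
        raise ValueError("no coupled signal on either group")
    return v1 / total


def offset_um(m, d_target_um, k=1.0):
    """Offset d from (10); k is the fringing-field correction factor."""
    return k * d_target_um * m


fF = 1e-15
eta, c_c, c_g2 = 0.1, 10 * fF, 20 * fF  # assumed, so that C_G2 >> eta * C_C
m_true = 0.3

v1 = group_gain(eta, c_c, c_g2, m_true)
v2 = group_gain(eta, c_c, c_g2, 1.0 - m_true)
m_est = estimate_m(v1, v2)
d_est = offset_um(m_est, d_target_um=60.0)

print(f"m_true = {m_true:.3f}, m_est = {m_est:.3f}, d = {d_est:.2f} um")
```

The gap between `m_est` and `m_true` shrinks as `c_g2` grows relative to `eta * c_c`, which is the assumption behind the step from (7) to (8).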
Although the mechanical design of the implant should limit these errors, some residual misalignment is inevitable, for example if the thickness of the parylene between the chips varies from edge to edge of the array. Tilt errors such as this manifest themselves as an increase in the gap between plates over a portion of the array. The alignment sensor will detect this as a simple increase in the z-axis separation between the relevant plates, allowing their data rate to be adapted accordingly.

## B. Transmitter and Receiver

The input buffer, transmitter and receiver designs are used in both the sensor and target cells. The transmitter is a tri-state buffer modified with a leakage path (Fig. 14). Low-leakage standard threshold voltage, low-power (SVTLP) devices are used in the output path to reduce leakage power, while high-leakage low threshold voltage, general-purpose (LVTGP) devices are used to define the source-follower input bias and to achieve the high $R_{\rm bias}$ required at the plates. A leakage cutoff device is used to shut the leakage path down when the transmitter is active, to prevent added static current draw.

The source-follower input buffers isolate the plates from the rest of the cell circuitry in order to minimize parasitic loading of the capacitive link (Fig. 15), and drive two distinct signal paths. The first contains no filtering, and is used by both the input slicer and the rectifier. The second contains a low-pass filter (LPF), which generates a reference voltage for the input slicer. The bias points of these two paths need to match well in order to ensure that the LPF generates an accurate reference voltage. Variation in the bias point is minimized through the use of long-channel devices and a Wilson current mirror to boost output resistance. The current mirror is designed to shut off when the buffer is idle (when the cell is completely off or is acting as a transmitter) to save power.

Fig. 14. Tri-state buffer-based transmitter, with leakage path to define plate bias voltage.

Fig. 15. Source-follower buffer with gateable Wilson current mirror bias.

Fig. 16. 3-stage hybrid low-pass filter.

Fig. 17. Input slicer and offset compensation (SR latch not shown). 'Reset' zeroes the offset compensation capacitor and 'oc\_en' is asserted during offset compensation calibration.

Fig. 18. Input slicer offset estimated across 200 Monte Carlo simulation runs, (a) before and (b) after offset compensation.

Fig. 19. Die micrograph, with sensor and target arrays marked.

In order to generate a stable reference voltage for the input slicer, the LPF has to have a cutoff frequency $(f_{\rm lpf})$ below the fundamental of the longest expected data sequence. For example, in this design the link was designed to handle a PRBS-7 at data rates as low as 20 Mbps, suggesting $f_{\rm lpf} < 156$ kHz. Such low cutoff frequencies are difficult to achieve using purely passive elements in a reasonable amount of area. At the same time, the filter needs to accept an input signal at the data rate, so using a purely switched-capacitor approach would require a high switching frequency, wasting power. Instead, a hybrid multi-stage LPF is used (Fig. 16). The first stage is a simple first-order RC filter with a relatively high cutoff frequency; the following switched-capacitor stages step down in cutoff frequency until $f_{\rm lpf}=130~{\rm kHz}$. To save power and area, buffers between the various stages of the filter are omitted; the resulting inaccuracies are mitigated by sizing the input capacitors of the two switched-capacitor stages much smaller than the output capacitor of the preceding stage. The clock for the switched-capacitor filter is generated from the same clock used to drive the slicer.

Fig. 20. Sensor and target cell layout detail.

Fig. 21. Test setup. Inset: detail of chips when brought into alignment.
Because the LPF only has to generate a DC level that should not change quickly over time, the timing of the switched-capacitor clock relative to the slicer clock is unimportant.

The input slicer is a strongARM latch used as a comparator (Fig. 17), followed by an SR latch. Data rates <100 Mbps are targeted, with signal amplitudes as low as 30 mV<sub>PP</sub>. The low data rates allow the slicer plenty of time to evaluate small incoming signals, so the latch design emphasizes low power at the expense of gain and bandwidth. A pressing concern is input offset; a 200-run Monte Carlo simulation of the slicer suggests that the offset has $\sigma \approx 15.4$ mV (Fig. 18(a)), so correcting up to $3\sigma$ of offset (about 46 mV) requires the offset compensation to have a range of approximately $\pm 50$ mV. In order to provide this range with <5 mV resolution within a limited power and area budget, charge pump-based offset compensation is added in series with the threshold-generating LPF. Leakage is reduced to about 1 mV/ms in the FF corner through the use of a thick-oxide storage capacitor, triple-well devices to eliminate diode leakage through the switches, and the provision of a low-resistance path to shunt leakage from the charge pump switches away from the storage capacitors. The stored voltage is refreshed when the link is taken down to re-acquire chip-to-chip alignment. The maximum observed residual offset of the post-compensated slicer is about 3.5 mV over 200 Monte Carlo runs (Fig. 18(b)).

Fig. 22. The effect of VCDL/TDC offset compensation on (a) offset error and (b) total error, measured over 144 sensor cells across 6 chips.

## IV. HARDWARE MEASUREMENTS

A $6 \times 4$ cell ($13 \times 9$ plate) sensor array and a $4 \times 3$ cell target array were implemented in the same 65 nm bulk CMOS test chip (Fig. 19). Since the target plate size is $60 \times 60~\mu$m, $n = 2$ and the minimum pad-level metal spacing is $2~\mu$m, sensor plates measure $29 \times 29~\mu$m.
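The plate dimensions quoted here follow from simple tiling arithmetic. The sketch below assumes that n sensor plates plus the n − 1 gaps between them span one target plate width; this relation and the helper names are assumptions chosen to be consistent with the quoted numbers, not formulas taken from the paper. The same plate size and spacing also reproduce the 401 μm × 277 μm sensor array area listed in Table I for a 13 × 9 plate array.

```python
def sensor_plate_width_um(target_width_um: float, n: int, spacing_um: float) -> float:
    """Width of each of n sensor plates that, with n - 1 gaps, span one target plate."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return (target_width_um - (n - 1) * spacing_um) / n


def array_extent_um(num_plates: int, plate_um: float, spacing_um: float) -> float:
    """Edge-to-edge extent of a row of plates separated by uniform gaps."""
    if num_plates < 1:
        raise ValueError("num_plates must be at least 1")
    return num_plates * plate_um + (num_plates - 1) * spacing_um


plate = sensor_plate_width_um(target_width_um=60.0, n=2, spacing_um=2.0)
width = array_extent_um(num_plates=13, plate_um=plate, spacing_um=2.0)
height = array_extent_um(num_plates=9, plate_um=plate, spacing_um=2.0)

print(plate, width, height)  # prints: 29.0 401.0 277.0
```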
The sensor cell circuitry and associated test logic are designed to fit under a $2 \times 2$ set of sensor plates and connect by abutment to form the complete sensor array (Fig. 20). The target cell electronics are likewise designed to fit under a single target plate; because the spacing between target plates is much larger than that between sensor plates, the target cells do not connect by abutment.

The test setup (Fig. 21) consists of two test chips mounted on small daughterboard PCBs using chip-on-board assembly. The daughterboards are connected to target and sensor mainboards. The target mainboard is mounted on a 5-axis micropositioner, used to planarize the two chips in the $\theta_x$ and $\theta_y$ axes. Displacement of the two chips in the x-, y- and z-axes was controlled by this micropositioner in conjunction with a 3-axis micropositioner connected to the sensor mainboard. The $\theta_z$ axis was left uncorrected due to limitations in the available equipment; this was mitigated by careful alignment of the chips against guides on the daughterboard during assembly. Initial alignment of the two chips was conducted visually using a microscope, with the alignment-sensing function of the sensor array used to correct any remaining alignment error. Slight over-torque was applied in order to cause the sensor and target boards to flex against each other and minimize the size of air gaps between the chips and/or parylene sheets.

Tests were conducted with different thicknesses of parylene (4, 5, 6, 2 × 4, 2 × 5 and 2 × 6 $\mu$m, with $\pm 0.5~\mu$m tolerance per sheet) by placing a single sheet of parylene over the plate arrays on the sensor chip and fixing it to the daughterboard. A second sheet was added to the target chip as necessary.

The VCDL/TDC-based ADC was tested independently by setting the VCDL bias voltages through override pads, bypassing the input slicer and rectifier.
To measure the effects of offset compensation, the transfer characteristic was measured before and after adjustment of the variable-threshold VCDL output buffers. Results were collected across all 24 sensor cells on each of six different chips, 144 cells in total (Fig. 22), with an improvement in RMS offset error from 1.04 LSB to 0.38 LSB, and a corresponding improvement in RMS total error from 1.54 LSB to 0.97 LSB.

Fig. 23. Alignment sensor output (raw ADC codes) under vertical (z-axis) separation.

Fig. 24. Alignment sensor output (calculated using (10)) under in-plane (x or y-axis) misalignment.

Vertical (z-axis) alignment sensitivity was tested at the thicknesses of parylene listed above, as well as with air only (Fig. 23). Readings for micropositioner offsets less than about 5 $\mu$m experience some non-linearity due to the over-torque applied to the chips, and with air only or thinner (4 and 5 $\mu$m) parylene the coupling is strong enough that the alignment sensor output saturates. Despite these non-idealities, the sensor performed largely as expected. When coupling is very strong (e.g., with an air-only dielectric), the inverse dependence of coupling strength on vertical separation results in a non-linear sensor output characteristic, essentially hyperbolic except where the sensor saturates. The introduction of parylene limits coupling strength and linearizes the sensor but reduces sensitivity; sensor resolution (the smallest vertical displacement that can be detected) is 4 $\mu$m in the worst case, with 2 × 6 $\mu$m parylene.

In theory, in-plane (x- and y-axis) alignment measurements should be insensitive to the presence of thicker parylene dielectrics, since they depend on the ratio between adjacent sensor outputs, both of which are equally affected by the reduction in coupling.
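The ratio argument can be illustrated numerically. The sketch below uses a generic normalized-difference estimate of two adjacent sensor outputs; it is a stand-in chosen for illustration and is not the paper's equation (10). The output values, the scale factor and the integer rounding used to mimic a finite-resolution ADC are all hypothetical.

```python
def normalized_difference(a: float, b: float) -> float:
    """Ratio-metric estimate from two adjacent sensor outputs (illustrative only)."""
    if a + b == 0:
        raise ValueError("outputs must not sum to zero")
    return (a - b) / (a + b)


def quantize(value: float) -> int:
    """Round to the nearest integer code, mimicking a finite-resolution ADC."""
    return int(round(value))


strong = (24.0, 8.0)   # hypothetical adjacent outputs with strong coupling
scale = 0.25           # hypothetical uniform loss of coupling
weak = (strong[0] * scale, strong[1] * scale)

# Without quantization, uniform scaling leaves the estimate unchanged.
ideal_strong = normalized_difference(*strong)
ideal_weak = normalized_difference(*weak)

# Change in the estimate caused by a single code step on the first output.
step_strong = normalized_difference(quantize(strong[0]) + 1, quantize(strong[1])) - ideal_strong
step_weak = normalized_difference(quantize(weak[0]) + 1, quantize(weak[1])) - ideal_weak

print(ideal_strong, ideal_weak)
print(round(step_strong, 4), round(step_weak, 4))
```

In this toy case the unquantized estimate is identical for both coupling strengths, while one code step moves the estimate several times further when the codes are small.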
In practice, the limited resolution of the ADC means that this reduction in coupling results in a loss of alignment sensor resolution; using an air-only dielectric, resolution is 5 $\mu$m, while with 2 × 6 $\mu$m parylene it is 20 $\mu$m (Fig. 24). Results for all thicknesses tested are summarized in Fig. 25.

Fig. 25. Achieved in-plane alignment sensor resolution vs parylene thickness.

When properly aligned, communication over all 12 available channels was demonstrated at data rates up to 60 Mbps/channel with BER $< 10^{-9}$. Input slicer offset compensation was run for ${\sim}10~\mu$s (600 clock cycles at 60 Mbps) every 2 ms, for a net data rate (over 12 channels) of ${\sim}716$ Mbps. Total power consumption at this data rate was 129 $\mu$W; a results summary is presented in Table I. Maximum achievable data rate is dependent on the amount of coupling between target and sensor plates (Fig. 26).

TABLE I. PERFORMANCE SUMMARY

| Parameter | Value |
|-------------------------------------|---------------------------------------|
| Process | 65 nm bulk CMOS |
| Die Area | 1.6 mm × 2.4 mm |
| Sensor Array Area | 401 μm × 277 μm |
| Target Array Area | 370 μm × 267 μm |
| Data Rate | 12 × 60 Mbps |
| BER | < 10<sup>-9</sup> w/ PRBS-7 |
| Transmitter & Input Buffer Supply | 1.0 V |
| Slicer, Rectifier, VCDL & TDC Supply | 0.7 V |
| **Power Dissipation (Sensor + Target)** | |
| Transceiver @ 12 × 60 Mbps | 100.9 μW + 27.7 μW |
| Alignment Sensor | 23.1 μW + 20.2 μW |
| **Figures-of-Merit** | |
| Power | 0.180 pJ/bit |
| Area (based on sensor array size) | 6446 Mbps/mm<sup>2</sup> |

Fig. 26. Maximum data rates achievable (BER $< 10^{-9}$) under best-case alignment, for various thicknesses of parylene.

## V. CONCLUSION

By integrating implant electronics onto a foldable parylene structure, Origami design gives engineers a tool for overcoming the size, power and cost constraints typically faced when building a biomedical implant.
With proper design, Origami implants can even be used to realize mechanically useful shapes, thereby enhancing implant performance or providing new capabilities. Moving forward, the Origami design style can be extended to encompass a modular approach to implant design that envisions the development of a library of standard functional blocks built using ICs in parylene, which can be assembled on demand for custom implants.

A vital component of this vision is a wireless link that allows the various ICs in an Origami implant to communicate with each other reliably and efficiently. A capacitive proximity interconnect for this purpose has been developed and fabricated in 65 nm CMOS. It contains an embedded alignment sensor, which allows link quality to be assessed and the array adapted so that only the best-coupled (and therefore most power-efficient) links are used whenever possible. This ensures robustness and reliability in the face of misalignment due to fabrication tolerances, patient movement or other perturbations.

By obviating the need for a separate alignment sensor, the embedded sensor saves area and simplifies the array adaptation logic. The alignment sensor uses a rectifier that is controlled by the receiver's input slicer; this re-use of existing hardware makes the design more compact and reduces leakage power. To further save area and power, the sensor's ADC is formed from a TDC that is distributed across the transceiver array. The transceiver itself has been optimized for power-efficient communication at the upper end of typical biomedical data rates, in the 10-60 Mbps range, and this has been demonstrated through dielectrics as thick as $12~\mu$m of parylene-C, with an energy efficiency of 0.180 pJ/bit.

## ACKNOWLEDGMENT

The authors thank M. Monge for helpful technical discussions, K. Potter for advice on the test setup, J. Park and Y. Liu for parylene fabrication, STMicroelectronics for chip fabrication and the NSF for funding support.
## REFERENCES

- [1] X. Zou, W.-S. Liew, L. Yao, and Y. Lian, "A 1 V 22 μW 32-channel implantable EEG recording IC," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, 2010, pp. 126–127.
- [2] C. Lopez et al., "An implantable 455-active-electrode 52-channel CMOS neural probe," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, 2013, pp. 288–289.
- [3] D. Han, Y. Zheng, R. Rajkumar, G. Dawe, and M. Je, "A 0.45 V 100-channel neural-recording IC with sub-μW/channel consumption in 0.18 μm CMOS," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, 2013, pp. 290–291.
- [4] M. Monge et al., "A fully intraocular 0.0169 mm²/pixel 512-channel self-calibrating epiretinal prosthesis in 65 nm CMOS," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, 2013, pp. 296–297.
- [5] E. Noorsal, K. Sooksood, H. Xu, R. Hornig, J. Becker, and M. Ortmanns, "A neural stimulator frontend with high-voltage compliance and programmable pulse shape for epiretinal implants," *IEEE J. Solid-State Circuits*, vol. 47, no. 1, pp. 244–256, Jan. 2012.
- [6] J. Chang, R. Huang, and Y.-C. Tai, "High-density IC chip integration with parylene pocket," in *Proc. IEEE Int. Conf. Nano/Micro Engineered and Molecular Systems (NEMS)*, 2011, pp. 1067–1070.
- [7] Y. Zhao, M. Nandra, and Y. Tai, "A MEMS intraocular Origami coil," in *16th Int. Solid-State Sensors, Actuators and Microsystems Conf. (TRANSDUCERS)*, 2011, pp. 2172–2175.
- [8] Y. Liu et al., "Parylene Origami structure for intraocular implantation," in *Proc. Transducers Eurosensors XXVII: 17th Int. Conf. Solid-State Sensors, Actuators and Microsystems (TRANSDUCERS EUROSENSORS XXVII)*, 2013, pp. 1549–1552.
- [9] R. Drost, R. Hopkins, R. Ho, and I. Sutherland, "Proximity communication," *IEEE J. Solid-State Circuits*, vol. 39, no. 9, pp. 1529–1535, Sep. 2004.
- [10] N. Miura, M. Saito, and T. Kuroda, "A 1 TB/s 1 pJ/b 6.4 mm²/TB/s QDR inductive-coupling interface between 65-nm CMOS logic and emulated 100-nm DRAM," *IEEE J. Emerging and Selected Topics in Circuits and Systems*, vol. 2, no. 2, pp. 249–256, Jun. 2012.
- [11] N. Miura, Y. Take, M. Saito, Y. Yoshida, and T. Kuroda, "A 2.7 Gb/s/mm<sup>2</sup> 0.9 pJ/b/chip 1 coil/channel ThruChip interface with coupled-resonator-based CDR for NAND flash memory stacking," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, 2011, pp. 490–492.
- [12] D. Hopkins et al., "Circuit techniques to enable 430 Gb/s/mm<sup>2</sup> proximity communication," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, 2007, pp. 368–609.
- [13] R. Drost, R. Ho, D. Hopkins, and I. Sutherland, "Electronic alignment for proximity communication," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, 2004, vol. 1, pp. 144–518.
- [14] N. Miura et al., "A 1 Tb/s 3 W inductive coupling transceiver for 3D-stacked inter-chip clock and data link," *IEEE J. Solid-State Circuits*, vol. 42, no. 1, pp. 111–122, Jan. 2007.
- [15] M. Loh and A. Emami-Neyestanak, "Capacitive proximity communication with distributed alignment sensing for Origami biomedical implants," in *Proc. IEEE Custom Integrated Circuits Conf. (CICC)*, Sep. 2013, pp. 1–4.
- [16] A. Chow, D. Hopkins, R. Ho, and R. Drost, "Measuring 6D chip alignment in multi-chip packages," in *Proc. IEEE Sensors*, 2007, pp. 1307–1310.
- [17] R. Canegallo, M. Mirandola, A. Fazzi, L. Magagni, R. Guerrieri, and K. Kaschlun, "Electrical measurement of alignment for 3D stacked chips," in *Proc. 31st Eur. Solid-State Circuits Conf., ESSCIRC*, 2005, pp. 347–350.
- [18] Y.-S. Lin, D. Sylvester, and D. Blaauw, "Alignment-independent chip-to-chip communication for sensor applications using passive capacitive signaling," *IEEE J. Solid-State Circuits*, vol. 44, no. 4, pp. 1156–1166, Apr. 2009.
- [19] K. Kanda, D. Antono, K. Ishida, H. Kawaguchi, T. Kuroda, and T. Sakurai, "1.27 Gb/s/pin 3 mW/pin wireless superconnect (WSC) interface scheme," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, 2003, vol. 1, pp. 186–487.

Matthew Loh received the B.S. degree in electrical and computer engineering from Lafayette College, Easton, PA, USA, in 2004, and the M.S. and Ph.D. degrees in electrical engineering from California Institute of Technology (Caltech), Pasadena, CA, USA, in 2009 and 2013, respectively. In 2010, he interned at IBM T.J. Watson Research Center, Yorktown Heights, NY, USA, where he worked on continuous-time decision-feedback equalizers. He joined Broadcom, Santa Clara, CA, USA, in 2013, where he works on analog front-ends, digital CDRs, and adaptive equalizers for high-speed electrical and optical interconnect.

Azita Emami-Neyestanak (S'97–M'04) received the B.S. degree with honors from Sharif University of Technology, Tehran, Iran, in 1996, and the M.S. and Ph.D. degrees in electrical engineering from Stanford University, Stanford, CA, USA, in 1999 and 2004, respectively. She is currently a Professor of electrical engineering at the California Institute of Technology (Caltech), Pasadena, CA, USA. From July 2006 to August 2007, she was with Columbia University, New York, NY, USA, as an Assistant Professor in the Department of Electrical Engineering. She also worked as a Research Staff Member at IBM T.J. Watson Research Center, Yorktown Heights, NY, USA, from 2004 to 2006. Her current research areas are high-performance mixed-signal integrated circuits and VLSI systems, with the focus on high-speed and low-power optical and electrical interconnects, clocking circuits, biomedical implants and sensors, drug delivery systems, and compressed sensing.