Wooseok Choi

2papers

2 Papers

19.2ETApr 7
Analog Weight Update Rule in Ferroelectric Hafnia, using pico-Joule Programming Pulses

Alexandre Baigol, Nikhil Garg, Matteo Mazza et al.

In an effort to compete with the brain's efficiency at processing information, neuromorphic hardware combines artificial synapses and neurons using mixed-signal circuits and emerging memories. In ferroelectric resistive weights, the strength of the synaptic connection between two neurons is stored in the device conductance. During learning, programming pulses are applied to the synaptic weight, which reconfigures the ferroelectric domains and adjusts the conductance. One strategy to lower the energy cost during the training phase is to lower the duration of the programming pulses. However, the latter cannot be shorter than the self-loading time of the resistive weights, limited by intrinsic parasitics in the circuits. In this work, ferroelectric resistive weights are fabricated using a process compatible with CMOS Back-End-Of-Line integration, based on hafnia/zirconia nanolaminates. By laterally scaling the device area under 100 $μ$m$^2$, the self-loading time becomes sufficiently short to enable 20 ns programming, which corresponds to a maximum of 3 picoJoules per pulse. Further, in this work, the weight update rule with 20 ns pulses is experimentally measured not only for different amplitudes but also for different initial conductance states. We find that the final weight is determined by the pulse amplitude, independent of the initial weight value.

CRMay 31, 2020
Cheetah: Optimizing and Accelerating Homomorphic Encryption for Private Inference

Brandon Reagen, Wooseok Choi, Yeongil Ko et al.

As the application of deep learning continues to grow, so does the amount of data used to make predictions. While traditionally, big-data deep learning was constrained by computing performance and off-chip memory bandwidth, a new constraint has emerged: privacy. One solution is homomorphic encryption (HE). Applying HE to the client-cloud model allows cloud services to perform inference directly on the client's encrypted data. While HE can meet privacy constraints, it introduces enormous computational challenges and remains impractically slow in current systems. This paper introduces Cheetah, a set of algorithmic and hardware optimizations for HE DNN inference to achieve plaintext DNN inference speeds. Cheetah proposes HE-parameter tuning optimization and operator scheduling optimizations, which together deliver 79x speedup over the state-of-the-art. However, this still falls short of plaintext inference speeds by almost four orders of magnitude. To bridge the remaining performance gap, Cheetah further proposes an accelerator architecture that, when combined with the algorithmic optimizations, approaches plaintext DNN inference speeds. We evaluate several common neural network models (e.g., ResNet50, VGG16, and AlexNet) and show that plaintext-level HE inference for each is feasible with a custom accelerator consuming 30W and 545mm^2.