Towards Exact Gradient-based Training on Analog In-memory Computing
The paper addresses energy-efficient AI training on analog accelerators, which is crucial for reducing the cost of large models, but the contribution is incremental because it builds on existing heuristic methods.
The paper tackles the inexact convergence of stochastic gradient descent (SGD) when training models on analog in-memory computing devices. It shows that a heuristic algorithm called Tiki-Taka converges exactly to a critical point and thereby eliminates the asymptotic error, as verified by simulations.
Given the high economic and environmental costs of using large vision or language models, analog in-memory accelerators present a promising solution for energy-efficient AI. While inference on analog accelerators has been studied recently, the training perspective is underexplored. Recent studies have shown that the "workhorse" of digital AI training, the stochastic gradient descent (SGD) algorithm, converges inexactly when applied to model training on non-ideal devices. This paper puts forth a theoretical foundation for gradient-based training on analog devices. We begin by characterizing the non-convergence issue of SGD, which is caused by the asymmetric updates on the analog devices. We then provide a lower bound on the asymptotic error to show that this is a fundamental performance limit of SGD-based analog training rather than an artifact of our analysis. To address this issue, we study a heuristic analog algorithm called Tiki-Taka that has recently exhibited superior empirical performance compared to SGD, and rigorously show that it converges exactly to a critical point and hence eliminates the asymptotic error. Simulations verify the correctness of the analyses.
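To make the effect of asymmetric updates concrete, the following is a minimal sketch, not the paper's own experiment. It assumes a simple hypothetical device model in which an intended update is accompanied by a weight-dependent drift term `(lr / tau) * |g| * w`; the text above does not specify the device model, so this form, the function name `run_sgd`, and all parameter values are illustrative assumptions. On a scalar quadratic with noisy gradients, the sketch compares the long-run average weight of ideal (digital) SGD with that of SGD under the assumed asymmetric update.

```python
import random


def run_sgd(asymmetric, steps=50_000, lr=0.01, tau=1.0,
            w_star=0.5, sigma=1.0, seed=0):
    """Run SGD on f(w) = 0.5 * (w - w_star)^2 with noisy gradients.

    If `asymmetric` is True, each step also applies the ASSUMED device
    drift (lr / tau) * |g| * w, which pulls the weight toward zero.
    Returns the average weight over the last 80% of the steps.
    """
    rng = random.Random(seed)
    w = 0.0
    tail_sum = 0.0
    tail_count = 0
    burn_in = steps // 5
    for k in range(steps):
        # Stochastic gradient: true gradient plus Gaussian noise.
        g = (w - w_star) + sigma * rng.gauss(0.0, 1.0)
        # Assumed asymmetric drift; zero for an ideal (digital) update.
        drift = (lr / tau) * abs(g) * w if asymmetric else 0.0
        w = w - lr * g - drift
        if k >= burn_in:
            tail_sum += w
            tail_count += 1
    return tail_sum / tail_count


digital_mean = run_sgd(asymmetric=False)
analog_mean = run_sgd(asymmetric=True)
print(f"optimum w*          : 0.5")
print(f"digital SGD average : {digital_mean:.3f}")
print(f"analog SGD average  : {analog_mean:.3f}")
```

Under this assumed model, the drift term scales with the gradient magnitude, which stays nonzero at the optimum because of the noise. The analog iterate therefore settles away from the minimizer, whereas the digital iterate averages out near it. This is only an illustration of why an asymptotic error can persist; it does not reproduce the paper's lower bound or the Tiki-Taka algorithm.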