LG, Jul 14, 2023

Towards Model-Size Agnostic, Compute-Free, Memorization-based Inference of Deep Learning

Davide Giacomini, Maeesha Binte Hashem, Jeremiah Suarez, Swarup Bhunia, Amit Ranjan Trivedi

arXiv:2307.07631v12.02 citations, h-index: 53

Originality: Incremental advance

AI Analysis

This work addresses the challenge of deploying deep learning on low-power devices. The approach is new as a combination, but it is an incremental advance in that it applies existing concepts such as glimpse-based processing and in-memory computing.

This paper tackles the high computational cost of deep neural networks on resource-constrained devices by proposing a memorization-based inference method that is compute-free and relies only on lookups. For MNIST character recognition, it reports energy efficiency improvements of up to 83 times over existing compute-in-memory approaches.

The rapid advancement of deep neural networks has significantly improved various tasks, such as image and speech recognition. However, as the complexity of these models increases, so do the computational cost and the number of parameters, making it difficult to deploy them on resource-constrained devices. This paper proposes a novel memorization-based inference (MBI) method that is compute-free and requires only lookups. Specifically, our work capitalizes on the inference mechanism of the recurrent attention model (RAM), where only a small window of the input domain (a glimpse) is processed in one time step, and the outputs from multiple glimpses are combined through a hidden vector to determine the overall classification output. By leveraging the low dimensionality of a glimpse, our inference procedure stores key-value pairs comprising the glimpse location, patch vector, etc. in a table. Computations are obviated during inference by reading key-value pairs out of the table, so that inference is performed by memorization alone. By exploiting Bayesian optimization and clustering, the number of necessary lookups is reduced and accuracy is improved. We also present in-memory computing circuits that quickly look up the key vector matching an input query. Compared to competitive compute-in-memory (CIM) approaches, MBI improves energy efficiency by almost 2.7 times over multilayer perceptron (MLP)-CIM and by almost 83 times over ResNet20-CIM for MNIST character recognition.
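The lookup idea in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names `MemoTable`, `binarize`, and `infer`, the binarized patches, the Hamming-distance match, the fixed glimpse order, and the string state labels are all assumptions made for illustration. The sketch only shows how a table keyed by glimpse location, previous state, and patch vector can replace arithmetic with nearest-key reads.

```python
import numpy as np


def binarize(patch, threshold=0.5):
    """Flatten a patch and threshold it to a bit vector (assumed key encoding)."""
    return (np.asarray(patch, dtype=float).ravel() > threshold).astype(np.uint8)


class MemoTable:
    """Hypothetical table: (location, state) -> list of (patch key, stored value)."""

    def __init__(self):
        self.entries = {}

    def store(self, location, state, patch, value):
        self.entries.setdefault((location, state), []).append((binarize(patch), value))

    def lookup(self, location, state, patch):
        # Return the value whose stored key is closest in Hamming distance.
        query = binarize(patch)
        candidates = self.entries[(location, state)]
        distances = [int(np.count_nonzero(query != key)) for key, _ in candidates]
        return candidates[int(np.argmin(distances))][1]


def infer(table, glimpses, initial_state=0):
    """Chain lookups over glimpses; the last stored value is the class label."""
    state = initial_state
    for location, patch in glimpses:
        state = table.lookup(location, state, patch)
    return state


# Toy table with made-up entries (not data from the paper).
vertical = [[0, 1, 0], [0, 1, 0], [0, 1, 0]]
horizontal = [[0, 0, 0], [1, 1, 1], [0, 0, 0]]

table = MemoTable()
table.store((0, 0), 0, vertical, "saw_vertical")
table.store((0, 0), 0, horizontal, "saw_horizontal")
table.store((0, 3), "saw_vertical", vertical, "class_1")
table.store((0, 3), "saw_vertical", horizontal, "class_7")

# A noisy vertical stroke followed by a clean horizontal one.
noisy_vertical = [[0, 0.9, 0], [0, 0.8, 0], [0.6, 0.7, 0]]
glimpses = [((0, 0), noisy_vertical), ((0, 3), horizontal)]
result = infer(table, glimpses)
print(result)
```

In this sketch the stored value doubles as the next state, standing in for the hidden vector that combines glimpses. A real table would presumably be populated from a trained model and searched by the in-memory circuits described above rather than by a Python loop.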
