TinyProp -- Adaptive Sparse Backpropagation for Efficient TinyML On-device Learning
This work addresses the challenge of enabling efficient fine-tuning of neural networks on low-power micro-controller units. The contribution is incremental: it improves upon existing static sparse backpropagation approaches.
The paper tackles the problem of inefficient on-device learning on tiny embedded devices by introducing TinyProp, a sparse backpropagation method that dynamically adapts the backpropagation ratio during training. Training is 5 times faster than non-sparse training at an average accuracy loss of 1%, and on average 2.9 times faster than static sparse methods while reducing the accuracy loss by 6% on average.
Training deep neural networks using backpropagation is very memory- and compute-intensive. This makes it difficult to run on-device learning or fine-tune neural networks on tiny embedded devices such as low-power micro-controller units (MCUs). Sparse backpropagation algorithms try to reduce the computational load of on-device learning by training only a subset of the weights and biases. Existing approaches train a static number of weights. A poor choice of this so-called backpropagation ratio either limits the computational gain or leads to severe accuracy losses. In this paper we present TinyProp, the first sparse backpropagation method that dynamically adapts the backpropagation ratio at each training step during on-device training. TinyProp incurs a small computational overhead for sorting the elements of the gradient, which does not significantly reduce the computational gains. TinyProp works particularly well for fine-tuning trained networks on MCUs, which is a typical use case for embedded applications. On three typical datasets, MNIST, DCASE2020 and CIFAR10, TinyProp is 5 times faster than non-sparse training, with an accuracy loss of 1% on average. On average, TinyProp is 2.9 times faster than existing static sparse backpropagation algorithms, and the accuracy loss is reduced by 6% on average compared to a typical static setting of the backpropagation ratio.
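To make the idea of a per-step backpropagation ratio concrete, the following is a minimal Python sketch of top-k sparse backpropagation for a single dense layer. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not say how the ratio is adapted, so the rule in `adaptive_ratio` (scaling the ratio with the total absolute error) and the names `r_min`, `r_max`, `error_scale`, `top_k_mask` and `sparse_weight_grad` are all hypothetical.

```python
import math


def adaptive_ratio(error, r_min=0.1, r_max=1.0, error_scale=1.0):
    """HYPOTHETICAL rule (not from the paper): map the total absolute
    error of a layer to a backpropagation ratio in [r_min, r_max]."""
    total = sum(abs(e) for e in error)
    frac = min(total / error_scale, 1.0)
    return r_min + frac * (r_max - r_min)


def top_k_mask(error, ratio):
    """Keep the k largest-magnitude entries of the error vector and
    zero the rest, where k = ceil(ratio * len(error)), at least 1."""
    k = max(1, math.ceil(ratio * len(error)))
    # Sorting by magnitude is the kind of overhead the abstract mentions.
    order = sorted(range(len(error)), key=lambda i: abs(error[i]), reverse=True)
    sparse = [0.0] * len(error)
    for i in order[:k]:
        sparse[i] = error[i]
    return sparse


def sparse_weight_grad(inputs, error, ratio):
    """Weight gradient of a dense layer y = W x, as the outer product of
    the sparsified error and the inputs. Rows whose error entry is zero
    after masking are left at zero instead of being computed."""
    sparse = top_k_mask(error, ratio)
    return [
        [e * x for x in inputs] if e != 0.0 else [0.0] * len(inputs)
        for e in sparse
    ]


if __name__ == "__main__":
    err = [0.1, -0.9, 0.3, 0.05]
    ratio = adaptive_ratio(err, error_scale=4.0)  # ratio chosen per training step
    print(ratio, sparse_weight_grad([1.0, 2.0], err, ratio))
```

A static sparse method corresponds to calling `sparse_weight_grad` with a fixed `ratio` at every step; the adaptive variant recomputes the ratio from the current error, so steps with small errors update fewer rows.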