RepCNN: Micro-sized, Mighty Models for Wakeword Detection
This work addresses the challenge of balancing accuracy and efficiency in always-on wakeword detection on resource-constrained devices. It is an incremental improvement over existing methods.
The paper tackles the problem of training small, always-on machine learning models for wakeword detection by introducing a technique that refactors a small convolutional model into a larger redundant multi-branched architecture for training and then algebraically re-parameterizes it into a single-branched form for inference. The result is RepCNN, which achieves 43% higher accuracy than a uni-branch convolutional model with the same runtime and matches the accuracy of complex architectures like BC-ResNet while reducing peak memory usage by 2x and runtime by 10x.
Always-on machine learning models require a very low memory and compute footprint. Their restricted parameter count limits both the model's capacity to learn and the ability of the usual training algorithms to find the best parameters. Here we show that a small convolutional model can be better trained by first refactoring its computation into a larger redundant multi-branched architecture. Then, for inference, we algebraically re-parameterize the trained model into the single-branched form with fewer parameters for a lower memory footprint and compute cost. Using this technique, we show that our always-on wake-word detector model, RepCNN, provides a good trade-off between latency and accuracy during inference. RepCNN re-parameterized models are 43% more accurate than a uni-branch convolutional model while having the same runtime. RepCNN also matches the accuracy of complex architectures like BC-ResNet, while having 2x lower peak memory usage and 10x faster runtime.
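The algebraic re-parameterization step can be illustrated with a minimal NumPy sketch. The abstract does not specify the branch layout here, so the three branches below (a width-3 convolution, a width-1 convolution, and an identity shortcut) are an assumption chosen for illustration, and normalization layers are omitted. The function names conv1d_same and merge_branches are hypothetical. Because each branch is linear in the input, their sum collapses into a single width-3 convolution with identical output.

```python
import numpy as np

def conv1d_same(x, w, b):
    """Zero-padded 'same' 1-D cross-correlation.
    x: (C_in, T), w: (C_out, C_in, K) with odd K, b: (C_out,)."""
    c_out, _, k = w.shape
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad)))
    t = x.shape[1]
    y = np.zeros((c_out, t))
    for i in range(k):
        # Tap i multiplies the input shifted by (i - pad) time steps.
        y += w[:, :, i] @ xp[:, i:i + t]
    return y + b[:, None]

def merge_branches(w3, b3, w1, b1):
    """Fold a width-1 conv and an identity shortcut into a width-3 conv.
    Assumes C_in == C_out so the identity branch is well defined."""
    channels = w3.shape[0]
    w = w3.copy()
    w[:, :, 1] += w1[:, :, 0]      # width-1 kernel lands on the center tap
    w[:, :, 1] += np.eye(channels)  # identity is a center tap of ones per channel
    return w, b3 + b1

rng = np.random.default_rng(0)
channels, steps = 4, 16             # illustrative sizes, not from the paper
x = rng.normal(size=(channels, steps))
w3 = rng.normal(size=(channels, channels, 3))
b3 = rng.normal(size=channels)
w1 = rng.normal(size=(channels, channels, 1))
b1 = rng.normal(size=channels)

# Training-time form: three parallel branches summed.
y_train = conv1d_same(x, w3, b3) + conv1d_same(x, w1, b1) + x

# Inference-time form: one merged convolution.
w_merged, b_merged = merge_branches(w3, b3, w1, b1)
y_infer = conv1d_same(x, w_merged, b_merged)

params_train = w3.size + b3.size + w1.size + b1.size
params_infer = w_merged.size + b_merged.size
```

In this sketch the two forms agree up to floating-point rounding, while the merged form stores only the width-3 kernel and one bias vector. The extra branches add parameters only during training.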