Adaptive Pruning of Neural Language Models for Mobile Devices
The paper addresses power consumption and battery life in mobile software keyboards, but the contribution is incremental because it builds on existing pruning methods.
The paper tackles the accuracy-efficiency tradeoff in neural language models for mobile devices by applying pruning techniques to quasi-recurrent neural networks, achieving, at one operating point, 40% energy savings over the state of the art with only a 17% relative increase in perplexity.
Neural language models (NLMs) exist in an accuracy-efficiency tradeoff space where better perplexity typically comes at the cost of greater computational complexity. In a software keyboard application on mobile devices, this translates into higher power consumption and shorter battery life. This paper represents the first attempt, to our knowledge, at exploring accuracy-efficiency tradeoffs for NLMs. Building on quasi-recurrent neural networks (QRNNs), we apply pruning techniques to provide a "knob" to select different operating points. In addition, we propose a simple technique to recover some perplexity using a negligible amount of memory. Our empirical evaluations consider both perplexity and energy consumption on a Raspberry Pi, where we demonstrate which methods provide the best perplexity-power consumption operating point. At one operating point, one of the techniques is able to provide energy savings of 40% over the state of the art with only a 17% relative increase in perplexity.
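To make the idea of a pruning "knob" concrete, here is a minimal sketch. It is not the authors' method: the abstract does not say which pruning techniques are used, so the sketch assumes plain magnitude-based pruning, where the rows of a weight matrix with the smallest L2 norm are dropped. The function name `prune_rows`, the parameter `keep_fraction`, and the example weights are all hypothetical.

```python
import math

def prune_rows(weight, keep_fraction):
    """Keep the rows of `weight` with the largest L2 norm (assumed criterion).

    `weight` is a list of rows (one row per output unit) and
    `keep_fraction` in (0, 1] is the "knob": lower values mean a
    smaller, cheaper layer.
    """
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")
    n_keep = max(1, round(keep_fraction * len(weight)))
    norms = [math.sqrt(sum(x * x for x in row)) for row in weight]
    # Rank row indices by norm, largest first, then restore original order.
    ranked = sorted(range(len(weight)), key=lambda i: norms[i], reverse=True)
    keep = sorted(ranked[:n_keep])
    return [weight[i] for i in keep]

# Hypothetical 4x2 layer, pruned at three operating points.
w = [[0.1, 0.0], [2.0, 1.0], [0.5, 0.5], [3.0, 0.0]]
for knob in (1.0, 0.5, 0.25):
    print(knob, len(prune_rows(w, knob)), "rows kept")
```

Each setting of `keep_fraction` corresponds to one operating point. In an evaluation like the one described above, perplexity and energy consumption would be measured separately at each setting.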