RTMobile: Beyond Real-Time Mobile Acceleration of RNNs for Speech Recognition
This work addresses the challenge of deploying RNNs efficiently for speech recognition on mobile devices, offering a practical solution that reaches real-time inference.
The authors accelerate RNN inference for speech recognition on mobile devices with RTMobile, which combines block-based pruning and compiler optimizations to reach real-time performance while improving on existing RNN hardware acceleration methods in inference accuracy and time. Running a GRU on the Adreno 640 embedded GPU, it improves energy efficiency by about 40x over prior FPGA work at the same inference time.
Recurrent neural network (RNN) based automatic speech recognition has become prevalent on mobile devices such as smartphones. However, previous RNN compression techniques suffer either from hardware performance overhead due to irregularity or from significant accuracy loss due to the regularity preserved for hardware friendliness. In this work, we propose RTMobile, which leverages both a novel block-based pruning approach and compiler optimizations to accelerate RNN inference on mobile devices. RTMobile is the first work that can achieve real-time RNN inference on mobile platforms. Experimental results demonstrate that RTMobile significantly outperforms existing RNN hardware acceleration methods in terms of inference accuracy and time. Compared with prior work on FPGA, RTMobile running a GRU on the Adreno 640 embedded GPU improves energy efficiency by about 40$\times$ while maintaining the same inference time.
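The abstract does not spell out how the block-based pruning works, so the following is only a minimal sketch of generic block-structured magnitude pruning, not RTMobile's actual algorithm. It assumes a weight matrix is split into row blocks and that, inside each block, the columns with the smallest L2 norm are zeroed. The helper name `block_prune` and the parameters `num_blocks` and `keep_ratio` are hypothetical and chosen for illustration.

```python
import numpy as np


def block_prune(weights, num_blocks, keep_ratio):
    """Zero the weakest columns inside each row block of a 2-D weight matrix.

    Hypothetical helper: generic block-structured magnitude pruning,
    assumed for illustration; not taken from the RTMobile paper.
    """
    rows, cols = weights.shape
    if rows % num_blocks != 0:
        raise ValueError("rows must be divisible by num_blocks")
    keep = max(1, int(round(cols * keep_ratio)))  # columns kept per block
    block_rows = rows // num_blocks
    pruned = weights.copy()
    for b in range(num_blocks):
        # Basic slicing returns a view, so writes land in `pruned`.
        block = pruned[b * block_rows:(b + 1) * block_rows]
        norms = np.linalg.norm(block, axis=0)  # L2 norm of each column
        drop = np.argsort(norms)[:cols - keep]  # weakest columns in this block
        block[:, drop] = 0.0
    return pruned


rng = np.random.default_rng(0)
w = rng.standard_normal((8, 6))  # stand-in for one RNN weight matrix
w_pruned = block_prune(w, num_blocks=2, keep_ratio=0.5)
sparsity = float(np.mean(w_pruned == 0.0))
print(f"sparsity: {sparsity:.2f}")
```

Because each block chooses its own columns, the zero pattern is regular inside a block (which a compiler can exploit) while still varying across blocks, which is one plausible way to balance the irregularity and accuracy concerns raised in the abstract.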