LG ML · Oct 2, 2019

AntMan: Sparse Low-Rank Compression to Accelerate RNN inference

Samyam Rajbhandari, Harsh Shrivastava, Yuxiong He

arXiv:1910.01740v16.69 citations

Originality: Incremental advance

AI Analysis

This work addresses the challenge of deploying complex RNN models efficiently for applications such as language processing. The contribution is incremental, since it builds on existing compression techniques.

The paper tackles the high inference cost and memory requirements of RNN models by developing AntMan, a compression method that combines structured sparsity with low-rank decomposition. It reports up to 100x computation reduction with less than a 1-point accuracy drop and, for a given accuracy target, models 5x smaller than the state of the art.

Wide adoption of complex RNN based models is hindered by their inference performance, cost and memory requirements. To address this issue, we develop AntMan, combining structured sparsity with low-rank decomposition synergistically, to reduce model computation, size and execution time of RNNs while attaining desired accuracy. AntMan extends knowledge distillation based training to learn the compressed models efficiently. Our evaluation shows that AntMan offers up to 100x computation reduction with less than 1pt accuracy drop for language and machine reading comprehension models. Our evaluation also shows that for a given accuracy target, AntMan produces 5x smaller models than the state-of-art. Lastly, we show that AntMan offers super-linear speed gains compared to theoretical speedup, demonstrating its practical value on commodity hardware.
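To make the claimed computation reduction concrete, the following is a minimal sketch of how the multiply-accumulate (MAC) count of one matrix-vector product changes under low-rank factorization, structured sparsity, and both together. It is not the authors' implementation: the abstract does not say which sparsity structure AntMan uses, so the block-diagonal layout, the function names, and the sizes (`m`, `n`, `g`, `r`) are all assumptions chosen for illustration.

```python
# Illustrative MAC counts for y = W x under different compression schemes.
# Assumption: "structured sparsity" is modeled as a block-diagonal matrix;
# the abstract does not specify the structure actually used by AntMan.

def dense_macs(m, n):
    """MACs for a dense m x n matrix times a length-n vector."""
    return m * n

def low_rank_macs(m, n, r):
    """W ~= U V with U (m x r), V (r x n): compute V x, then U (V x)."""
    return r * n + m * r

def block_diag_macs(m, n, g):
    """g diagonal blocks, each (m/g) x (n/g); off-diagonal blocks are zero."""
    assert m % g == 0 and n % g == 0, "g must divide both dimensions"
    return g * dense_macs(m // g, n // g)

def block_low_rank_macs(m, n, g, r):
    """Each of the g diagonal blocks is itself factorized with rank r."""
    assert m % g == 0 and n % g == 0, "g must divide both dimensions"
    return g * low_rank_macs(m // g, n // g, r)

# Hypothetical layer size, roughly that of one RNN weight matrix.
m = n = 1024
baseline = dense_macs(m, n)

reductions = {
    "low_rank(r=64)": baseline / low_rank_macs(m, n, 64),
    "block_diag(g=8)": baseline / block_diag_macs(m, n, 8),
    "block_diag(g=8) + low_rank(r=8 per block)":
        baseline / block_low_rank_macs(m, n, 8, 8),
}

for name, factor in reductions.items():
    print(f"{name}: {factor:.0f}x fewer MACs")
```

Under these assumed sizes, each technique alone gives a modest reduction, while applying the factorization inside each block multiplies the savings. This is one plausible reading of why the abstract describes the combination as synergistic; the accuracy impact depends on training and is not modeled here.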
