LG · ML · Oct 2, 2019

AntMan: Sparse Low-Rank Compression to Accelerate RNN inference

arXiv:1910.01740v1 · 9 citations
Originality: Incremental advance
AI Analysis

The paper addresses the challenge of deploying complex RNN models efficiently in applications such as language processing. Its contribution is incremental, since it builds on existing compression techniques.

The paper tackles the high inference cost and memory requirements of RNN models with AntMan, a compression method that combines structured sparsity with low-rank decomposition. It reports up to 100x computation reduction with less than a 1-point accuracy drop, and models 5x smaller than the state of the art for a given accuracy target.

Wide adoption of complex RNN based models is hindered by their inference performance, cost and memory requirements. To address this issue, we develop AntMan, combining structured sparsity with low-rank decomposition synergistically, to reduce model computation, size and execution time of RNNs while attaining desired accuracy. AntMan extends knowledge distillation based training to learn the compressed models efficiently. Our evaluation shows that AntMan offers up to 100x computation reduction with less than 1pt accuracy drop for language and machine reading comprehension models. Our evaluation also shows that for a given accuracy target, AntMan produces 5x smaller models than the state-of-art. Lastly, we show that AntMan offers super-linear speed gains compared to theoretical speedup, demonstrating its practical value on commodity hardware.
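To make the combination of structured sparsity and low-rank decomposition concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the paper's implementation: it assumes the structured sparsity takes the form of a block-diagonal weight matrix with g equal blocks, and that each block is stored as a rank-r product A_i @ B_i. The function names, the sizes (1024 x 1024, g = 4, r = 8), and the random weights are all hypothetical.

```python
import numpy as np

def dense_macs(m, n):
    """Multiply-accumulates for a dense (m x n) matrix-vector product."""
    return m * n

def block_low_rank_macs(m, n, g, r):
    """MACs when the matrix is block-diagonal with g blocks, each factored at rank r.
    Assumes m and n are divisible by g."""
    return g * r * (m // g + n // g)

def make_block_low_rank(m, n, g, r, rng):
    """Random factors (A_i, B_i) for g diagonal blocks; block i equals A_i @ B_i."""
    bm, bn = m // g, n // g
    return [(rng.standard_normal((bm, r)), rng.standard_normal((r, bn)))
            for _ in range(g)]

def to_dense(factors):
    """Expand the factors into the equivalent dense matrix (for checking only)."""
    g = len(factors)
    bm = factors[0][0].shape[0]
    bn = factors[0][1].shape[1]
    W = np.zeros((g * bm, g * bn))
    for i, (A, B) in enumerate(factors):
        W[i * bm:(i + 1) * bm, i * bn:(i + 1) * bn] = A @ B
    return W

def block_low_rank_matvec(factors, x):
    """Compute W @ x without materializing W: each input chunk is projected
    down to r dimensions by B_i, then back up by A_i."""
    chunks = np.split(x, len(factors))
    return np.concatenate([A @ (B @ c) for (A, B), c in zip(factors, chunks)])

# Hypothetical sizes chosen for illustration only.
rng = np.random.default_rng(0)
m, n, g, r = 1024, 1024, 4, 8
factors = make_block_low_rank(m, n, g, r, rng)
x = rng.standard_normal(n)

y_fast = block_low_rank_matvec(factors, x)
y_ref = to_dense(factors) @ x          # reference result via the dense matrix
reduction = dense_macs(m, n) / block_low_rank_macs(m, n, g, r)
print("outputs match:", np.allclose(y_fast, y_ref))
print("theoretical MAC reduction:", reduction)
```

In this sketch the cost of the factored product is r * (m + n) regardless of g, while the effective rank of the full matrix can reach g * r. That is one way to read the claim that the two techniques work together. The ratio computed here is a theoretical operation count only and says nothing about measured speedup on real hardware.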

