AI | CL | NE | Jun 29, 2016

Compression of Neural Machine Translation Models via Pruning

arXiv:1606.09274v1 | 224 citations
Originality: Incremental advance
AI Analysis

This work addresses storage efficiency for NMT systems, offering a practical compression method with minimal performance loss, though it is incremental as it builds on existing pruning techniques.

The paper tackles over-parameterization in Neural Machine Translation (NMT) models by compressing them with magnitude-based pruning schemes. With retraining, an 80%-pruned model recovers or surpasses the original performance on the WMT'14 English-German task.

Neural Machine Translation (NMT), like many other deep learning domains, typically suffers from over-parameterization, resulting in large storage sizes. This paper examines three simple magnitude-based pruning schemes to compress NMT models, namely class-blind, class-uniform, and class-distribution, which differ in terms of how pruning thresholds are computed for the different classes of weights in the NMT architecture. We demonstrate the efficacy of weight pruning as a compression technique for a state-of-the-art NMT system. We show that an NMT model with over 200 million parameters can be pruned by 40% with very little performance loss as measured on the WMT'14 English-German translation task. This sheds light on the distribution of redundancy in the NMT architecture. Our main result is that with retraining, we can recover and even surpass the original performance with an 80%-pruned model.
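
The three schemes named in the abstract differ only in how the magnitude threshold is chosen per weight class. The following is a minimal sketch in plain Python, not the authors' implementation. It assumes each weight class is a flat list of floats; that class-blind uses one threshold over all weights pooled together; that class-uniform prunes the same fraction inside every class; and that class-distribution sets each class threshold to a shared multiplier `lam` times that class's standard deviation, with `lam` chosen so the overall pruned fraction matches the target. All function names, the dictionary layout, and the use of `<=` at the threshold are illustrative assumptions.

```python
import statistics


def threshold_for_fraction(values, fraction):
    """Return t such that keeping only values > t drops about `fraction` of them."""
    ordered = sorted(values)
    k = int(len(ordered) * fraction)  # number of entries to prune
    # With k == 0 nothing should be pruned; magnitudes are never <= -1.0.
    return ordered[k - 1] if k > 0 else -1.0


def class_blind(classes, fraction):
    """One global threshold computed over the pooled magnitudes of all classes."""
    pooled = [abs(w) for ws in classes.values() for w in ws]
    t = threshold_for_fraction(pooled, fraction)
    return {name: [0.0 if abs(w) <= t else w for w in ws]
            for name, ws in classes.items()}


def class_uniform(classes, fraction):
    """A separate threshold per class, so every class loses the same fraction."""
    pruned = {}
    for name, ws in classes.items():
        t = threshold_for_fraction([abs(w) for w in ws], fraction)
        pruned[name] = [0.0 if abs(w) <= t else w for w in ws]
    return pruned


def class_distribution(classes, fraction):
    """Per-class threshold lam * sigma_c, with a single shared multiplier lam.

    Pruning |w| <= lam * sigma_c is the same as pruning |w| / sigma_c <= lam,
    so lam is found as a global threshold over the normalised magnitudes.
    Assumes every class has a non-zero standard deviation.
    """
    sigma = {name: statistics.pstdev(ws) for name, ws in classes.items()}
    ratios = [abs(w) / sigma[name] for name, ws in classes.items() for w in ws]
    lam = threshold_for_fraction(ratios, fraction)
    return {name: [0.0 if abs(w) / sigma[name] <= lam else w for w in ws]
            for name, ws in classes.items()}


# Toy example with two hypothetical weight classes of very different scale.
toy = {
    "embedding": [0.9, -0.8, 0.7, -0.6],
    "recurrent": [0.05, -0.04, 0.03, -0.02],
}
blind = class_blind(toy, 0.5)
uniform = class_uniform(toy, 0.5)
distribution = class_distribution(toy, 0.5)
```

On the toy input, the class-blind variant removes the entire small-scale class while leaving the large-scale class untouched, whereas the other two spread the pruning across both classes. Tied magnitudes at the threshold can push the pruned fraction slightly above the target in this sketch.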

Code Implementations: 1 repo
