CL | AI | LG | Nov 30, 2020

Extreme Model Compression for On-device Natural Language Understanding

arXiv:2012.00124v1991 citations
AI Analysis

This work provides a method for substantially reducing the size of NLU models so that commercial NLU systems can be deployed on resource-constrained devices.

This paper addresses the challenge of deploying large Natural Language Understanding (NLU) models on resource-constrained devices by proposing a task-aware, end-to-end compression approach. It achieves a 97.4% compression rate with less than 3.7% degradation in predictive performance on a large-scale commercial NLU system.

In this paper, we propose and experiment with techniques for extreme compression of neural natural language understanding (NLU) models, making them suitable for execution on resource-constrained devices. We propose a task-aware, end-to-end compression approach that performs word-embedding compression jointly with NLU task learning. We show our results on a large-scale, commercial NLU system trained on a varied set of intents with huge vocabulary sizes. Our approach outperforms a range of baselines and achieves a compression rate of 97.4% with less than 3.7% degradation in predictive performance. Our analysis indicates that the signal from the downstream task is important for effective compression with minimal degradation in performance.
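
To make the headline number concrete, here is a minimal arithmetic sketch. It assumes that "compression rate" means one minus the ratio of compressed size to original size, which the abstract does not define explicitly. The function names and the 100 MB starting size are hypothetical and chosen only for illustration.

```python
def compressed_size(original_size: float, compression_rate: float) -> float:
    """Size left after compression.

    Assumption (not stated in the abstract):
    compression_rate = 1 - compressed_size / original_size.
    """
    if not 0.0 <= compression_rate <= 1.0:
        raise ValueError("compression_rate must be between 0 and 1")
    return original_size * (1.0 - compression_rate)


def reduction_factor(compression_rate: float) -> float:
    """How many times smaller the compressed model is than the original."""
    if not 0.0 <= compression_rate < 1.0:
        raise ValueError("compression_rate must be in [0, 1)")
    return 1.0 / (1.0 - compression_rate)


if __name__ == "__main__":
    rate = 0.974  # compression rate reported in the abstract
    hypothetical_mb = 100.0  # illustrative size only, not from the paper
    print(f"remaining size: {compressed_size(hypothetical_mb, rate):.1f} MB")
    print(f"reduction factor: {reduction_factor(rate):.1f}x")
```

Under that assumed definition, a 97.4% compression rate leaves about 2.6% of the original size, which is roughly a 38-fold reduction.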
