CL · Jul 13, 2024

Minimizing PLM-Based Few-Shot Intent Detectors

arXiv:2407.09943v2 · h-index: 7
Originality: Incremental advance
AI Analysis

This work addresses deployment challenges for intent detectors on mobile devices, representing an incremental improvement in model compression for specific applications.

The paper tackles the problem of deploying intent detectors built on large pre-trained language models (PLMs) in resource-constrained environments. It shrinks them through data augmentation, knowledge distillation, and vocabulary pruning, achieving a 21x compression ratio with nearly identical performance on four benchmarks.
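The summary does not specify the distillation objective; the abstract only refers to "a cutting-edge model compression method". To illustrate the general idea, here is a minimal sketch of the standard soft-target knowledge distillation loss (Hinton-style), which is a generic technique and not necessarily the one used in the paper. The function names `softmax` and `kd_loss` and the parameters `temperature` and `alpha` are hypothetical choices for illustration.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits, with temperature scaling."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kd_loss(student_logits, teacher_logits, label, temperature=2.0, alpha=0.5):
    """Generic distillation loss for a single example (illustrative sketch).

    Combines cross-entropy on the hard intent label with the KL divergence
    KL(teacher || student) between temperature-softened distributions.
    The KL term is scaled by T^2 so its gradient magnitude stays comparable
    to the cross-entropy term as the temperature changes.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student_soft = softmax(student_logits, temperature)
    kl = sum(
        pt * (math.log(pt) - math.log(ps))
        for pt, ps in zip(p_teacher, p_student_soft)
        if pt > 0
    )
    # Hard-label cross-entropy uses the unsoftened (T = 1) student distribution.
    p_student = softmax(student_logits, 1.0)
    ce = -math.log(p_student[label])
    return alpha * ce + (1 - alpha) * (temperature ** 2) * kl
```

Setting `alpha=0.0` yields a pure distillation signal, which is one way a small student could learn from a teacher on extra (for example, LLM-generated) utterances; whether the paper does this is not stated here.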

Recent research has demonstrated the feasibility of training efficient intent detectors based on pre-trained language models (PLMs) with limited labeled data. However, deploying these detectors in resource-constrained environments such as mobile devices is challenging because of their large size. In this work, we aim to address this issue by exploring techniques to minimize the size of PLM-based intent detectors trained with few-shot data. Specifically, we use large language models (LLMs) for data augmentation, employ a cutting-edge model compression method for knowledge distillation, and devise a vocabulary pruning mechanism called V-Prune. With these approaches, we achieve a compression ratio of 21 in model memory usage, counting both the Transformer and the vocabulary, while maintaining almost identical performance on four real-world benchmarks.
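The abstract counts the vocabulary in its memory figure because the embedding matrix (vocabulary size × hidden size) can account for a large share of a small model's parameters. The details of V-Prune are not given here, so the following is only a minimal sketch of generic vocabulary pruning, under stated assumptions: keep the token ids that occur in the task corpus plus any forced ids (such as special tokens), re-index the embedding rows, and map unseen tokens to an `[UNK]` id at inference. The `[UNK]` fallback, the parameter-count formula (which treats memory as proportional to parameter count at a fixed dtype), and every function name below are hypothetical.

```python
def build_pruned_vocab(tokenized_corpus, keep_ids=()):
    """Return an old_id -> new_id mapping covering only ids seen in the corpus
    plus any ids that must always be kept (e.g. special tokens)."""
    seen = set(keep_ids)
    for seq in tokenized_corpus:
        seen.update(seq)
    return {old: new for new, old in enumerate(sorted(seen))}

def prune_embeddings(embedding_rows, old_to_new):
    """Build the smaller embedding table: row new_id holds the old row old_id."""
    new_rows = [None] * len(old_to_new)
    for old, new in old_to_new.items():
        new_rows[new] = embedding_rows[old]
    return new_rows

def remap(seq, old_to_new, unk_old_id):
    """Translate old token ids to new ids; ids pruned away fall back to [UNK]
    (a simple assumption, not necessarily how V-Prune handles unseen tokens)."""
    unk_new = old_to_new[unk_old_id]
    return [old_to_new.get(t, unk_new) for t in seq]

def param_count(transformer_params, vocab_size, hidden_size):
    """Total parameters = Transformer body + embedding matrix (vocab x hidden)."""
    return transformer_params + vocab_size * hidden_size

def compression_ratio(original_params, compressed_params):
    """Ratio of original to compressed size (memory assumed proportional to params)."""
    return original_params / compressed_params
```

For a toy corpus `[[0, 3, 5], [3, 7]]` with forced ids `(0, 1)`, the kept ids are `0, 1, 3, 5, 7`, so the sequence `[3, 9, 7]` is remapped to `[2, 1, 4]`, with the unseen id `9` sent to the `[UNK]` slot. The design choice here is to prune by observed usage, which is cheap and needs no training, at the cost of relying on the training (and augmented) data to cover the tokens users will actually type.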

Code Implementations: 1 repo
