CL | Jul 13, 2024

Minimizing PLM-Based Few-Shot Intent Detectors

Haode Zhang, Albert Y. S. Lam, Xiao-Ming Wu

arXiv:2407.09943v21.0 | h-index: 7 | Has Code

Originality: Incremental advance

AI Analysis

This work addresses deployment challenges for intent detectors on mobile devices, representing an incremental improvement in model compression for specific applications.

The paper tackles the problem of deploying large intent detectors based on pre-trained language models (PLMs) in resource-constrained environments. It shrinks the models through LLM-based data augmentation, knowledge distillation, and vocabulary pruning, reaching a 21x reduction in model memory usage with nearly identical performance on four benchmarks.

Recent research has demonstrated the feasibility of training efficient intent detectors based on pre-trained language models (PLMs) with limited labeled data. However, deploying these detectors in resource-constrained environments such as mobile devices poses challenges due to their large sizes. In this work, we aim to address this issue by exploring techniques to minimize the size of PLM-based intent detectors trained with few-shot data. Specifically, we utilize large language models (LLMs) for data augmentation, employ a cutting-edge model compression method for knowledge distillation, and devise a vocabulary pruning mechanism called V-Prune. Through these approaches, we successfully achieve a compression ratio of 21 in model memory usage, including both the Transformer and the vocabulary, while maintaining almost identical performance levels on four real-world benchmarks.
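
The abstract does not describe how V-Prune works, so the following is only a generic sketch of the broader idea of vocabulary pruning, not the authors' method. It assumes a simple rule: keep the special tokens plus every token that appears in the task data, then rebuild the embedding table with only those rows. The function names (`prune_vocabulary`, `encode`), the `[PAD]`/`[UNK]` token names, and the toy data are all hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def prune_vocabulary(
    vocab: Dict[str, int],
    embeddings: Sequence[Sequence[float]],
    corpus: Sequence[Sequence[str]],
    special_tokens: Sequence[str] = ("[PAD]", "[UNK]"),
) -> Tuple[Dict[str, int], List[List[float]]]:
    """Keep special tokens and tokens seen in `corpus`; drop the rest.

    Assumption: "seen in the task data" is the pruning criterion. The
    paper's V-Prune may use a different rule.
    """
    # Tokens that occur in the task data and exist in the original vocabulary.
    used = {tok for sentence in corpus for tok in sentence if tok in vocab}

    # Special tokens come first, then used tokens in their original id order.
    kept = [tok for tok in special_tokens if tok in vocab]
    kept += sorted(used - set(kept), key=lambda tok: vocab[tok])

    # Assign new contiguous ids and copy over the matching embedding rows.
    new_vocab = {tok: new_id for new_id, tok in enumerate(kept)}
    new_embeddings = [list(embeddings[vocab[tok]]) for tok in kept]
    return new_vocab, new_embeddings


def encode(tokens: Sequence[str], vocab: Dict[str, int], unk: str = "[UNK]") -> List[int]:
    """Map tokens to ids, sending pruned or unknown tokens to `unk`."""
    return [vocab.get(tok, vocab[unk]) for tok in tokens]


if __name__ == "__main__":
    # Toy vocabulary with one token ("zebra") that never occurs in the data.
    vocab = {"[PAD]": 0, "[UNK]": 1, "zebra": 2, "book": 3, "a": 4, "flight": 5, "weather": 6}
    embeddings = [[float(i), float(i) + 0.5] for i in range(len(vocab))]
    corpus = [["book", "a", "flight"], ["a", "weather"]]

    small_vocab, small_embeddings = prune_vocabulary(vocab, embeddings, corpus)
    print(len(vocab), "->", len(small_vocab), "tokens")
    print(encode(["book", "zebra"], small_vocab))
```

In a real PLM the embedding table is often a large share of the parameters of a small distilled model, which is why the abstract counts both the Transformer and the vocabulary in its memory figure. The trade-off in this sketch is that any token outside the kept set falls back to the unknown token at inference time.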

