Augmenting Slot Values and Contexts for Spoken Language Understanding with Pretrained Models
This work addresses data scarcity for SLU systems, but it is incremental as it builds on existing pretrained models and data augmentation techniques.
The paper tackles data scarcity in spoken language understanding (SLU) by proposing data augmentation methods for slot filling. It fine-tunes pretrained language models to generate diverse sentences, which significantly improves performance on two public datasets.
Spoken language understanding (SLU) is an essential step in building a dialogue system. Because labeled data is expensive to obtain, SLU suffers from data scarcity. In this paper, we therefore focus on data augmentation for the slot filling task in SLU, aiming to generate more diverse data from existing data. Specifically, we exploit the latent language knowledge of pretrained language models by finetuning them. We propose two strategies for the finetuning process: value-based and context-based augmentation. Experimental results on two public SLU datasets show that, compared with existing data augmentation methods, our proposed method generates more diverse sentences and significantly improves SLU performance.
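
The abstract names the two finetuning strategies but does not say how their training examples are built. As an illustration only, the sketch below shows one plausible way to turn a BIO-tagged utterance into source/target text pairs for finetuning a sequence-to-sequence language model. The value-based pair keeps the context and asks the model to produce slot values; the context-based pair keeps the slot values and asks the model to produce the full sentence. The function names, the `<slot>` and `<ctx>` placeholders, and the pair formats are assumptions made for this sketch, not the paper's actual formulation.

```python
from typing import List, Optional, Tuple

Tagged = List[Tuple[str, str]]  # (word, BIO tag), e.g. ("york", "I-city")


def spans(tagged: Tagged) -> List[Tuple[Optional[str], List[str]]]:
    """Group BIO-tagged tokens into (slot name or None, words) segments."""
    segments: List[Tuple[Optional[str], List[str]]] = []
    for word, tag in tagged:
        if tag == "O":
            # Extend the current context segment or open a new one.
            if segments and segments[-1][0] is None:
                segments[-1][1].append(word)
            else:
                segments.append((None, [word]))
        elif tag.startswith("B-") or not segments or segments[-1][0] != tag[2:]:
            # A B- tag (or a stray I- tag) opens a new slot segment.
            segments.append((tag[2:], [word]))
        else:
            # An I- tag continuing the current slot.
            segments[-1][1].append(word)
    return segments


def value_based_pair(tagged: Tagged) -> Tuple[str, str]:
    """Hypothetical format: mask slot values, keep context; target lists the values."""
    src, tgt = [], []
    for slot, words in spans(tagged):
        if slot is None:
            src.extend(words)
        else:
            src.append(f"<{slot}>")
            tgt.append(f"<{slot}> " + " ".join(words))
    return " ".join(src), " ".join(tgt)


def context_based_pair(tagged: Tagged) -> Tuple[str, str]:
    """Hypothetical format: mask context, keep slot values; target is the full sentence."""
    src = ["<ctx>" if slot is None else " ".join(words) for slot, words in spans(tagged)]
    return " ".join(src), " ".join(word for word, _ in tagged)


example = [("fly", "O"), ("to", "O"), ("new", "B-city"),
           ("york", "I-city"), ("on", "O"), ("monday", "B-date")]
print(value_based_pair(example))    # ('fly to <city> on <date>', '<city> new york <date> monday')
print(context_based_pair(example))  # ('<ctx> new york <ctx> monday', 'fly to new york on monday')
```

Under these assumptions, a model finetuned on value-based pairs could propose new slot values for a fixed context, while one finetuned on context-based pairs could propose new contexts around fixed values. Because the slot spans are known in both cases, the generated sentences could be re-tagged automatically.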