A Bi-model based RNN Semantic Frame Parsing Model for Intent Detection and Slot Filling
This paper addresses the problem of improving spoken language understanding (SLU) systems for applications such as virtual assistants, though the contribution is incremental, as it builds on existing encoder-decoder models.
The paper tackles intent detection and slot filling in spoken language understanding with a Bi-model based RNN semantic frame parsing network, which uses two correlated bidirectional LSTMs to model the two tasks jointly. It reports state-of-the-art results on the ATIS benchmark, with about 0.5% intent accuracy improvement and 0.9% slot filling improvement.
Intent detection and slot filling are two main tasks for building a spoken language understanding (SLU) system. Multiple deep learning based models have demonstrated good results on these tasks. The most effective algorithms are based on the structures of sequence to sequence models (or "encoder-decoder" models), and generate the intents and semantic tags either using separate models or a joint model. Most of the previous studies, however, either treat intent detection and slot filling as two separate parallel tasks, or use a sequence to sequence model to generate both semantic tags and intent. Most of these approaches use one (joint) NN based model (including encoder-decoder structure) to model the two tasks, and hence may not fully take advantage of the cross-impact between them. In this paper, new Bi-model based RNN semantic frame parsing network structures are designed to perform the intent detection and slot filling tasks jointly, by considering their cross-impact on each other using two correlated bidirectional LSTMs (BLSTM). Our Bi-model structure with a decoder achieves state-of-the-art results on the benchmark ATIS data, with about 0.5$\%$ intent accuracy improvement and 0.9$\%$ slot filling improvement.
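To make the two tasks concrete, the following is a minimal sketch of the semantic frame such a system produces for one utterance: a single intent label for the whole sentence plus one slot tag per token. It does not implement the Bi-model network itself. The helper name `decode_frame`, the example utterance, and the label strings (written in the IOB style commonly used for ATIS annotations) are illustrative assumptions, not taken from the paper.

```python
from typing import Dict, List, Tuple


def decode_frame(tokens: List[str], tags: List[str], intent: str) -> Dict[str, object]:
    """Collapse per-token IOB slot tags into (slot, text) spans and attach the intent.

    Hypothetical helper for illustration only. An "I-" tag whose label does not
    match the currently open span is treated like "O" and closes that span.
    """
    slots: List[Tuple[str, str]] = []
    label = None            # label of the span currently being built, if any
    words: List[str] = []   # tokens collected for that span

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new span begins; flush the previous one first.
            if label is not None:
                slots.append((label, " ".join(words)))
            label, words = tag[2:], [token]
        elif tag.startswith("I-") and label == tag[2:]:
            # Continuation of the open span.
            words.append(token)
        else:
            # "O" or an inconsistent "I-" tag: close any open span.
            if label is not None:
                slots.append((label, " ".join(words)))
            label, words = None, []

    if label is not None:
        slots.append((label, " ".join(words)))

    return {"intent": intent, "slots": slots}


# Illustrative ATIS-style example (assumed labels): slot filling yields one tag
# per token, intent detection yields one label per utterance.
TOKENS = "show flights from new york to denver".split()
TAGS = ["O", "O", "O", "B-fromloc.city_name", "I-fromloc.city_name",
        "O", "B-toloc.city_name"]

frame = decode_frame(TOKENS, TAGS, "atis_flight")
print(frame)
```

In this framing, slot filling is a sequence labeling problem and intent detection is a sentence classification problem over the same input, which is why information learned for one task can plausibly inform the other.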