A Robust Semantic Frame Parsing Pipeline on a New Complex Twitter Dataset
This work addresses the challenge of handling complex, real-world data, such as long tweets with diverse patterns and tokens, in SLU applications, though it appears incremental because it builds on existing methods and adds a new dataset.
The paper tackles semantic frame parsing for spoken language understanding in the presence of out-of-distribution patterns and out-of-vocabulary tokens. It introduces a new complex Twitter dataset and a robust pipeline that achieves much better results than state-of-the-art baseline models on both SNIPS and the new dataset.
Most recent semantic frame parsing systems for spoken language understanding (SLU) are built on recurrent neural networks. These systems perform reasonably well on benchmark SLU datasets such as ATIS and SNIPS, which contain short utterances with relatively simple patterns. However, current semantic frame parsing models lack a mechanism for handling out-of-distribution (\emph{OOD}) patterns and out-of-vocabulary (\emph{OOV}) tokens. In this paper, we introduce a robust semantic frame parsing pipeline that can handle both \emph{OOD} patterns and \emph{OOV} tokens, together with a new complex Twitter dataset that contains long tweets with more \emph{OOD} patterns and \emph{OOV} tokens. The new pipeline demonstrates much better results than state-of-the-art baseline SLU models on both the SNIPS dataset and the new Twitter dataset. (Our new Twitter dataset can be downloaded from https://1drv.ms/u/s!AroHb-W6_OAlavK4begsDsMALfE?e=c8f2XX.) Finally, we build an end-to-end (E2E) application to demonstrate the feasibility of our algorithm and to show why it is useful in real applications.