Recent Advances in End-to-End Spoken Language Understanding
This work addresses spoken language understanding for applications such as voice assistants. It appears incremental, since it builds on existing end-to-end methods and adds specific optimizations.
The paper tackles the problem of extracting semantic information directly from speech for two spoken language understanding tasks: named entity recognition and semantic slot filling. To improve model performance, it explores speaker adaptation, a modification of the CTC training criterion, and sequential pretraining. No concrete numbers are provided.
This work investigates spoken language understanding (SLU) systems in the scenario where semantic information is extracted directly from the speech signal by a single end-to-end neural network model. Two SLU tasks are considered: named entity recognition (NER) and semantic slot filling (SF). To improve model performance on these tasks, we explore several techniques, including speaker adaptation, a modification of the connectionist temporal classification (CTC) training criterion, and sequential pretraining.
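
For readers unfamiliar with CTC, the following is a minimal sketch of the standard CTC collapsing rule, which maps a frame-level label path to an output sequence by merging consecutive repeats and then removing blanks. It illustrates plain CTC only, not the modified criterion explored in this work. The blank symbol `_` and the function name `ctc_collapse` are assumptions chosen for illustration.

```python
from itertools import groupby

BLANK = "_"  # hypothetical blank symbol, chosen for this illustration


def ctc_collapse(path, blank=BLANK):
    """Apply the standard CTC mapping to a frame-level label path.

    Step 1: merge runs of identical consecutive symbols.
    Step 2: drop every blank symbol.
    """
    merged = [symbol for symbol, _ in groupby(path)]
    return "".join(symbol for symbol in merged if symbol != blank)


if __name__ == "__main__":
    # A blank between the two "l" runs keeps the double letter.
    print(ctc_collapse("hh_e_ll_lo"))
```

A blank between two runs of the same symbol is what lets the output contain a repeated character. Without it, the two runs would be merged into one.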