CL, LG, SD, AS | Jan 28, 2022

Improving End-to-End Models for Set Prediction in Spoken Language Understanding

arXiv:2201.12105v1
Originality: Incremental advance
AI Analysis

This work addresses a specific bottleneck in SLU systems for applications that require semantic understanding from speech. It is incremental because it builds on existing E2E models.

The paper addresses end-to-end spoken language understanding for set prediction when the spoken order of entities is unknown. It proposes a data augmentation technique and an implicit attention-based alignment method to infer that order, raising F1 scores by more than 11% for RNN-T models and by about 2% for attention-based encoder-decoder models.

The goal of spoken language understanding (SLU) systems is to determine the meaning of the input speech signal, unlike speech recognition which aims to produce verbatim transcripts. Advances in end-to-end (E2E) speech modeling have made it possible to train solely on semantic entities, which are far cheaper to collect than verbatim transcripts. We focus on this set prediction problem, where entity order is unspecified. Using two classes of E2E models, RNN transducers and attention based encoder-decoders, we show that these models work best when the training entity sequence is arranged in spoken order. To improve E2E SLU models when entity spoken order is unknown, we propose a novel data augmentation technique along with an implicit attention based alignment method to infer the spoken order. F1 scores significantly increased by more than 11% for RNN-T and about 2% for attention based encoder-decoder SLU models, outperforming previously reported results.
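The abstract does not spell out how the attention-based alignment recovers spoken order, so the following is only a minimal sketch under stated assumptions. It assumes each entity comes with a row of attention weights over acoustic frames (for example, from an attention-based encoder-decoder) and orders the entities by the attention-weighted mean frame index. The function name `infer_spoken_order`, the entity labels, and the toy attention matrix are all hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def infer_spoken_order(
    entities: Sequence[str],
    attention: Sequence[Sequence[float]],
) -> List[str]:
    """Reorder entities by where their attention mass falls in time.

    attention[i][t] is the (assumed) weight that entity i places on
    acoustic frame t. This is an illustrative heuristic, not the
    paper's confirmed procedure.
    """
    if len(entities) != len(attention):
        raise ValueError("need one attention row per entity")

    def expected_frame(weights: Sequence[float]) -> float:
        total = sum(weights)
        if total == 0:
            # No attention mass: push the entity to the end.
            return float("inf")
        return sum(t * w for t, w in enumerate(weights)) / total

    centers = [expected_frame(row) for row in attention]
    # sorted() is stable, so ties keep their original relative order.
    order = sorted(range(len(entities)), key=lambda i: centers[i])
    return [entities[i] for i in order]


# Toy example: entities given in arbitrary (set) order.
entities = ["city=Boston", "date=Monday", "city=Denver"]
attention = [
    [0.0, 0.1, 0.8, 0.1, 0.0, 0.0],  # mass centered on frame 2
    [0.0, 0.0, 0.0, 0.1, 0.2, 0.7],  # mass centered near frame 5
    [0.7, 0.3, 0.0, 0.0, 0.0, 0.0],  # mass centered near frame 0
]
print(infer_spoken_order(entities, attention))
# ['city=Denver', 'city=Boston', 'date=Monday']
```

The reordered sequence could then serve as the training target in place of the arbitrarily ordered entity set, which matches the abstract's observation that both model classes work best when entities are arranged in spoken order.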
