CL · SD · AS · Oct 11, 2022

On the Use of Semantically-Aligned Speech Representations for Spoken Language Understanding

arXiv:2210.05291v1 · 10 citations · h-index: 31
Originality: Synthesis-oriented
AI Analysis

This work addresses spoken language understanding, potentially benefiting multilingual speech applications, though it appears incremental as it builds on existing models.

The paper aims to improve spoken language understanding (SLU) with semantically-aligned speech representations. It shows that the SAMU-XLSR model significantly outperforms the baseline XLS-R model in end-to-end SLU and improves language portability.

In this paper we examine the use of semantically-aligned speech representations for end-to-end spoken language understanding (SLU). We employ the recently introduced SAMU-XLSR model, which is designed to generate a single utterance-level embedding that captures semantics and is semantically aligned across languages. This model combines the frame-level acoustic speech representation learning model XLS-R with the Language-agnostic BERT Sentence Embedding (LaBSE) model. We show that using SAMU-XLSR instead of the original XLS-R model significantly improves performance in end-to-end SLU. Finally, we present the benefits of this model for language portability in SLU.
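
To make "combines XLS-R with LaBSE" more concrete, here is a minimal sketch, assuming that frame-level speech vectors are collapsed into one utterance vector by attention pooling, passed through a linear projection with a tanh, and pulled toward a sentence embedding of the transcript with a cosine-distance loss. All names (`attention_pool`, `alignment_loss`, `w`, `proj`, `target`) and the toy random values are hypothetical stand-ins: `frames` plays the role of XLS-R outputs and `target` the role of a LaBSE embedding. This illustrates the general idea and is not the authors' implementation; the exact SAMU-XLSR pooling and training details should be checked against its original description.

```python
import math
import random


def attention_pool(frames, w):
    """Collapse T frame vectors (each of length D) into one utterance vector.

    Each frame gets a score (dot product with the hypothetical learned
    vector w); a softmax over the scores gives the pooling weights.
    """
    scores = [sum(f_d * w_d for f_d, w_d in zip(f, w)) for f in frames]
    m = max(scores)  # subtract the max before exp for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(frames[0])
    return [sum(a * f[d] for a, f in zip(weights, frames)) for d in range(dim)]


def cosine_distance(u, v):
    """1 - cosine similarity; 0 when u and v point in the same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (nu * nv)


def alignment_loss(frames, w, proj, target):
    """Pool frames, project with tanh, compare with a sentence embedding."""
    pooled = attention_pool(frames, w)
    # proj is a list of E rows of length D (hypothetical projection matrix)
    speech_emb = [math.tanh(sum(p * x for p, x in zip(row, pooled))) for row in proj]
    return cosine_distance(speech_emb, target)


# Toy usage with random values in place of real XLS-R and LaBSE outputs.
random.seed(0)
frames = [[random.uniform(-1, 1) for _ in range(8)] for _ in range(50)]
w = [random.uniform(-1, 1) for _ in range(8)]
proj = [[random.uniform(-1, 1) for _ in range(8)] for _ in range(4)]
target = [random.uniform(-1, 1) for _ in range(4)]
print(alignment_loss(frames, w, proj, target))
```

In a real setup, only the speech side (encoder, pooling, projection) would be trained to minimize this loss while the text encoder stays fixed, so that utterances in different languages with the same meaning land near the same sentence embedding.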

