CL, HC, SD, AS | Jun 29, 2022

Finstreder: Simple and fast Spoken Language Understanding with Finite State Transducers using modern Speech-to-Text models

arXiv:2206.14589v1 | 8 citations | h-index: 35
Originality: Incremental advance
AI Analysis

The method provides a simple, efficient solution for SLU tasks, which benefits developers and systems that need quick deployment. It appears incremental, however, because it builds on existing transducer and speech-to-text methods.

The paper tackles the extraction of intents and entities from audio commands in Spoken Language Understanding (SLU). It embeds intents and entities into Finite State Transducers and combines them with a pretrained Speech-to-Text model, which enables fast, language-independent model building without additional training. On the benchmarks compared, the method can outperform several more resource-intensive approaches.

In Spoken Language Understanding (SLU), the task is to extract important information from audio commands, such as the intent of what a user wants the system to do and special entities like locations or numbers. This paper presents a simple method for embedding intents and entities into Finite State Transducers which, in combination with a pretrained general-purpose Speech-to-Text model, allows building SLU models without any additional training. Building those models is very fast and takes only a few seconds. The method is also completely language-independent. A comparison on different benchmarks shows that it can outperform multiple other, more resource-demanding SLU approaches.
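To make the idea of embedding intents and entities into a transducer more concrete, here is a minimal toy sketch in plain Python. It is not the paper's implementation: the `ToyFST` class, the `build_fst` helper, the template syntax with `{slot}` placeholders, and the example intents are all assumptions made for illustration. A plain string stands in for the transcript that a Speech-to-Text model would produce, and this toy only accepts exact matches.

```python
from collections import defaultdict
from itertools import product


class ToyFST:
    """Toy word-level transducer: each arc consumes one word and emits output pairs."""

    def __init__(self):
        # state -> {input_word: (next_state, emitted (slot, value) pairs)}
        self.transitions = defaultdict(dict)
        self.final = {}      # accepting state -> intent label
        self._next_id = 1    # state 0 is the start state

    def add_path(self, intent, steps):
        """Add one accepted sentence as a path of (input_word, outputs) arcs."""
        state = 0
        for word, out in steps:
            if word not in self.transitions[state]:
                self.transitions[state][word] = (self._next_id, out)
                self._next_id += 1
            state = self.transitions[state][word][0]
        self.final[state] = intent

    def transduce(self, text):
        """Return the intent and entities for a transcript, or None if rejected."""
        state, outputs = 0, []
        for word in text.lower().split():
            arc = self.transitions[state].get(word)
            if arc is None:
                return None
            state, out = arc
            outputs.extend(out)
        if state not in self.final:
            return None
        return {"intent": self.final[state], "entities": dict(outputs)}


def build_fst(templates, slots):
    """Expand every template over its (single-word) slot values and add each path."""
    fst = ToyFST()
    for intent, template in templates:
        alternatives = []
        for token in template.split():
            if token.startswith("{") and token.endswith("}"):
                name = token[1:-1]
                # A slot word is consumed and emitted as a (slot, value) pair.
                alternatives.append([(v, [(name, v)]) for v in slots[name]])
            else:
                # A plain word is consumed and emits nothing.
                alternatives.append([(token, [])])
        for steps in product(*alternatives):
            fst.add_path(intent, list(steps))
    return fst


# Hypothetical example data, not taken from the paper's benchmarks.
templates = [
    ("light_on", "turn on the light in the {room}"),
    ("light_off", "turn off the light in the {room}"),
]
slots = {"room": ["kitchen", "bedroom"]}

fst = build_fst(templates, slots)
result = fst.transduce("turn on the light in the kitchen")
print(result)
```

Building the transducer here involves no training at all, only expanding templates into paths, which mirrors the claim that model construction is fast. Because the arcs operate on whatever words appear in the templates, nothing in the structure is tied to a particular language.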

Code Implementations: 2 repos
