CL · Mar 28, 2019

Modeling Acoustic-Prosodic Cues for Word Importance Prediction in Spoken Dialogues

Sushant Kafle, Cecilia O. Alm, Matt Huenerfauth

arXiv:1903.12238v2

Originality: Incremental advance

AI Analysis

This work addresses the need for more accurate real-time captions for Deaf and Hard of Hearing individuals by predicting which words matter most to the meaning of a conversation turn. It is an incremental advance, since it builds on existing word-importance prediction methods.

The paper tackles word importance prediction in spoken dialogues using acoustic-prosodic cues. Its model performs competitively against state-of-the-art text-based models and shows particular benefits on imperfect ASR output.

Prosodic cues in conversational speech aid listeners in discerning a message. We investigate whether acoustic cues in spoken dialogue can be used to identify the importance of individual words to the meaning of a conversation turn. Individuals who are Deaf and Hard of Hearing often rely on real-time captions in live meetings. Word error rate, a traditional metric for evaluating automatic speech recognition, fails to capture that some words are more important for a system to transcribe correctly than others. We present and evaluate neural architectures that use acoustic features for 3-class word importance prediction. Our model performs competitively against state-of-the-art text-based word-importance prediction models, and it demonstrates particular benefits when operating on imperfect ASR output.
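
The abstract's point about word error rate can be illustrated with a minimal sketch. The code below computes standard WER through word-level edit distance and shows that two transcripts with errors on very different words receive the same score. The example sentences, the `importance` scores, and the `weighted_substitution_error` helper are all assumptions made for illustration. They are not the paper's data, labels, or metric.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Standard word error rate: word-level edit distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the ref prefix seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


def weighted_substitution_error(reference: str, hypothesis: str,
                                importance: dict) -> float:
    """Hypothetical importance-weighted error (illustration only).

    Assumes the hypothesis has the same number of words as the reference,
    so that words can be compared position by position.
    """
    ref, hyp = reference.split(), hypothesis.split()
    if len(ref) != len(hyp):
        raise ValueError("this sketch only handles equal-length transcripts")
    total = sum(importance[w] for w in ref)
    missed = sum(importance[r] for r, h in zip(ref, hyp) if r != h)
    return missed / total


reference = "the meeting is moved to friday"
hyp_a = "a meeting is moved to friday"    # error on a function word
hyp_b = "the meeting is moved to monday"  # error on a content word

# Made-up importance scores in [0, 1]; not taken from the paper.
importance = {"the": 0.1, "meeting": 0.8, "is": 0.1,
              "moved": 0.7, "to": 0.1, "friday": 0.9}

wer_a, wer_b = wer(reference, hyp_a), wer(reference, hyp_b)
weighted_a = weighted_substitution_error(reference, hyp_a, importance)
weighted_b = weighted_substitution_error(reference, hyp_b, importance)
print(wer_a == wer_b)           # both transcripts have one substitution in six words
print(weighted_a < weighted_b)  # the weighted score penalizes losing "friday" more
```

Under these assumed scores, WER cannot tell the two transcripts apart, while the weighted variant ranks the transcript that loses "friday" as worse. That gap is the motivation for predicting word importance.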
