CL · LG · SD · AS · Aug 14, 2023

Text Injection for Capitalization and Turn-Taking Prediction in Speech Models

arXiv:2308.07395v1 · 3 citations · h-index: 69
Originality: Synthesis-oriented
AI Analysis

This work addresses the need for better de-normalization and user-interaction handling in digital assistants, but it is incremental: it applies an existing text injection method to new tasks.

The study uses text injection to improve two auxiliary tasks in speech models, capitalization and turn-taking prediction. It shows that text injection boosts capitalization performance on long-tail data and improves turn-taking detection recall.

Text injection for automatic speech recognition (ASR), wherein unpaired text-only data is used to supplement paired audio-text data, has shown promising improvements in word error rate. This study examines the use of text injection for auxiliary tasks, which are the non-ASR tasks often performed by an end-to-end (E2E) model. In this work, we use joint end-to-end and internal language model training (JEIT) as our text injection algorithm to train an ASR model which performs two auxiliary tasks. The first is capitalization, which is a de-normalization task. The second is turn-taking prediction, which attempts to identify whether a user has completed their conversation turn in a digital assistant interaction. We show results demonstrating that our text injection method boosts capitalization performance for long-tail data and improves turn-taking detection recall.

