ASCL Mar 20, 2020

Detecting Mismatch between Text Script and Voice-over Using Utterance Verification Based on Phoneme Recognition Ranking

arXiv:2003.09180v1
AI Analysis

This paper addresses a domain-specific problem, audio-text synchronization in media production, but the contribution is incremental because it builds on existing mismatch detection methods.

The study tackled the problem of detecting mismatches between text scripts and voice-overs by proposing a novel utterance verification method based on phoneme recognition ranking, which outperformed a state-of-the-art cross-modal attention approach in experiments.

The purpose of this study is to detect mismatches between a text script and its voice-over. For this, we present a novel utterance verification (UV) method, which calculates the degree of correspondence between a voice-over and the phoneme sequence of a script. We found that the phoneme recognition probabilities of exaggerated voice-overs decrease compared to ordinary utterances, but their rankings do not change significantly. The proposed method therefore uses the recognition ranking of each phoneme segment aligned to the script's phoneme sequence to measure the confidence that a voice-over utterance matches its script. The experimental results show that the proposed UV method outperforms a state-of-the-art cross-modal attention approach for detecting mismatch between speech and transcription.
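The ranking idea in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's scoring formula, so the mean reciprocal rank used here is an assumption chosen for illustration, as are the function names and the toy posterior values. The sketch only shows why a rank-based score can stay stable when an exaggerated delivery flattens the phoneme probabilities, while a probability-based score drops.

```python
def phoneme_rank(posteriors, expected):
    """1-based rank of the expected phoneme within one segment's posteriors."""
    target = posteriors[expected]
    # Count the phonemes scored strictly higher than the expected one.
    return 1 + sum(1 for p in posteriors if p > target)


def rank_confidence(segment_posteriors, expected_phonemes):
    """Mean reciprocal rank over aligned segments (hypothetical scoring rule)."""
    if len(segment_posteriors) != len(expected_phonemes):
        raise ValueError("one expected phoneme per segment is required")
    ranks = [phoneme_rank(p, e)
             for p, e in zip(segment_posteriors, expected_phonemes)]
    return sum(1.0 / r for r in ranks) / len(ranks)


def prob_confidence(segment_posteriors, expected_phonemes):
    """Mean posterior of the expected phonemes (probability-based baseline)."""
    if len(segment_posteriors) != len(expected_phonemes):
        raise ValueError("one expected phoneme per segment is required")
    total = sum(p[e] for p, e in zip(segment_posteriors, expected_phonemes))
    return total / len(expected_phonemes)


# Toy posteriors over 3 phoneme classes for 2 segments (made-up numbers).
ordinary = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]
exaggerated = [[0.4, 0.35, 0.25], [0.3, 0.4, 0.3]]  # flatter distributions
script = [0, 1]        # phoneme indices the script expects per segment
wrong_script = [2, 0]  # a script that does not match the audio

if __name__ == "__main__":
    print(prob_confidence(ordinary, script), prob_confidence(exaggerated, script))
    print(rank_confidence(ordinary, script), rank_confidence(exaggerated, script))
    print(rank_confidence(ordinary, wrong_script))
```

In this toy setup the probability-based score falls for the exaggerated utterance while the rank-based score is unchanged, and the mismatched script receives a lower rank-based score. A real system would obtain the per-segment posteriors from a phoneme recognizer after aligning the audio to the script's phoneme sequence, which this sketch does not attempt.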

