AS CL Mar 20, 2020

Detecting Mismatch between Text Script and Voice-over Using Utterance Verification Based on Phoneme Recognition Ranking

arXiv:2003.09180v1

Originality: Incremental advance

AI Analysis

This work addresses a domain-specific problem, audio-text synchronization in media production, but the advance is incremental because it builds on existing mismatch detection methods.

The study tackled the problem of detecting mismatches between text scripts and voice-overs by proposing a novel utterance verification method based on phoneme recognition ranking, which outperformed a state-of-the-art cross-modal attention approach in experiments.

The purpose of this study is to detect mismatches between a text script and its voice-over. For this, we present a novel utterance verification (UV) method, which calculates the degree of correspondence between a voice-over and the phoneme sequence of a script. We found that the phoneme recognition probabilities of exaggerated voice-overs decrease compared to those of ordinary utterances, but their rankings do not change significantly. The proposed method therefore uses the recognition ranking of each phoneme segment corresponding to a phoneme sequence to measure the confidence of a voice-over utterance with respect to its script. The experimental results show that the proposed UV method outperforms a state-of-the-art approach that uses cross-modal attention to detect mismatches between speech and transcription.
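The ranking idea in the abstract can be illustrated with a small sketch. The abstract does not give the scoring formula, so everything below is an assumption made for illustration: the function names, the toy three-phoneme posteriors, the use of mean reciprocal rank as the confidence score, and the temperature-style flattening that stands in for an exaggerated voice-over. It is not the authors' implementation.

```python
# Illustrative sketch only: the names, the toy data, and the mean-reciprocal-rank
# score are assumptions, not the scoring rule used in the paper.

def phoneme_ranks(posteriors, expected):
    """Rank of the expected phoneme within each segment's posterior (1 = best)."""
    ranks = []
    for probs, phoneme in zip(posteriors, expected):
        target = probs[phoneme]
        # Rank is 1 plus the number of phonemes scored strictly higher.
        ranks.append(1 + sum(p > target for p in probs))
    return ranks


def rank_confidence(posteriors, expected):
    """Hypothetical utterance confidence: mean reciprocal rank over segments."""
    ranks = phoneme_ranks(posteriors, expected)
    return sum(1.0 / r for r in ranks) / len(ranks)


def flatten(posteriors, temperature):
    """Flatten each distribution to mimic lower recognition probabilities.

    Raising to a positive power is monotonic, so the ordering of phonemes
    within a segment is preserved while the top probability shrinks.
    """
    flattened = []
    for probs in posteriors:
        powered = [p ** (1.0 / temperature) for p in probs]
        total = sum(powered)
        flattened.append([p / total for p in powered])
    return flattened


# Toy posteriors: 3 phoneme segments over a 3-phoneme inventory.
ordinary = [
    [0.70, 0.20, 0.10],
    [0.15, 0.80, 0.05],
    [0.25, 0.15, 0.60],
]
script = [0, 1, 2]        # phoneme sequence of the matching script
wrong_script = [1, 2, 0]  # phoneme sequence of a mismatched script

exaggerated = flatten(ordinary, temperature=3.0)

print(ordinary[0][0], exaggerated[0][0])       # probability of the expected phoneme drops
print(phoneme_ranks(exaggerated, script))      # ranks are unchanged by flattening
print(rank_confidence(exaggerated, script))    # matching script keeps full confidence
print(rank_confidence(ordinary, wrong_script)) # mismatched script scores lower
```

In this toy setting, a probability-based score would fall for the flattened input even though the script still matches, while the rank-based score stays the same and still separates the matching script from the mismatched one.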

