AS · CL · MM · SD · IV · Nov 1, 2022

Technology Pipeline for Large Scale Cross-Lingual Dubbing of Lecture Videos into Multiple Indian Languages

arXiv:2211.01338v1 · 5 citations · h-index: 34
Originality: Incremental advance
AI Analysis

This work addresses the problem of making educational content accessible in multiple languages for learners in India. It appears incremental, as it builds on existing dubbing technologies.

The paper tackles cross-lingual dubbing of lecture videos by developing a semi-automatic pipeline that regenerates English lectures in multiple Indian languages. Evaluated on Hindi and Tamil, the dubbed videos score 4.09 for MOS and 3.74 for lip synchronisation, with a 75% reduction in human effort.

Cross-lingual dubbing of lecture videos requires transcription of the original audio, correction and removal of disfluencies, domain term discovery, text-to-text translation into the target language, chunking of the text according to target-language rhythm, and text-to-speech synthesis, followed by isochronous lip-syncing to the original video. This task becomes challenging when the source and target languages belong to different language families, because the generated audio then differs in duration from the original. The difficulty is compounded by the original speaker's rhythm, especially in extempore speech. This paper describes the challenges in semi-automatically regenerating English lecture videos in Indian languages. A prototype is developed for dubbing lectures into 9 Indian languages. A mean opinion score (MOS) is obtained for two languages, Hindi and Tamil, on two different courses. The output video is compared with the original video in terms of MOS (on a 1-5 scale) and lip synchronisation, with scores of 4.09 and 3.74, respectively. Human effort is also reduced by 75%.
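
The duration mismatch described in the abstract can be made concrete with a small sketch. It is not the paper's method: it assumes each translated chunk is already aligned to a span of the original video, and it computes the playback-rate change that would make the synthesized audio fill that span. The names `Segment`, `tempo_factor`, and `plan_fit`, and the rate bounds 0.8 and 1.25, are hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One aligned chunk: source speech span and synthesized target audio length."""
    source_start: float  # seconds, in the original video
    source_end: float    # seconds, in the original video
    tts_duration: float  # seconds of synthesized target-language audio


def tempo_factor(seg: Segment) -> float:
    """Playback-rate multiplier that makes the TTS audio fill the source span."""
    source_duration = seg.source_end - seg.source_start
    if source_duration <= 0:
        raise ValueError("segment must have positive duration")
    return seg.tts_duration / source_duration


def plan_fit(segments, min_rate=0.8, max_rate=1.25):
    """Return a (rate, needs_review) pair per segment.

    Rates are clamped to [min_rate, max_rate]. The bounds are illustrative
    assumptions, not values taken from the paper.
    """
    plan = []
    for seg in segments:
        rate = tempo_factor(seg)
        clamped = min(max(rate, min_rate), max_rate)
        # A clamped rate means tempo change alone cannot fit the span.
        plan.append((clamped, clamped != rate))
    return plan


if __name__ == "__main__":
    segs = [Segment(0.0, 4.0, 4.4), Segment(4.0, 6.0, 3.2)]
    for seg, (rate, review) in zip(segs, plan_fit(segs)):
        print(f"{seg.source_start:.1f}-{seg.source_end:.1f}s "
              f"rate={rate:.2f} review={review}")
```

Segments flagged for review are those where a tempo change alone would not fit the audio into the original span. In a semi-automatic workflow these are the natural candidates for re-chunking or manual editing of the translated text.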
