CL · May 12

Towards Visually-Guided Movie Subtitle Translation for Indic Languages

arXiv:2605.11993 · 82.8
Predicted impact: top 60% in CL (last 90 days) · Originality: Synthesis-oriented
AI Analysis

For low-resource Indic language subtitle translation, this work identifies temporal misalignment as a key challenge and proposes a practical selective grounding strategy that improves translation quality with minimal visual processing.

The paper studies visual grounding for movie subtitle translation into Indic languages, finding that selective visual enhancement of the lowest-quality 20-30% of segments improves COMET scores over text-only baselines, with coarse attribute-based summaries being more robust than free-text summaries.

Movie subtitle translation is inherently multimodal, yet text-only systems often miss the visual cues needed to convey emotion, action, and social nuance, especially for low-resource Indic languages (English to Hindi, Bengali, Telugu, Tamil, and Kannada). We present a case study on five full-length films and compare two lightweight visual grounding strategies: structured attribute summaries from a 5-minute sliding window and free-text summaries of inter-subtitle visual gaps. Our analysis shows that temporal misalignment between subtitles and frames is a major obstacle in long-form video, often rendering indiscriminate visual grounding ineffective. However, oracle selective grounding, which replaces only the lowest-quality 20-30% of baseline segments with visually enhanced outputs, consistently improves COMET over the text-only baseline while requiring far less visual processing. Of the two approaches, coarse attribute-based visual context summarization is more robust, capturing scene-level emotion and subtle contextual cues that text alone often misses.
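
The selective grounding step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Segment` type, the `select_oracle` function, and the example scores are hypothetical, and the sketch assumes each baseline segment already carries a per-segment quality score (for instance a segment-level COMET score) that decides which segments are swapped.

```python
import math
from dataclasses import dataclass


@dataclass
class Segment:
    baseline: str          # text-only translation
    visual: str            # visually enhanced translation of the same subtitle
    baseline_score: float  # assumed per-segment quality score of the baseline


def select_oracle(segments, fraction=0.25):
    """Replace the lowest-scoring `fraction` of baseline segments.

    Returns the merged translations and the set of replaced indices.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    k = math.floor(len(segments) * fraction)
    # Indices sorted from worst to best baseline score; keep the k worst.
    order = sorted(range(len(segments)), key=lambda i: segments[i].baseline_score)
    replaced = set(order[:k])
    merged = [
        seg.visual if i in replaced else seg.baseline
        for i, seg in enumerate(segments)
    ]
    return merged, replaced


# Illustrative data only; the scores are made up for the example.
demo = [
    Segment("b0", "v0", 0.9),
    Segment("b1", "v1", 0.2),
    Segment("b2", "v2", 0.7),
    Segment("b3", "v3", 0.5),
]
merged, replaced = select_oracle(demo, fraction=0.25)
```

Only the replaced segments need a visually enhanced translation, which is where the reduction in visual processing comes from. Because the selection here relies on quality scores for the baseline output, it is an oracle setting; a deployed system would need some estimate of segment quality that does not depend on reference translations.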
