CL | May 17, 2024

SBAAM! Eliminating Transcript Dependency in Automatic Subtitling

arXiv:2405.10741v1 | 26 citations | h-index: 34 | ACL
Originality: Incremental advance
AI Analysis

This work addresses the need for more robust and accessible subtitling tools for audiovisual content. It is an incremental advance, building on prior transcription-free work on translation and segmentation.

The paper tackles automatic subtitle generation without relying on transcripts, in particular for timestamp prediction, and reports new state-of-the-art performance across multiple language pairs and diverse conditions.

Subtitling plays a crucial role in enhancing the accessibility of audiovisual content and encompasses three primary subtasks: translating spoken dialogue, segmenting translations into concise textual units, and estimating timestamps that govern their on-screen duration. Past attempts to automate this process rely, to varying degrees, on automatic transcripts, employed diversely for the three subtasks. In response to the acknowledged limitations associated with this reliance on transcripts, recent research has shifted towards transcription-free solutions for translation and segmentation, leaving the direct generation of timestamps as uncharted territory. To fill this gap, we introduce the first direct model capable of producing automatic subtitles, entirely eliminating any dependence on intermediate transcripts also for timestamp prediction. Experimental results, backed by manual evaluation, showcase our solution's new state-of-the-art performance across multiple language pairs and diverse conditions.
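To make the three subtasks concrete, the sketch below shows what a finished subtitle contains: translated text, its segmentation into lines and blocks, and start/end timestamps. It is only an illustration of the output, not the paper's model. The names `SubtitleBlock`, `format_timestamp`, and `to_srt` are hypothetical, and the choice of the SubRip (SRT) layout is an assumption, since the abstract does not name an output format.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SubtitleBlock:
    """One on-screen subtitle (hypothetical container, not from the paper)."""
    start: float      # display start time, in seconds (timestamp estimation)
    end: float        # display end time, in seconds (timestamp estimation)
    lines: List[str]  # translated text, already split into lines (segmentation)


def format_timestamp(seconds: float) -> str:
    """Render seconds as an SRT-style HH:MM:SS,mmm timestamp."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(blocks: List[SubtitleBlock]) -> str:
    """Serialize blocks in SRT layout (assumed format): index, timing, text."""
    entries = []
    for index, block in enumerate(blocks, start=1):
        timing = f"{format_timestamp(block.start)} --> {format_timestamp(block.end)}"
        entries.append("\n".join([str(index), timing, *block.lines]))
    return "\n\n".join(entries) + "\n"
```

For example, `to_srt([SubtitleBlock(0.0, 1.25, ["Hello"])])` yields a single numbered entry whose timing line runs from `00:00:00,000` to `00:00:01,250`. A direct subtitling system in the sense of the abstract would have to produce all three fields from the audio alone, without an intermediate transcript.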

Code Implementations: 2 repos
