CL · May 9, 2025

TopicVD: A Topic-Based Dataset of Video-Guided Multimodal Machine Translation for Documentaries

Jinze Lv, Jian Chen, Zi Long, Xianghua Fu, Yin Chen

arXiv:2505.05714v1 · 4.91 citations · h-index: 2 · Has Code · NLDB

Originality: Synthesis-oriented

AI Analysis

This paper addresses the shortage of video data for real-world MMT tasks such as documentary translation, but its contribution is incremental: it pairs a new dataset with a model that builds on existing MMT methods.

The authors tackled the lack of diverse video data for multimodal machine translation (MMT) by creating TopicVD, a topic-based dataset of documentary video-subtitle pairs. In their experiments, visual information and global context both improved translation performance, while performance declined in out-of-domain scenarios.

Most existing multimodal machine translation (MMT) datasets are predominantly composed of static images or short video clips, lacking extensive video data across diverse domains and topics. As a result, they fail to meet the demands of real-world MMT tasks, such as documentary translation. In this study, we developed TopicVD, a topic-based dataset for video-supported multimodal machine translation of documentaries, aiming to advance research in this field. We collected video-subtitle pairs from documentaries and categorized them into eight topics, such as economy and nature, to facilitate research on domain adaptation in video-guided MMT. Additionally, we preserved their contextual information to support research on leveraging the global context of documentaries in video-guided MMT. To better capture the shared semantics between text and video, we propose an MMT model based on a cross-modal bidirectional attention module. Extensive experiments on the TopicVD dataset demonstrate that visual information consistently improves the performance of the NMT model in documentary translation. However, the MMT model's performance significantly declines in out-of-domain scenarios, highlighting the need for effective domain adaptation methods. Additionally, experiments demonstrate that global context can effectively improve translation performance. Dataset and our implementations are available at https://github.com/JinzeLv/TopicVD
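The abstract does not describe the internals of the cross-modal bidirectional attention module, so the following is only a generic sketch of the idea the name suggests: text features attend over video features, video features attend over text features, and each stream keeps a residual connection to its input. All function names, the feature shapes, and the absence of learned projections are assumptions made for illustration, not the authors' implementation.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    """Multiply an n x k matrix by a k x m matrix (lists of lists)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def cross_attend(queries, keys_values):
    """Scaled dot-product attention: each query row attends over all rows
    of keys_values. Hypothetical helper with no learned projections."""
    d = len(queries[0])
    scores = matmul(queries, transpose(keys_values))
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return matmul(weights, keys_values), weights


def bidirectional_cross_attention(text, video):
    """Run attention in both directions and add a residual connection.

    text:  list of token vectors  (num_tokens x dim), assumed layout
    video: list of frame vectors  (num_frames x dim), assumed layout
    """
    text_ctx, text_to_video = cross_attend(text, video)    # text queries video
    video_ctx, video_to_text = cross_attend(video, text)   # video queries text
    text_out = [[a + b for a, b in zip(r, c)] for r, c in zip(text, text_ctx)]
    video_out = [[a + b for a, b in zip(r, c)] for r, c in zip(video, video_ctx)]
    return text_out, video_out, text_to_video, video_to_text


if __name__ == "__main__":
    # Toy features: 3 subtitle tokens and 2 video frames, 4 dimensions each.
    text = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
    video = [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
    t_out, v_out, t2v, v2t = bidirectional_cross_attention(text, video)
    print(t2v)  # one row of attention weights per token, one column per frame
```

A real model would typically apply learned query, key, and value projections and multiple heads before a step like this; the sketch leaves them out so that the two attention directions stay visible.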

