CL, Jun 23, 2020

Keyframe Segmentation and Positional Encoding for Video-guided Machine Translation Challenge 2020

Tosho Hirasawa, Zhishen Yang, Mamoru Komachi, Naoaki Okazaki

arXiv:2006.12799v10.813 citations

Originality: Synthesis-oriented

AI Analysis

This work addresses multimodal machine translation, specifically generating high-quality translations by integrating video and text. It appears incremental, since it builds on existing challenge frameworks.

The authors tackled the Video-guided Machine Translation Challenge 2020 by developing a system that uses keyframe segmentation and positional encoding for video features, achieving a corpus-level BLEU-4 score of 36.60 and securing first place in the challenge.

Video-guided machine translation is a multimodal neural machine translation task that aims to generate high-quality text translations by drawing on both video and text. In this work, we present our video-guided machine translation system for the Video-guided Machine Translation Challenge 2020. The system employs keyframe-based video feature extraction along with positional encoding of the video features. In the evaluation phase, our system scored 36.60 corpus-level BLEU-4 and achieved first place in the Video-guided Machine Translation Challenge 2020.
