CL AI CV | Aug 30, 2023

Impact of Visual Context on Noisy Multimodal NMT: An Empirical Study for English to Indian Languages

Baban Gain, Dibyanayan Bandyopadhyay, Samrat Mukherjee, Chandranath Adak, Asif Ekbal

arXiv:2308.16075v21.35 citations | h-index: 53 | Has Code

Originality: Incremental advance

AI Analysis

This study provides empirical insights for researchers working on multimodal translation, particularly in noisy settings, though the contribution appears incremental because it builds on existing multimodal NMT approaches.

This study investigated how adding visual context affects neural machine translation from English to Indian languages in noisy settings, finding that multimodal models slightly outperform text-only models when noise is present, with specific image feature types performing best at different noise levels.

Neural Machine Translation (NMT) has made remarkable progress using large-scale textual data, but the potential of incorporating multimodal inputs, especially visual information, remains underexplored in high-resource settings. While prior research has focused on using multimodal data in low-resource scenarios, this study examines how image features impact translation when added to a large-scale, pre-trained unimodal NMT system. Surprisingly, the study finds that images might be redundant in this context. Additionally, the research introduces synthetic noise to assess whether images help the model handle textual noise. Multimodal models slightly outperform text-only models in noisy settings, even when random images are used. The study's experiments translate from English to Hindi, Bengali, and Malayalam, significantly outperforming state-of-the-art benchmarks. Interestingly, the effect of visual context varies with the level of source text noise: no visual context works best for non-noisy translations, cropped image features are optimal for low noise, and full image features perform better in high-noise scenarios. This sheds light on the role of visual context, especially in noisy settings, and opens up a new research direction for Noisy Neural Machine Translation in multimodal setups. The research emphasizes the importance of combining visual and textual information to improve translation across various environments. Our code is publicly available at https://github.com/babangain/indicMMT.
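
The abstract does not say how the synthetic noise is generated, so the following is only a minimal illustrative sketch of one common approach: character-level perturbation of the English source sentence at a chosen noise level. The function name `add_char_noise`, the three perturbation operations, and the `noise_level` parameter are assumptions made for this example and are not taken from the paper or its repository.

```python
import random
from typing import Optional


def add_char_noise(sentence: str, noise_level: float, seed: Optional[int] = None) -> str:
    """Return a noisy copy of `sentence`.

    Hypothetical scheme (not the paper's): each alphabetic character is
    independently perturbed with probability `noise_level` by deleting it,
    substituting a random lowercase letter, or duplicating it.
    Non-alphabetic characters (spaces, punctuation, digits) are kept as-is.
    """
    rng = random.Random(seed)  # local RNG so results are reproducible per seed
    alphabet = "abcdefghijklmnopqrstuvwxyz"
    out = []
    for ch in sentence:
        if ch.isalpha() and rng.random() < noise_level:
            op = rng.choice(("delete", "substitute", "duplicate"))
            if op == "delete":
                continue  # drop the character
            if op == "substitute":
                out.append(rng.choice(alphabet))  # replace with a random letter
            else:
                out.append(ch + ch)  # repeat the character
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    source = "a man is riding a bicycle on the street"
    for level in (0.0, 0.1, 0.3):
        print(level, add_char_noise(source, level, seed=0))
```

Under this sketch, a "low noise" and a "high noise" condition would simply correspond to smaller and larger values of `noise_level`; the thresholds the authors actually used are not given in the abstract.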
