CL · Jan 17, 2023

Curriculum Script Distillation for Multilingual Visual Question Answering

arXiv:2301.07227v1 · h-index: 23
Originality: Incremental advance
AI Analysis

The paper addresses the challenge of extending VQA advances to languages other than English. The contribution is incremental: it builds on existing pre-trained models and adds a novel curriculum approach.

The paper tackles limited multilingual performance in Visual Question Answering (VQA) by finetuning pre-trained models with a curriculum built on source- and target-language translations. It finds that target languages sharing the same script outperform other languages by ~6%, and that mixed-script code-switched languages outperform their counterparts by ~5-12%.

Pre-trained models with dual and cross encoders have shown remarkable success in advancing several vision-and-language tasks, including Visual Question Answering (VQA). However, because they depend on gold-annotated data, most of these advances do not carry over to languages other than English. We aim to address this problem by introducing a curriculum based on source- and target-language translations to finetune the pre-trained models for the downstream task. Experimental results demonstrate that script plays a vital role in the performance of these models. Specifically, we show that target languages that share the same script perform better (~6%) than other languages, and that mixed-script code-switched languages perform better than their counterparts (~5-12%).
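The abstract does not say how the translation curriculum is scheduled. As a rough illustration only, the sketch below assumes a simple linear schedule in which each finetuning stage swaps a growing share of source-language questions for their target-language translations. The names `VQAExample`, `target_fraction`, and `build_curriculum`, and the linear schedule itself, are hypothetical and are not taken from the paper.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class VQAExample:
    # Hypothetical record: one image-question-answer triple with a translation.
    image_id: str
    question_src: str  # source-language question (assumed English)
    question_tgt: str  # translated target-language question
    answer: str


def target_fraction(stage: int, num_stages: int) -> float:
    """Assumed linear schedule: 0.0 at the first stage, 1.0 at the last."""
    if num_stages < 2:
        return 1.0
    return stage / (num_stages - 1)


def build_curriculum(examples, num_stages, seed=0):
    """Return one list of (image_id, question, answer) tuples per stage.

    At each stage a larger share of questions is replaced by its
    target-language translation, so finetuning moves gradually from
    the source language toward the target language.
    """
    rng = random.Random(seed)  # seeded so the schedule is reproducible
    stages = []
    for stage in range(num_stages):
        frac = target_fraction(stage, num_stages)
        n_tgt = round(frac * len(examples))
        # Pick which examples use the target-language question at this stage.
        tgt_idx = set(rng.sample(range(len(examples)), n_tgt))
        stages.append([
            (ex.image_id,
             ex.question_tgt if i in tgt_idx else ex.question_src,
             ex.answer)
            for i, ex in enumerate(examples)
        ])
    return stages


if __name__ == "__main__":
    data = [VQAExample(f"img{i}", f"src question {i}", f"tgt question {i}", "red")
            for i in range(4)]
    for k, stage in enumerate(build_curriculum(data, num_stages=3)):
        print(k, [q for _, q, _ in stage])
```

With three stages, the first stage is all source-language questions, the middle stage is half translated, and the last stage is fully in the target language. A real curriculum might instead order examples by difficulty or by script; the linear mix is chosen here only because it is the simplest schedule to reason about.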
