CV · Jan 24, 2022

Question Generation for Evaluating Cross-Dataset Shifts in Multi-modal Grounding

arXiv:2201.09639v1 · 2.63 citations

Originality: Synthesis-oriented

AI Analysis

This work addresses the challenge of diagnosing domain mismatch in VQA models for researchers, though it appears incremental as it builds on existing cross-dataset adaptation methods.

The paper tackles the problem of evaluating cross-dataset adaptation in visual question answering (VQA) by developing a VQG module to automatically generate out-of-distribution shifts, enabling systematic assessment of model capabilities.

Visual question answering (VQA) is the multi-modal task of answering natural language questions about an input image. Cross-dataset adaptation methods make it possible to transfer knowledge from a source dataset with many training samples to a target dataset whose training set is limited. When a VQA model trained on one dataset fails to adapt to another, it is hard to identify the underlying cause of the domain mismatch, because there could be a multitude of reasons, such as image distribution mismatch and question distribution mismatch. At UCLA, we are working on a visual question generation (VQG) module that automatically generates out-of-distribution (OOD) shifts, which aid in systematically evaluating the cross-dataset adaptation capabilities of VQA models.
