Causal Debiasing for Visual Commonsense Reasoning
This work addresses dataset bias and out-of-distribution generalization in VCR, though the contribution appears incremental because it builds on existing causal methods.
The paper tackles bias in Visual Commonsense Reasoning (VCR) by analyzing co-occurrence and statistical biases in datasets and introducing VCR-OOD datasets to evaluate model generalization. It proposes a debiasing method using causal graphs and backdoor adjustment, which proves effective in experiments.
Visual Commonsense Reasoning (VCR) refers to answering questions and providing explanations based on images. While existing methods achieve high prediction accuracy, they often overlook bias in datasets and lack debiasing strategies. In this paper, our analysis reveals co-occurrence and statistical biases in both textual and visual data. We introduce the VCR-OOD datasets, comprising VCR-OOD-QA and VCR-OOD-VA subsets, which are designed to evaluate the generalization capabilities of models across two modalities. Furthermore, we analyze the causal graphs and prediction shortcuts in VCR and adopt a backdoor adjustment method to remove bias. Specifically, we create a dictionary based on the set of correct answers to eliminate prediction shortcuts. Experiments demonstrate the effectiveness of our debiasing method across different datasets.
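To make the backdoor adjustment mentioned above concrete, the following is a minimal sketch of the generic formula P(Y | do(X)) = sum over z of P(Y | X, z) P(z) on a small discrete example. It is not the paper's implementation: the function names, the toy probabilities, and the reading of the confounder Z as an index into a dictionary of answer-derived entries are all assumptions made for illustration.

```python
# Generic discrete backdoor adjustment (illustrative sketch, not the paper's code).
# Assumption: Z is a discrete confounder (e.g. an index into a hypothetical
# dictionary of answer-derived entries), X is the input, Y is the prediction.

def backdoor_adjust(joint):
    """joint[z][x][y] = P(Z=z, X=x, Y=y). Returns adj[x][y] = P(Y=y | do(X=x)).

    Assumes P(Z=z, X=x) > 0 for every (z, x) pair (positivity).
    """
    n_z, n_x, n_y = len(joint), len(joint[0]), len(joint[0][0])
    adj = [[0.0] * n_y for _ in range(n_x)]
    for z in range(n_z):
        p_z = sum(sum(row) for row in joint[z])      # P(z)
        for x in range(n_x):
            p_zx = sum(joint[z][x])                  # P(z, x)
            for y in range(n_y):
                # P(y | x, z) * P(z), summed over z
                adj[x][y] += (joint[z][x][y] / p_zx) * p_z
    return adj


def observational(joint):
    """Returns obs[x][y] = P(Y=y | X=x), the unadjusted conditional."""
    n_z, n_x, n_y = len(joint), len(joint[0]), len(joint[0][0])
    obs = []
    for x in range(n_x):
        p_xy = [sum(joint[z][x][y] for z in range(n_z)) for y in range(n_y)]
        p_x = sum(p_xy)
        obs.append([p / p_x for p in p_xy])
    return obs


def toy_joint():
    """Toy distribution in which Y depends only on Z, while X is correlated with Z."""
    p_z = [0.5, 0.5]
    p_x_given_z = [[0.9, 0.1], [0.1, 0.9]]   # indexed [z][x]
    p_y_given_z = [[0.8, 0.2], [0.2, 0.8]]   # indexed [z][y]
    return [
        [[p_z[z] * p_x_given_z[z][x] * p_y_given_z[z][y] for y in range(2)]
         for x in range(2)]
        for z in range(2)
    ]


joint = toy_joint()
adjusted = backdoor_adjust(joint)
unadjusted = observational(joint)
print("P(Y=1 | do(X=x)):", [round(row[1], 4) for row in adjusted])
print("P(Y=1 | X=x):    ", [round(row[1], 4) for row in unadjusted])
```

In this toy setting X has no causal effect on Y, yet the unadjusted conditional P(Y=1 | X=x) differs sharply between the two values of X (about 0.26 versus 0.74) because both variables track Z. The adjusted estimate is 0.5 for both values of X, which is the sense in which the adjustment removes a prediction shortcut.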