CV · CL · Oct 14, 2021

Semantically Distributed Robust Optimization for Vision-and-Language Inference

arXiv:2110.07165v2643 citations
Originality: Incremental advance
AI Analysis

The paper addresses robustness issues in vision-and-language inference, but the advance is incremental because it builds on existing data augmentation and robust optimization techniques.

The paper tackles the brittleness of vision-and-language models under linguistic variations by proposing SDRO, a model-agnostic method that combines distributed robust optimization with inference-time ensembling. It reports performance improvements and enhanced robustness on NLVR² and VIOLIN.

Analysis of vision-and-language models has revealed their brittleness under linguistic phenomena such as paraphrasing, negation, textual entailment, and word substitutions with synonyms or antonyms. While data augmentation techniques have been designed to mitigate these failure modes, methods that can integrate this knowledge into the training pipeline remain under-explored. In this paper, we present SDRO, a model-agnostic method that utilizes a set of linguistic transformations in a distributed robust optimization setting, along with an ensembling technique to leverage these transformations during inference. Experiments on benchmark datasets with images (NLVR²) and video (VIOLIN) demonstrate performance improvements as well as robustness to adversarial attacks. Experiments on binary VQA explore the generalizability of this method to other V&L tasks.
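The abstract names two ingredients, robust optimization over a set of linguistic transformations and ensembling over those transformations at inference, without giving formulas. The sketch below is one plausible reading, not the authors' implementation: it assumes the training signal is the worst mean loss across transformation groups, and that the ensemble averages the model's probability of "True" over the original sentence and its transformed variants, flipping the probability for transformations that invert meaning (such as negation). All names (`worst_case_loss`, `ensemble_true_prob`, `Transform`) are hypothetical.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical type: (sentence transformation, True if it inverts the meaning).
Transform = Tuple[Callable[[str], str], bool]


def worst_case_loss(group_losses: Sequence[Sequence[float]]) -> float:
    """Return the largest mean loss over groups.

    Each inner sequence holds per-example losses for one transformation group.
    Minimizing this value (assumed objective) targets the hardest group.
    """
    return max(sum(group) / len(group) for group in group_losses)


def ensemble_true_prob(
    predict_true_prob: Callable[[str], float],
    sentence: str,
    transforms: Sequence[Transform],
) -> float:
    """Average P(label = True) over the sentence and its transformed variants."""
    votes = [predict_true_prob(sentence)]
    for transform, inverts_meaning in transforms:
        p = predict_true_prob(transform(sentence))
        # A meaning-inverting variant votes for the opposite label.
        votes.append(1.0 - p if inverts_meaning else p)
    return sum(votes) / len(votes)


if __name__ == "__main__":
    # Toy stand-in for a V&L model: it only looks at the text.
    def toy_model(s: str) -> float:
        return 0.2 if " not " in s else 0.9

    transforms = [
        (lambda s: s.replace(" is ", " is not "), True),           # negation
        (lambda s: s.replace("left of", "to the left of"), False),  # paraphrase
    ]
    print(worst_case_loss([[0.2, 0.4], [0.9, 0.7], [0.1]]))
    print(ensemble_true_prob(toy_model, "the dog is left of the cat", transforms))
```

In a real pipeline the toy model would be replaced by a V&L classifier that also receives the image or video, and the per-group losses would come from that model's outputs on each transformed batch.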

Code Implementations: 1 repo
