CV · CL · HC · LG · Sep 13, 2023

VLSlice: Interactive Vision-and-Language Slice Discovery

arXiv:2309.06703v1 · 11 citations · h-index: 22
Originality: Incremental advance
AI Analysis

VLSlice addresses the challenge of bias analysis in vision-and-language models by giving researchers and practitioners a tool that circumvents annotation bottlenecks. The advance is incremental because it builds on prior subgroup discovery methods.

The paper introduces VLSlice, an interactive system for analyzing undesirable behaviors in vision-and-language models. Users discover coherent subgroups from unlabeled image sets without extensive annotation, and the authors evaluate the system in a user study with 22 participants.

Recent work in vision-and-language demonstrates that large-scale pretraining can learn generalizable models that are efficiently transferable to downstream tasks. While this may improve dataset-scale aggregate metrics, analyzing performance around hand-crafted subgroups targeting specific bias dimensions reveals systemic undesirable behaviors. However, this subgroup analysis is frequently stalled by annotation efforts, which require extensive time and resources to collect the necessary data. Prior art attempts to automatically discover subgroups to circumvent these constraints, but it typically leverages model behavior on existing task-specific annotations and rapidly degrades on more complex inputs beyond "tabular" data. None of these methods study vision-and-language models. This paper presents VLSlice, an interactive system enabling user-guided discovery of coherent representation-level subgroups with consistent visiolinguistic behavior, denoted as vision-and-language slices, from unlabeled image sets. We show that VLSlice enables users to quickly generate diverse high-coherency slices in a user study (n=22) and release the tool publicly.
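
The abstract does not spell out how a slice is scored, so the following is only a rough sketch of what "coherent" and "consistent behavior" could mean numerically. It assumes each image has an embedding vector and a per-image score change (for example, how much a model's image-text similarity shifts between two captions). The function names, the use of mean pairwise cosine similarity as a coherence measure, and the toy data are all illustrative assumptions, not VLSlice's actual method or API.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, pstdev


def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm


def coherence(embeddings):
    """Mean pairwise cosine similarity; needs at least two embeddings."""
    return mean(cosine(a, b) for a, b in combinations(embeddings, 2))


def behavior(deltas):
    """Mean and spread of the per-image score change within a slice."""
    return mean(deltas), pstdev(deltas)


def rank_slices(slices):
    """Order candidate slices by the magnitude of their mean score change."""
    rows = []
    for name, (embeddings, deltas) in slices.items():
        shift, spread = behavior(deltas)
        rows.append((name, coherence(embeddings), shift, spread))
    return sorted(rows, key=lambda row: abs(row[2]), reverse=True)


# Toy, made-up data: 2-D "embeddings" and per-image score changes.
candidates = {
    "slice_a": ([[1.0, 0.0], [0.9, 0.1], [0.8, 0.2]], [0.30, 0.28, 0.32]),
    "slice_b": ([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]], [0.05, -0.04, 0.02]),
}
ranked = rank_slices(candidates)
for name, coh, shift, spread in ranked:
    print(f"{name}: coherence={coh:.2f} shift={shift:+.2f} spread={spread:.2f}")
```

In this toy setup, a slice whose images sit close together in embedding space and shift the score in the same direction ranks above a scattered slice with mixed shifts. An interactive tool would let a user inspect and refine such candidates rather than trust the ranking alone.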

Code Implementations: 1 repo
