CV | AI | LG | Dec 1, 2024

Paint Outside the Box: Synthesizing and Selecting Training Data for Visual Grounding

arXiv:2412.00684v2 | 1 citation | h-index: 27
Originality: Incremental advance
AI Analysis

This work addresses data scarcity in visual grounding, a domain-specific task, with incremental improvements in performance.

The paper tackles the problem of learning visual grounding under data-scarce settings by proposing POBF, a framework that synthesizes images and selects effective training data, achieving an average gain of 5.83% over real-data-only methods and outperforming baselines by 2.29%-3.85% in accuracy.

Visual grounding aims to localize the image regions based on a textual query. Given the difficulty of large-scale data curation, we investigate how to effectively learn visual grounding under data-scarce settings in this paper. To address the data scarcity, we propose a novel framework, POBF (Paint Outside the Box and Filter). POBF synthesizes images by inpainting outside the box, tackling a label misalignment issue encountered in previous works. Furthermore, POBF leverages an innovative filtering scheme to select the most effective training data. This scheme combines a hardness score and an overfitting score, balanced by a penalty term. Extensive experiments across four benchmark datasets demonstrate that POBF consistently improves performance, achieving an average gain of 5.83% over the real-data-only method and outperforming leading baselines by 2.29%-3.85% in accuracy. Additionally, we validate the robustness and generalizability of POBF across various generative models, training data sizes, and model architectures.
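The "inpainting outside the box" idea can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `outside_box_mask` and `composite`, the `(x0, y0, x1, y1)` pixel box convention, and the use of NumPy arrays are all assumptions made for illustration. The sketch only shows why the label stays aligned: the region inside the annotated box is never repainted, so the original box still describes the object in the synthesized image.

```python
import numpy as np

def outside_box_mask(height, width, box):
    """Build an inpainting mask: 1 where repainting is allowed, 0 inside the box.

    `box` is assumed to be (x0, y0, x1, y1) in pixel coordinates, with
    x1 and y1 exclusive. This convention is hypothetical, not from the paper.
    """
    x0, y0, x1, y1 = box
    mask = np.ones((height, width), dtype=np.uint8)
    mask[y0:y1, x0:x1] = 0  # protect the annotated object region
    return mask

def composite(original, generated, mask):
    """Keep original pixels inside the box and generated pixels outside it."""
    repaint = mask[..., None].astype(bool)  # broadcast over the channel axis
    return np.where(repaint, generated, original)

# Toy example: a 4x6 image whose box covers the 2x2 block at rows 1-2, cols 1-2.
original = np.zeros((4, 6, 3), dtype=np.uint8)
generated = np.full((4, 6, 3), 255, dtype=np.uint8)  # stand-in for model output
mask = outside_box_mask(4, 6, (1, 1, 3, 3))
result = composite(original, generated, mask)
```

In a real pipeline the `generated` array would come from an inpainting model conditioned on the mask; here a constant image stands in for it so the example runs without any model.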

