CV AI LG · Dec 1, 2024

Paint Outside the Box: Synthesizing and Selecting Training Data for Visual Grounding

Zilin Du, Haoxin Li, Jianfei Yu, Boyang Li

arXiv:2412.00684v23.71 citations · h-index: 27

Originality: Incremental advance

AI Analysis

This work addresses data scarcity in visual grounding, a domain-specific task, with incremental improvements in performance.

The paper tackles the problem of learning visual grounding under data-scarce settings by proposing POBF, a framework that synthesizes images and selects effective training data, achieving an average gain of 5.83% over real-data-only methods and outperforming baselines by 2.29%-3.85% in accuracy.

Visual grounding aims to localize the image regions based on a textual query. Given the difficulty of large-scale data curation, we investigate how to effectively learn visual grounding under data-scarce settings in this paper. To address the data scarcity, we propose a novel framework, POBF (Paint Outside the Box and Filter). POBF synthesizes images by inpainting outside the box, tackling a label misalignment issue encountered in previous works. Furthermore, POBF leverages an innovative filtering scheme to select the most effective training data. This scheme combines a hardness score and an overfitting score, balanced by a penalty term. Extensive experiments across four benchmark datasets demonstrate that POBF consistently improves performance, achieving an average gain of 5.83% over the real-data-only method and outperforming leading baselines by 2.29%-3.85% in accuracy. Additionally, we validate the robustness and generalizability of POBF across various generative models, training data sizes, and model architectures.
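
The "inpainting outside the box" idea can be illustrated with a minimal sketch of the mask such a step would need: pixels inside the annotated box are kept, and everything else is marked for regeneration, so the box label still matches the preserved object. This is an illustration under stated assumptions, not the authors' implementation. The function name `outside_box_mask`, the `(x_min, y_min, x_max, y_max)` box convention with exclusive maxima, and the 1 = repaint / 0 = keep encoding are all hypothetical choices made here.

```python
from typing import List, Tuple

# Assumed box convention: (x_min, y_min, x_max, y_max), with max bounds exclusive.
Box = Tuple[int, int, int, int]


def outside_box_mask(height: int, width: int, box: Box) -> List[List[int]]:
    """Return a height x width mask: 1 = repaint (outside box), 0 = keep (inside box)."""
    x0, y0, x1, y1 = box
    if not (0 <= x0 < x1 <= width and 0 <= y0 < y1 <= height):
        raise ValueError("box must lie inside the image and have positive area")
    return [
        [0 if (x0 <= x < x1 and y0 <= y < y1) else 1 for x in range(width)]
        for y in range(height)
    ]


# Toy 4x6 image whose annotated box covers columns 1..3 and rows 1..2.
mask = outside_box_mask(4, 6, (1, 1, 4, 3))
for row in mask:
    print("".join(str(v) for v in row))
```

A mask of this kind would be passed to whichever inpainting model is used. Because the box region is never regenerated, the original box coordinates can be reused as the label for the synthesized image.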
