CV · AI · Oct 4, 2025

Referring Expression Comprehension for Small Objects

arXiv:2510.03701v1 · 2 citations · h-index: 6
Originality: Incremental advance
AI Analysis

The paper addresses an important problem for autonomous driving, the localization of small objects, though its contribution is incremental because it builds on existing REC methods.

The paper tackles the challenge of localizing extremely small objects in referring expression comprehension (REC). It introduces a new dataset (SOREC) with 100,000 pairs of referring expressions and bounding boxes, and an adapter method (PIZA) that improves the accuracy of GroundingDINO on this dataset.

Referring expression comprehension (REC) aims to localize the target object described by a natural language expression. Recent advances in vision-language learning have led to significant performance improvements in REC tasks. However, localizing extremely small objects remains a considerable challenge despite its importance in real-world applications such as autonomous driving. To address this issue, we introduce a novel dataset and method for REC targeting small objects. First, we present the small object REC (SOREC) dataset, which consists of 100,000 pairs of referring expressions and corresponding bounding boxes for small objects in driving scenarios. Second, we propose the progressive-iterative zooming adapter (PIZA), an adapter module for parameter-efficient fine-tuning that enables models to progressively zoom in and localize small objects. In a series of experiments, we apply PIZA to GroundingDINO and demonstrate a significant improvement in accuracy on the SOREC dataset. Our dataset, code and pre-trained models are publicly available on the project page.
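
The abstract describes PIZA only at a high level, so the following is not the authors' implementation. It is a minimal sketch of a generic coarse-to-fine zoom loop, included to illustrate what "progressively zoom in and localize" can mean in terms of coordinates. Everything named here is an assumption made for illustration: the `predict` callback (a hypothetical stand-in for any REC model that returns a box relative to the view it is shown), the fixed zoom factor of 2, and the fixed number of steps.

```python
from typing import Callable, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


def to_image_coords(window: Box, rel_box: Box) -> Box:
    """Map a box given in [0, 1] window-relative coordinates to image pixels."""
    wx0, wy0, wx1, wy1 = window
    w, h = wx1 - wx0, wy1 - wy0
    rx0, ry0, rx1, ry1 = rel_box
    return (wx0 + rx0 * w, wy0 + ry0 * h, wx0 + rx1 * w, wy0 + ry1 * h)


def next_window(window: Box, box: Box, image_size: Tuple[int, int], zoom: float) -> Box:
    """Shrink the window by `zoom`, centre it on `box`, and keep it inside the image."""
    img_w, img_h = image_size
    new_w = (window[2] - window[0]) / zoom
    new_h = (window[3] - window[1]) / zoom
    cx = (box[0] + box[2]) / 2
    cy = (box[1] + box[3]) / 2
    # Clamp the top-left corner so the zoomed window never leaves the image.
    x0 = min(max(cx - new_w / 2, 0.0), img_w - new_w)
    y0 = min(max(cy - new_h / 2, 0.0), img_h - new_h)
    return (x0, y0, x0 + new_w, y0 + new_h)


def zoom_localize(
    image_size: Tuple[int, int],
    predict: Callable[[Box], Box],  # hypothetical model call: view window -> relative box
    steps: int = 3,                 # assumed fixed number of zoom steps
    zoom: float = 2.0,              # assumed fixed zoom factor per step
) -> Box:
    """Repeatedly predict a box in the current view, then zoom in around it."""
    window: Box = (0.0, 0.0, float(image_size[0]), float(image_size[1]))
    box = window
    for _ in range(steps):
        box = to_image_coords(window, predict(window))
        window = next_window(window, box, image_size, zoom)
    return box
```

In a real pipeline, `predict` would crop the image to `window`, resize the crop to the model's input resolution, and run the model with the referring expression. The small object then occupies more input pixels at each step, which is the intuition behind zooming.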
