CV · AI · LG · Apr 22, 2025

RePOPE: Impact of Annotation Errors on the POPE Benchmark

arXiv:2504.15707v1 · 6 citations · h-index: 3 · Has Code
Originality: Synthesis-oriented
AI Analysis

This work addresses data-quality issues in a benchmark widely used by computer vision and AI researchers. It is incremental in that it corrects existing errors rather than proposing new methods.

The study investigated how annotation errors in the MSCOCO dataset affect the POPE benchmark for object hallucination, finding that re-annotating images led to significant changes in model rankings, with shifts of up to 20% in performance metrics.

Since data annotation is costly, benchmark datasets often incorporate labels from established image datasets. In this work, we assess the impact of label errors in MSCOCO on the frequently used object hallucination benchmark POPE. We re-annotate the benchmark images and identify an imbalance in annotation errors across different subsets. Evaluating multiple models on the revised labels, which we denote as RePOPE, we observe notable shifts in model rankings, highlighting the impact of label quality. Code and data are available at https://github.com/YanNeu/RePOPE .
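To illustrate how a handful of corrected labels can reorder models, here is a minimal Python sketch. It is not taken from the RePOPE repository: the model names, predictions, and labels are made-up toy data, the helper names `f1_yes` and `rank_models` are hypothetical, and ranking by F1 with "yes" as the positive class is an assumption about the evaluation setup.

```python
from typing import Dict, List


def f1_yes(preds: List[str], labels: List[str]) -> float:
    """F1 score treating "yes" as the positive class."""
    pairs = list(zip(preds, labels))
    tp = sum(p == "yes" and y == "yes" for p, y in pairs)
    fp = sum(p == "yes" and y == "no" for p, y in pairs)
    fn = sum(p == "no" and y == "yes" for p, y in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def rank_models(preds_by_model: Dict[str, List[str]], labels: List[str]) -> List[str]:
    """Return model names sorted from best to worst F1 on the given labels."""
    scores = {name: f1_yes(p, labels) for name, p in preds_by_model.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Toy data (entirely illustrative): six yes/no questions.
# The revised labels flip question 3 from "no" to "yes", mimicking
# an object that was present in the image but missing from the annotations.
original_labels = ["yes", "yes", "no", "no", "no", "no"]
revised_labels = ["yes", "yes", "yes", "no", "no", "no"]

preds_by_model = {
    "model_a": ["yes", "yes", "no", "no", "no", "no"],
    "model_b": ["yes", "yes", "yes", "no", "no", "no"],
}

print(rank_models(preds_by_model, original_labels))
print(rank_models(preds_by_model, revised_labels))
```

In this toy setup, the answer that counted as a false positive for `model_b` under the original labels becomes a true positive after correction, so the two models swap places.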

Code Implementations: 1 repo
