RePOPE: Impact of Annotation Errors on the POPE Benchmark
This work addresses data quality issues in a benchmark widely used by computer vision and AI researchers. Its contribution is incremental: it corrects errors in existing labels rather than proposing new methods.
The study examines how annotation errors in the MSCOCO dataset affect POPE, a benchmark for object hallucination. Evaluating models against re-annotated images changes model rankings notably, with reported shifts of up to 20% in performance metrics.
Since data annotation is costly, benchmark datasets often incorporate labels from established image datasets. In this work, we assess the impact of label errors in MSCOCO on the frequently used object hallucination benchmark POPE. We re-annotate the benchmark images and identify an imbalance in annotation errors across different subsets. Evaluating multiple models on the revised labels, which we denote as RePOPE, we observe notable shifts in model rankings, highlighting the impact of label quality. Code and data are available at https://github.com/YanNeu/RePOPE.
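To make the effect of label corrections on rankings concrete, the following is a minimal sketch rather than the authors' evaluation code. It assumes a POPE-style setup in which each question has a yes/no answer, and it scores models with F1 on the "yes" class as an assumed metric. The function names, model names, labels, and predictions are all hypothetical toy values invented for illustration; none of them come from RePOPE.

```python
from typing import Dict, List


def f1_yes(preds: List[bool], labels: List[bool]) -> float:
    """F1 score treating 'yes' (True) as the positive class."""
    tp = sum(p and l for p, l in zip(preds, labels))
    fp = sum(p and not l for p, l in zip(preds, labels))
    fn = sum((not p) and l for p, l in zip(preds, labels))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def rank_models(preds_by_model: Dict[str, List[bool]],
                labels: List[bool]) -> List[str]:
    """Return model names sorted from highest to lowest F1."""
    scores = {name: f1_yes(p, labels) for name, p in preds_by_model.items()}
    return sorted(scores, key=lambda name: scores[name], reverse=True)


# Hypothetical toy labels: two of six answers flip after re-annotation.
original_labels = [True, True, False, False, True, False]
revised_labels = [True, False, False, True, True, False]

# Hypothetical toy predictions for two made-up models.
toy_predictions = {
    "model_a": [True, True, False, False, True, False],  # agrees with original labels
    "model_b": [True, False, False, True, True, False],  # agrees with revised labels
}

ranking_original = rank_models(toy_predictions, original_labels)
ranking_revised = rank_models(toy_predictions, revised_labels)

print("ranking on original labels:", ranking_original)
print("ranking on revised labels: ", ranking_revised)
```

In this toy case the two models swap places once the labels are corrected, which is the kind of ranking shift the abstract describes. The size of the effect here is an artifact of the made-up data and says nothing about the magnitudes observed on the real benchmark.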