CV · LG · May 18

The MixCount Dataset: Bridging the Data Gap for Open-Vocabulary Object Counting

arXiv:2605.18063 · 70.4
AI Analysis

For researchers in object counting, this work provides a scalable synthetic data generation pipeline that addresses the data bottleneck for mixed-object counting, yielding significant performance gains on real-world benchmarks.

The authors identify that current object counting models fail in mixed-object settings due to limitations in existing datasets. They introduce MixCount, a synthetic dataset generated via an automatic pipeline, and show that training on it reduces MAE by 20.14% on FSC-147 and 18.3% on PairTally.

Object counting is a foundational vision task with over a decade of dedicated research, yet state-of-the-art models still fail systematically in the mixed-object setting that dominates real-world applications such as industrial inspection and product sorting. We show that this gap is strongly driven by limitations in existing training and evaluation data: real counting datasets are prohibitively expensive to annotate and suffer from labeling noise, while existing synthetic alternatives lack diversity and realism. We address this with MixCount, a dataset and benchmark for mixed-object counting designed to target the failure modes of current counting models. To overcome the high cost of constructing and labeling such data, we develop an automatic generation pipeline that synthesizes images, fine-grained textual descriptions, and pixel-perfect counting annotations at scale, eliminating the labeling ambiguity that plagues prior datasets. Evaluating state-of-the-art counting models on MixCount exposes severe degradation in the mixed-object setting. More importantly, training these models on our synthesized data yields substantial gains on real-world benchmarks, reducing MAE by 20.14% on FSC-147 and by 18.3% on PairTally. These results establish MixCount as both a benchmark and a training dataset for fine-grained counting, and demonstrate that our pipeline, which produces effectively unlimited labeled data, helps address a long-standing bottleneck in counting models.
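The reported gains are stated as percentage reductions in MAE (mean absolute error). A minimal sketch of that arithmetic follows, assuming the percentages are relative reductions against a baseline model's MAE, which the abstract does not state explicitly. The function names and all counts below are hypothetical illustrations, not data or code from the paper.

```python
from typing import Sequence


def mean_absolute_error(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Average absolute difference between predicted and true object counts."""
    if len(predicted) != len(actual) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def relative_reduction_percent(baseline_mae: float, new_mae: float) -> float:
    """Percentage by which new_mae is lower than baseline_mae (assumed definition)."""
    if baseline_mae <= 0:
        raise ValueError("baseline MAE must be positive")
    return 100.0 * (baseline_mae - new_mae) / baseline_mae


# Hypothetical per-image counts, made up purely for illustration.
true_counts = [12, 30, 7, 45]
baseline_preds = [15, 24, 9, 37]    # model before training on synthetic data
finetuned_preds = [13, 27, 8, 41]   # model after training on synthetic data

baseline_mae = mean_absolute_error(baseline_preds, true_counts)
finetuned_mae = mean_absolute_error(finetuned_preds, true_counts)
reduction = relative_reduction_percent(baseline_mae, finetuned_mae)

print(f"baseline MAE:  {baseline_mae:.2f}")
print(f"finetuned MAE: {finetuned_mae:.2f}")
print(f"relative MAE reduction: {reduction:.2f}%")
```

Under this reading, a 20.14% reduction means the new MAE is about 0.80 times the baseline MAE; the absolute MAE values themselves are not given in the abstract.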

