Counting Everyday Objects in Everyday Scenes
This work addresses object counting in everyday scenes; its contribution is incremental, as it builds on existing counting and detection methods.
The paper tackles the problem of counting object instances in natural images with dedicated models that handle large variance in counts, appearances, and scales, showing consistent improvements over baselines on the PASCAL VOC 2007 and COCO datasets. It also explores using counting to improve object detection and to answer 'how many?' questions in visual question answering.
We are interested in counting the number of instances of object classes in natural, everyday images. Previous counting approaches tackle the problem in restricted domains, such as counting pedestrians in surveillance videos. Counts can also be estimated from the outputs of other vision tasks, such as object detection. In this work, we build dedicated counting models designed to handle the large variance in counts, appearances, and scales of objects found in natural scenes. Our approach is inspired by the phenomenon of subitizing: the ability of humans to quickly assess counts from a perceptual signal when the counts are small. Given a natural scene, we employ a divide-and-conquer strategy while incorporating context across the scene to adapt the subitizing idea to counting. Our approach offers consistent improvements over numerous baseline approaches for counting on the PASCAL VOC 2007 and COCO datasets. Subsequently, we study how counting can be used to improve object detection. We then present a proof-of-concept application of our counting methods to Visual Question Answering, studying the 'how many?' questions in the VQA and COCO-QA datasets.
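To make the divide-and-conquer idea concrete, here is a minimal Python sketch under stated assumptions. It splits an image into a grid of non-overlapping cells, asks a per-cell counter for a small count (the subitizing step), and sums the per-cell results; a detection-style baseline that counts confident boxes is included for contrast. The function names, the integer-pixel cell convention, the 0.5 score threshold, and the toy counter (which counts annotated object centres rather than regressing from image features) are hypothetical illustrations, not the authors' models, and the sketch omits the cross-scene context the abstract mentions.

```python
from typing import Callable, List, Sequence, Tuple

# A cell is (x0, y0, x1, y1) in integer pixels, half-open: [x0, x1) x [y0, y1).
# This convention is an illustrative assumption, not taken from the paper.
Cell = Tuple[int, int, int, int]
Point = Tuple[float, float]


def count_from_detections(scores: Sequence[float], threshold: float = 0.5) -> int:
    """Detection-style baseline: count the boxes whose confidence clears a threshold."""
    return sum(1 for s in scores if s >= threshold)


def grid_cells(width: int, height: int, rows: int, cols: int) -> List[Cell]:
    """Split a width x height image into a rows x cols grid of non-overlapping cells.

    Integer division keeps boundaries exact, so the last cell ends at the image edge.
    """
    cells = []
    for r in range(rows):
        for c in range(cols):
            x0, x1 = width * c // cols, width * (c + 1) // cols
            y0, y1 = height * r // rows, height * (r + 1) // rows
            cells.append((x0, y0, x1, y1))
    return cells


def divide_and_conquer_count(width: int, height: int, rows: int, cols: int,
                             cell_counter: Callable[[Cell], float]) -> float:
    """Predict a (typically small) count for each cell and sum the per-cell counts."""
    return sum(cell_counter(cell) for cell in grid_cells(width, height, rows, cols))


def make_toy_cell_counter(centers: Sequence[Point]) -> Callable[[Cell], float]:
    """Hypothetical stand-in for a learned per-cell count regressor.

    It counts annotated object centres inside a cell; each centre is assumed to lie
    in [0, width) x [0, height), so it falls in exactly one half-open cell.
    """
    def counter(cell: Cell) -> float:
        x0, y0, x1, y1 = cell
        return float(sum(1 for x, y in centers if x0 <= x < x1 and y0 <= y < y1))
    return counter


if __name__ == "__main__":
    centers = [(50, 40), (120, 300), (400, 100), (610, 470), (300, 250)]
    counter = make_toy_cell_counter(centers)
    print([counter(cell) for cell in grid_cells(640, 480, 2, 2)])  # [1.0, 1.0, 2.0, 1.0]
    print(divide_and_conquer_count(640, 480, 2, 2, counter))       # 5.0
    print(count_from_detections([0.9, 0.7, 0.4, 0.55]))            # 3
```

The grid keeps each cell's count small, which is the regime where subitizing-style estimation is meant to apply; in a learned model, the toy counter would be replaced by a regressor over cell features, and the per-cell predictions would not in general be exact integers.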