SIMCO: SIMilarity-based object COunting
This is an incremental advance for computer vision researchers and practitioners, introducing what the authors describe as the first unsupervised multi-class object counting method.
SIMCO tackles multi-class object counting by detecting foreground objects with a Mask RCNN trained on a synthetic dataset and then clustering embeddings produced by a similarity-based head, achieving state-of-the-art scores on counting benchmarks.
We present SIMCO, the first agnostic multi-class object counting approach. SIMCO starts by detecting foreground objects with a novel Mask RCNN-based architecture that is trained only once, in advance, on a new synthetic dataset of 2D shapes, InShape; the idea is to highlight every object that resembles a primitive 2D shape (circle, square, rectangle, etc.). Each detected object is described by a low-dimensional embedding obtained from a novel similarity-based head branch. This branch is trained with a triplet loss, which encourages similar objects (same 2D shape, color, and scale) to map close together in the embedding space. SIMCO then clusters these embeddings so that different types of objects can emerge and be counted, making SIMCO the very first multi-class unsupervised counter. Experiments show that SIMCO achieves state-of-the-art scores on counting benchmarks and that it can also help in many challenging image understanding tasks.
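The two ideas in the abstract, a triplet loss over embeddings and counting by clustering, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function names `triplet_loss` and `count_by_clustering`, the Euclidean distance, the `margin` and `radius` values, the toy 2D embeddings, and the greedy clustering rule, which stands in for whatever clustering algorithm SIMCO actually uses (the abstract does not say).

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss (standard formulation, assumed here).

    The loss is zero once the negative is at least `margin` farther
    from the anchor than the positive is.
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def count_by_clustering(embeddings, radius=0.5):
    """Count objects per cluster with a greedy rule (illustrative only).

    Each embedding joins the first cluster whose representative lies
    within `radius`; otherwise it opens a new cluster.
    """
    representatives = []
    labels = []
    for emb in embeddings:
        for idx, rep in enumerate(representatives):
            if euclidean(emb, rep) <= radius:
                labels.append(idx)
                break
        else:
            representatives.append(emb)
            labels.append(len(representatives) - 1)
    return Counter(labels)


# Toy embeddings: three objects of one type, two of another (hypothetical values).
embeddings = [(0.0, 0.1), (0.1, 0.0), (5.0, 5.1), (0.05, 0.05), (5.1, 5.0)]
counts = count_by_clustering(embeddings)
print(dict(counts))
```

In this toy setting each cluster plays the role of one object type, and the cluster sizes are the per-type counts. The number of types is not fixed in advance, which mirrors the unsupervised, multi-class setting the abstract describes.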