CV · AI · Mar 15, 2022

CrowdMLP: Weakly-Supervised Crowd Counting via Multi-Granularity MLP

arXiv:2203.08219v1 · 44 citations · h-index: 15
Originality: Incremental advance
AI Analysis

The work reduces the burden of acquiring location-level annotations for crowd counting, which makes counting more accessible for applications such as surveillance and event management. The advance is incremental because it builds on existing weakly-supervised methods.

The paper tackles crowd counting with only count-level annotations, which are easier to acquire than location-level ones. It proposes CrowdMLP, a method that uses a multi-granularity MLP regressor to model global dependencies and regress total counts, and it reports performance on par with state-of-the-art location-level supervised approaches.

Existing state-of-the-art crowd counting algorithms rely excessively on location-level annotations, which are burdensome to acquire. When only count-level (weak) supervisory signals are available, regressing total counts is arduous and error-prone because explicit spatial constraints are missing. To address this issue, a novel and efficient counter (referred to as CrowdMLP) is presented, which models global dependencies of embeddings and regresses total counts by means of a multi-granularity MLP regressor. Specifically, a locally-focused pre-trained frontend is cascaded to extract crude feature maps with intrinsic spatial cues, which prevent the model from collapsing into trivial outcomes. The crude embeddings, along with the raw crowd scenes, are tokenized at different granularity levels. The multi-granularity MLP then mixes tokens along the cardinality, channel, and spatial dimensions to mine global information. An effective proxy task, namely Split-Counting, is also proposed to overcome the barrier of limited samples and the shortage of spatial hints in a self-supervised manner. Extensive experiments demonstrate that CrowdMLP significantly outperforms existing weakly-supervised counting algorithms and performs on par with state-of-the-art location-level supervised approaches.
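
To make two ideas from the abstract concrete, here is a minimal pure-Python sketch of tokenizing one feature map at several granularities and of a split-based consistency check. It is not the authors' implementation. The abstract does not say how Split-Counting is formulated, so the sketch assumes one plausible reading: the counts predicted for the sub-regions of a scene should sum to the count predicted for the whole scene. All names (`split`, `tokenize`, `multi_granularity_tokens`, `split_counting_gap`, `count_fn`) are hypothetical, and a single-channel map stored as a list of lists stands in for real feature tensors.

```python
from typing import Callable, Dict, List, Sequence

Grid = List[List[float]]  # single-channel map, H rows by W columns (illustrative stand-in)


def split(grid: Grid, patch: int) -> List[Grid]:
    """Cut a grid into non-overlapping patch x patch sub-grids, row-major order."""
    h, w = len(grid), len(grid[0])
    if h % patch or w % patch:
        raise ValueError("patch size must divide both map dimensions")
    return [
        [row[left:left + patch] for row in grid[top:top + patch]]
        for top in range(0, h, patch)
        for left in range(0, w, patch)
    ]


def tokenize(grid: Grid, patch: int) -> List[List[float]]:
    """Flatten each patch x patch sub-grid into one token vector."""
    return [[v for row in sub for v in row] for sub in split(grid, patch)]


def multi_granularity_tokens(
    grid: Grid, patch_sizes: Sequence[int]
) -> Dict[int, List[List[float]]]:
    """Tokenize the same map at several patch sizes (coarse to fine)."""
    return {p: tokenize(grid, p) for p in patch_sizes}


def split_counting_gap(
    count_fn: Callable[[Grid], float], grid: Grid, patch: int
) -> float:
    """Assumed proxy objective: |count(whole) - sum of count(part)|.

    A counter that is consistent under splitting yields a gap of 0.
    """
    whole = count_fn(grid)
    parts = sum(count_fn(sub) for sub in split(grid, patch))
    return abs(whole - parts)


def additive_counter(grid: Grid) -> float:
    """Toy counter that sums all cells, so it is additive over regions."""
    return float(sum(sum(row) for row in grid))


def peak_counter(grid: Grid) -> float:
    """Toy counter that is not additive: it reports the largest cell."""
    return float(max(max(row) for row in grid))


# A 4 x 4 toy map with cells 0..15.
toy_map: Grid = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
tokens = multi_granularity_tokens(toy_map, patch_sizes=[4, 2, 1])
```

With this toy map, patch size 4 gives a single 16-dimensional token, patch size 2 gives four 4-dimensional tokens, and patch size 1 gives sixteen scalar tokens. `split_counting_gap(additive_counter, toy_map, 2)` is zero, while the non-additive `peak_counter` leaves a positive gap that a training loss could penalize. The token-mixing MLPs themselves are omitted here.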

