CV · Jun 23, 2021

Region-Aware Network: Model Human's Top-Down Visual Perception Mechanism for Crowd Counting

arXiv:2106.12163v2 · 21 citations
Originality: Incremental advance
AI Analysis

The paper addresses crowd counting for applications such as surveillance and public safety, but it appears incremental because it builds on existing feedback and attention mechanisms.

The paper tackles the problems of background noise and scale variation in crowd counting by proposing RANet, a feedback network with a Region-Aware block that models human top-down visual perception. The method outperforms state-of-the-art approaches on several public datasets.

Background noise and scale variation are long-recognized problems in crowd counting. Humans glance at a crowd image and instantly know the approximate number of people and where they are, by attending to the crowd regions and their degree of congestion with a global receptive field. Hence, in this paper, we propose a novel feedback network with a Region-Aware block, called RANet, that models the human top-down visual perception mechanism. Firstly, we introduce a feedback architecture to generate priority maps that provide a prior about candidate crowd regions in input images. The prior enables RANet to pay more attention to crowd regions. Then we design a Region-Aware block that adaptively encodes contextual information into input images through a global receptive field. More specifically, we scan the whole input image and its priority map in the form of column vectors to obtain a relevance matrix estimating their similarity. The relevance matrix is then used to build global relationships between pixels. Our method outperforms state-of-the-art crowd counting methods on several public datasets.
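The relevance-matrix step described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `region_aware_sketch`, the dot-product similarity, the softmax normalisation, and the residual addition are all assumptions borrowed from generic non-local attention, chosen only to show how flattening an image and its priority map into column vectors yields a pixel-to-pixel relevance matrix with a global receptive field.

```python
import numpy as np


def region_aware_sketch(features, priority):
    """Hypothetical sketch of a relevance-matrix block (not the paper's code).

    features, priority: arrays of shape (C, H, W).
    Returns the re-weighted features (C, H, W) and the weights (H*W, H*W).
    """
    c, h, w = features.shape
    n = h * w

    # Flatten the spatial grid so every pixel becomes a column vector of C channels.
    f = features.reshape(c, n)  # (C, N)
    p = priority.reshape(c, n)  # (C, N)

    # Relevance matrix: dot-product similarity between each feature pixel
    # and each priority-map pixel (assumed similarity measure).
    relevance = f.T @ p  # (N, N)

    # Row-wise softmax so each pixel's weights over all positions sum to 1
    # (assumed normalisation; the max is subtracted for numerical stability).
    relevance = relevance - relevance.max(axis=1, keepdims=True)
    weights = np.exp(relevance)
    weights /= weights.sum(axis=1, keepdims=True)

    # Each output pixel aggregates features from every position in the image,
    # which gives the block a global receptive field.
    context = f @ weights.T  # (C, N)

    # Residual connection (assumed) keeps the original signal.
    out = (f + context).reshape(c, h, w)
    return out, weights


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((4, 3, 5))
    prior = rng.standard_normal((4, 3, 5))
    y, attn = region_aware_sketch(x, prior)
    print(y.shape, attn.shape)
```

Note that the relevance matrix has one row and one column per pixel, so its memory cost grows quadratically with image area; a practical network would apply such a block to downsampled feature maps rather than full-resolution images.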

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
