CV, May 23, 2021

Rethinking Global Context in Crowd Counting

arXiv:2105.10926v2, 27 citations
Originality: Incremental advance
AI Analysis

This paper addresses crowd counting accuracy for applications such as surveillance and public safety, and represents an incremental improvement over existing transformer-based methods.

The paper tackles crowd counting with a pure transformer that uses a context token and a token-attention module to extract global context, reporting significant performance improvements over baselines on ShanghaiTech, UCF-QNRF, JHU-CROWD++ and NWPU.

This paper investigates the role of global context for crowd counting. Specifically, a pure transformer is used to extract features with global information from overlapping image patches. Inspired by classification, we add a context token to the input sequence to facilitate information exchange with the tokens corresponding to image patches throughout the transformer layers. Because transformers do not explicitly model the tried-and-true channel-wise interactions, we propose a token-attention module (TAM) to recalibrate encoded features through channel-wise attention informed by the context token. The context token is also used to predict the total person count of the image through a regression-token module (RTM). Extensive experiments on various datasets, including ShanghaiTech, UCF-QNRF, JHU-CROWD++ and NWPU, demonstrate that the proposed context extraction techniques can significantly improve the performance over the baselines.
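The two context-token uses described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not specify the layers inside TAM or RTM, so the bottleneck MLP with a sigmoid gate (in the style of squeeze-and-excitation), the single linear regression head, and every function and parameter name below are assumptions. Random vectors stand in for the transformer's encoded tokens.

```python
import math
import random


def linear(x, weight, bias):
    """Apply y = W x + b to one vector x (weight is a list of rows)."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]


def init_linear(rng, n_out, n_in):
    """Small random weights and zero bias for a hypothetical linear layer."""
    weight = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weight, [0.0] * n_out


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def token_attention(patch_tokens, context_token, params):
    """Assumed TAM: derive one gate per channel from the context token and
    rescale that channel in every patch token (channel-wise attention)."""
    (w1, b1), (w2, b2) = params
    hidden = [max(0.0, h) for h in linear(context_token, w1, b1)]  # ReLU bottleneck
    gates = [sigmoid(g) for g in linear(hidden, w2, b2)]           # each gate in (0, 1)
    recalibrated = [[g * v for g, v in zip(gates, tok)] for tok in patch_tokens]
    return recalibrated, gates


def regress_count(context_token, params):
    """Assumed RTM: map the context token to one non-negative scalar count."""
    weight, bias = params
    return max(0.0, linear(context_token, weight, bias)[0])


# Stand-in for the transformer output: 1 context token + 6 patch tokens, 8 channels.
rng = random.Random(0)
dim, num_patches = 8, 6
context_token = [rng.gauss(0.0, 1.0) for _ in range(dim)]
patch_tokens = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_patches)]

tam_params = (init_linear(rng, dim // 4, dim), init_linear(rng, dim, dim // 4))
rtm_params = init_linear(rng, 1, dim)

recalibrated, gates = token_attention(patch_tokens, context_token, tam_params)
count = regress_count(context_token, rtm_params)
```

In this sketch the same gate vector is applied to all patch tokens, so the context token acts as a global summary that decides which channels to emphasize, while the regression head reads the image-level count from that token alone.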

