CV | LG | Dec 1, 2020

Counting People by Estimating People Flows

arXiv:2012.00452v2 | 42 citations
AI Analysis

This work addresses the problem of accurately counting people in crowded video scenes for applications like surveillance and crowd management. Its solution improves on direct per-image density regression and needs fewer annotations to train.

This paper proposes a method for counting people in crowded scenes by estimating people flows between consecutive video frames, rather than directly regressing densities. This approach leverages conservation constraints to significantly improve performance, and further benefits from correlating people flow with optical flow. Additionally, it enables effective training with fewer annotations in an active learning setting.

Modern methods for counting people in crowded scenes rely on deep networks to estimate people densities in individual images. As such, very few take advantage of temporal consistency in video sequences, and those that do impose only weak smoothness constraints across consecutive frames. In this paper, we advocate estimating people flows across image locations between consecutive images and inferring the people densities from these flows instead of directly regressing them. This enables us to impose much stronger constraints encoding the conservation of the number of people. As a result, it significantly boosts performance without requiring a more complex architecture. Furthermore, it allows us to exploit the correlation between people flow and optical flow to further improve the results. We also show that leveraging people conservation constraints both spatially and temporally makes it possible to train a deep crowd counting model in an active learning setting with far fewer annotations. This significantly reduces the annotation cost while still achieving performance similar to that of full supervision.
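The conservation idea in the abstract can be illustrated with a small, hedged sketch. It assumes (this is not taken from the text above) that the image is divided into grid cells, that people can only move between a cell and its 3x3 neighbourhood from one frame to the next, and that nobody enters or leaves the field of view. The function names and the toy flows are hypothetical and are not the authors' implementation. Under these assumptions, the density in a cell at time t is the sum of flows arriving there between t-1 and t, and it must equal the sum of flows leaving that cell between t and t+1.

```python
from collections import defaultdict


def neighbours(cell, shape):
    """Cells in the 3x3 neighbourhood of `cell` (itself included) inside the grid."""
    r, c = cell
    rows, cols = shape
    return [(r + dr, c + dc)
            for dr in (-1, 0, 1) for dc in (-1, 0, 1)
            if 0 <= r + dr < rows and 0 <= c + dc < cols]


def density_from_incoming(flow, shape):
    """Density at time t: people arriving in each cell between t-1 and t."""
    density = defaultdict(float)
    for (src, dst), people in flow.items():
        assert dst in neighbours(src, shape), "flows only link neighbouring cells"
        density[dst] += people
    return dict(density)


def density_from_outgoing(flow, shape):
    """Density at time t: people leaving each cell between t and t+1."""
    density = defaultdict(float)
    for (src, dst), people in flow.items():
        assert dst in neighbours(src, shape), "flows only link neighbouring cells"
        density[src] += people
    return dict(density)


def conservation_residual(flow_prev, flow_next, shape):
    """Sum over cells of |incoming - outgoing| at time t; zero if people are conserved."""
    d_in = density_from_incoming(flow_prev, shape)
    d_out = density_from_outgoing(flow_next, shape)
    cells = set(d_in) | set(d_out)
    return sum(abs(d_in.get(c, 0.0) - d_out.get(c, 0.0)) for c in cells)


# Toy flows on a 2x3 grid, keyed by (source cell, destination cell).
shape = (2, 3)
flow_prev = {((0, 0), (0, 1)): 2.0, ((1, 1), (0, 1)): 1.0, ((1, 2), (1, 2)): 4.0}
flow_next = {((0, 1), (0, 2)): 3.0, ((1, 2), (1, 2)): 3.0, ((1, 2), (1, 1)): 1.0}

density_t = density_from_incoming(flow_prev, shape)
count_t = sum(density_t.values())
residual = conservation_residual(flow_prev, flow_next, shape)
```

In a learned model, a residual of this kind could serve as a penalty term during training, which is one way to read the "stronger constraints" the abstract mentions. The exact loss used by the authors is not stated in this summary.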

