CV · Oct 26, 2017

Deep Spatial Regression Model for Image Crowd Counting

arXiv:1710.09757v1 · 15 citations
Originality: Incremental advance
AI Analysis

This work addresses crowd counting for computer vision applications, but it appears incremental as it builds on existing CNN and LSTM techniques without introducing a new paradigm.

The authors tackled the problem of crowd counting in images with arbitrary perspective and resolution by proposing a deep spatial regression model (DSRM) that combines CNN and LSTM to regress local counts, and they reported that their method outperforms state-of-the-art methods on several datasets.

Computer vision techniques have been used to produce accurate and generic crowd count estimators in recent years. Crowd counting nevertheless remains a very challenging task because of severe occlusions, appearance variations, perspective distortions and varying illumination conditions. To address this, we propose a deep spatial regression model (DSRM) for counting the number of individuals present in a still image with arbitrary perspective and arbitrary resolution. Our proposed model is based on a Convolutional Neural Network (CNN) and long short-term memory (LSTM). First, we feed the images into a pretrained CNN to extract a set of high-level features. Then the features in adjacent regions are used to regress the local counts with an LSTM structure, which takes the spatial information into consideration. The final global count is obtained by summing the counts of the local patches. We apply our framework to several challenging crowd counting datasets, and the experimental results show that our method outperforms state-of-the-art methods on the crowd counting and density estimation problem in terms of reliability and effectiveness.
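The pipeline described in the abstract (per-region features, an LSTM over adjacent regions, local counts summed into a global count) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the pretrained CNN is replaced by random stand-in feature vectors, the weights are untrained, and the names `lstm_step`, `init_params`, and `regress_counts`, the hidden size, the linear readout, and the clipping at zero are all assumptions made for illustration.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    # Plain matrix-vector product on nested lists.
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def lstm_step(params, x, h, c):
    """One standard LSTM step; params maps gate name -> (W_x, W_h, b)."""
    def gate(name, act):
        Wx, Wh, b = params[name]
        pre = [a + r + bias
               for a, r, bias in zip(matvec(Wx, x), matvec(Wh, h), b)]
        return [act(p) for p in pre]

    i = gate("i", sigmoid)    # input gate
    f = gate("f", sigmoid)    # forget gate
    o = gate("o", sigmoid)    # output gate
    g = gate("g", math.tanh)  # candidate cell state
    c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
    h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
    return h_new, c_new


def init_params(feat_dim, hidden_dim, seed=0):
    # Hypothetical small random initialisation; a real model would be trained.
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)]
                for _ in range(rows)]

    return {name: (mat(hidden_dim, feat_dim),
                   mat(hidden_dim, hidden_dim),
                   [0.0] * hidden_dim)
            for name in "ifog"}


def regress_counts(patch_features, params, w_out, b_out=0.0):
    """Run the LSTM over adjacent patches and read out one count per patch."""
    hidden_dim = len(w_out)
    h = [0.0] * hidden_dim
    c = [0.0] * hidden_dim
    local_counts = []
    for x in patch_features:
        h, c = lstm_step(params, x, h, c)
        # Assumed readout: linear layer clipped at zero so counts stay >= 0.
        raw = sum(w * hk for w, hk in zip(w_out, h)) + b_out
        local_counts.append(max(0.0, raw))
    # The global count is the sum of the local patch counts.
    return local_counts, sum(local_counts)


# Stand-in for CNN features: 6 adjacent patches, 4 features each (assumed sizes).
rng = random.Random(1)
features = [[rng.random() for _ in range(4)] for _ in range(6)]
params = init_params(feat_dim=4, hidden_dim=3)
local_counts, total = regress_counts(features, params, w_out=[1.0, 1.0, 1.0])
print(local_counts, total)
```

Because the hidden state is carried from one patch to the next, each local estimate depends on the patches that precede it in the sequence, which is the sense in which the LSTM takes spatial information into account here. How the authors order and group adjacent regions is not specified in the abstract, so the left-to-right sequence above is only one possible choice.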

