CV · Jun 16, 2015

End-to-end people detection in crowded scenes

arXiv:1506.04878v3 · 585 citations
Originality: Highly original
AI Analysis

This work addresses people detection in crowded scenes. Instead of scoring sliding windows or region proposals independently, it predicts all detections in an image jointly.

The paper proposes an end-to-end model that decodes an image directly into a set of distinct detection hypotheses. Because the hypotheses are generated jointly, post-processing such as non-maximum suppression is not needed. The authors evaluate the approach on detecting people in crowded scenes.
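For context, the post-processing step that the model avoids can be sketched as follows. This is a minimal, generic greedy non-maximum suppression in plain Python, not code from the paper; the names `iou` and `greedy_nms`, the corner-coordinate box format, and the 0.5 threshold are assumptions chosen for illustration.

```python
from typing import List, Sequence, Tuple

# Assumed box format for this sketch: (x1, y1, x2, y2) corner coordinates.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes: Sequence[Box], scores: Sequence[float],
               iou_threshold: float = 0.5) -> List[int]:
    """Return indices of kept boxes, visiting boxes from highest score down.

    A box is dropped when it overlaps an already kept box by more than
    iou_threshold.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


# Two people standing close together plus one person further away.
boxes = [(0.0, 0.0, 10.0, 20.0), (3.0, 0.0, 13.0, 20.0), (30.0, 0.0, 40.0, 20.0)]
scores = [0.9, 0.8, 0.7]
kept = greedy_nms(boxes, scores)
print(kept)
```

In this toy input the second box overlaps the first with an IoU of about 0.54, so it is suppressed even though it could be a distinct person. That failure mode is typical of crowded scenes and is the kind of case that joint prediction is meant to handle.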

Current people detectors operate either by scanning an image in a sliding window fashion or by classifying a discrete set of proposals. We propose a model that is based on decoding an image into a set of people detections. Our system takes an image as input and directly outputs a set of distinct detection hypotheses. Because we generate predictions jointly, common post-processing steps such as non-maximum suppression are unnecessary. We use a recurrent LSTM layer for sequence generation and train our model end-to-end with a new loss function that operates on sets of detections. We demonstrate the effectiveness of our approach on the challenging task of detecting people in crowded scenes.
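To illustrate what a loss that "operates on sets of detections" might look like, here is a minimal sketch of a permutation-invariant matching cost. It is an assumption-laden illustration, not the paper's loss: the function names, the L1 box distance, and the brute-force search over assignments are all hypothetical simplifications. A practical implementation would typically use a polynomial-time assignment solver and would also score the confidence of each hypothesis.

```python
from itertools import permutations
from typing import Sequence, Tuple

# Assumed box format for this sketch: (x1, y1, x2, y2) corner coordinates.
Box = Tuple[float, float, float, float]


def l1_distance(a: Box, b: Box) -> float:
    """Sum of absolute coordinate differences between two boxes."""
    return sum(abs(x - y) for x, y in zip(a, b))


def set_matching_cost(preds: Sequence[Box],
                      targets: Sequence[Box]) -> Tuple[float, Tuple[int, ...]]:
    """Find the cheapest one-to-one assignment of targets to predictions.

    Returns (cost, match) where match[j] is the index of the prediction
    assigned to targets[j]. The result does not depend on the order in
    which targets are listed, which is what makes this a set-based cost.
    Brute force: only suitable for a handful of boxes.
    """
    if len(targets) > len(preds):
        raise ValueError("need at least as many predictions as targets")
    best_cost, best_match = float("inf"), ()
    for perm in permutations(range(len(preds)), len(targets)):
        cost = sum(l1_distance(preds[p], t) for p, t in zip(perm, targets))
        if cost < best_cost:
            best_cost, best_match = cost, perm
    return best_cost, best_match


targets = [(0.0, 0.0, 10.0, 20.0), (30.0, 0.0, 40.0, 20.0)]
preds = [(31.0, 0.0, 41.0, 20.0), (0.0, 1.0, 10.0, 21.0), (100.0, 100.0, 110.0, 120.0)]
cost, match = set_matching_cost(preds, targets)
print(cost, match)
```

Because each ground-truth box is matched to a distinct prediction, two hypotheses that land on the same person cannot both be rewarded. This is one way a training objective can encourage distinct outputs without a separate suppression step.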

Code Implementations: 3 repos
