CV | Jun 16, 2015

End-to-end people detection in crowded scenes

arXiv:1506.04878v3 | 36.0 | 585 citations

Originality: Highly original

AI Analysis

This work addresses people detection in crowded scenes, a computer vision task. Instead of scoring candidate locations one at a time, its approach predicts the whole set of detections jointly.

The paper proposes an end-to-end model that decodes an image directly into a set of distinct detection hypotheses. Because the hypotheses are generated jointly, post-processing steps such as non-maximum suppression are unnecessary. The authors demonstrate the approach on people detection in crowded scenes.

Current people detectors operate either by scanning an image in a sliding window fashion or by classifying a discrete set of proposals. We propose a model that is based on decoding an image into a set of people detections. Our system takes an image as input and directly outputs a set of distinct detection hypotheses. Because we generate predictions jointly, common post-processing steps such as non-maximum suppression are unnecessary. We use a recurrent LSTM layer for sequence generation and train our model end-to-end with a new loss function that operates on sets of detections. We demonstrate the effectiveness of our approach on the challenging task of detecting people in crowded scenes.
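
The abstract does not spell out the loss that "operates on sets of detections". As a rough illustration of what such a loss has to do, the sketch below scores a list of predicted boxes against a list of ground-truth boxes in a way that does not depend on the order of either list. Everything in it is an assumption made for illustration and not the authors' formulation: the function names `l1_box_distance` and `set_matching_loss`, the `(x, y, w, h)` box format, the `miss_penalty` parameter, and the brute-force search over assignments (which is only practical for a handful of boxes).

```python
from itertools import permutations


def l1_box_distance(a, b):
    """Sum of absolute coordinate differences between two (x, y, w, h) boxes."""
    return sum(abs(p - q) for p, q in zip(a, b))


def set_matching_loss(predicted, ground_truth, miss_penalty=1.0):
    """Order-invariant loss between two lists of boxes (hypothetical sketch).

    Tries every one-to-one assignment between the shorter list and the
    longer one, keeps the cheapest total L1 distance, and adds a fixed
    penalty for each box in the longer list that is left unmatched.
    """
    # Make `small` the shorter list so each of its boxes gets a distinct partner.
    small, large = sorted((predicted, ground_truth), key=len)
    unmatched = len(large) - len(small)
    best = min(
        sum(l1_box_distance(s, large[j]) for s, j in zip(small, chosen))
        for chosen in permutations(range(len(large)), len(small))
    )
    return best + miss_penalty * unmatched


# Two people in the image; the model emits them in the opposite order.
ground_truth = [(10, 10, 5, 5), (50, 50, 5, 5)]
predicted = [(51, 50, 5, 5), (10, 11, 5, 5)]

loss_a = set_matching_loss(predicted, ground_truth)
loss_b = set_matching_loss(list(reversed(predicted)), ground_truth)
print(loss_a, loss_b)  # identical: the output order does not matter
```

Because the loss is computed over the best assignment and not position by position, a model trained with something of this shape is not penalized for the order in which it emits people, only for missing, duplicated, or badly localized ones. This is one way to see why joint set prediction can do without a separate duplicate-removal step.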
