Deep Poselets for Human Detection
This work aims to improve human detection accuracy for computer vision applications. It is an incremental advance that integrates poselet-based part features with existing object-level CNN methods.
The paper tackles human detection in natural scenes in three steps. It introduces a bootstrapping method that collects millions of weakly labeled examples for each poselet type. It trains a CNN on these examples to discriminate poselet types and to extract Pose Discriminative Feature (PDF) vectors. It then combines a poselet model built on these features with object-level CNNs for detection, achieving state-of-the-art performance on the PASCAL datasets.
We address the problem of detecting people in natural scenes using a part-based approach built on poselets. We propose a bootstrapping method that allows us to collect millions of weakly labeled examples for each poselet type. We use these examples to train a convolutional neural network (CNN) to discriminate different poselet types and separate them from the background class. We then use the trained CNN to represent poselet patches with a Pose Discriminative Feature (PDF) vector, a compact 256-dimensional feature vector that is effective at discriminating pose from appearance. We train the poselet model on top of PDF features and combine it with object-level CNNs for detection and bounding box prediction. The resulting model leads to state-of-the-art performance for human detection on the PASCAL datasets.
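The feature-extraction idea can be illustrated with a minimal sketch. It is not the authors' network: fully connected layers with random, untrained weights stand in for the convolutional layers, and the names `ToyPoseletNet`, `NUM_POSELETS`, and `INPUT_DIM` are hypothetical. Only the 256-dimensional feature size and the extra background class come from the text above. The sketch shows how one network can yield both a distribution over poselet types plus background and a 256-dimensional penultimate-layer activation that plays the role of the PDF vector.

```python
import math
import random

PDF_DIM = 256      # feature size stated in the abstract
NUM_POSELETS = 5   # hypothetical; the abstract does not give the number of poselet types
INPUT_DIM = 48     # hypothetical flattened patch size for this toy example


def make_layer(n_in, n_out, rng):
    """Create random weights and zero biases for one fully connected layer."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def linear(x, layer):
    """Apply an affine map: one dot product plus bias per output unit."""
    weights, biases = layer
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


class ToyPoseletNet:
    """Stand-in classifier: a feature layer followed by a class layer."""

    def __init__(self, seed=0):
        rng = random.Random(seed)
        self.feature_layer = make_layer(INPUT_DIM, PDF_DIM, rng)
        # One output per poselet type, plus one for the background class.
        self.classifier = make_layer(PDF_DIM, NUM_POSELETS + 1, rng)

    def pdf(self, patch):
        """Return the 256-dimensional penultimate activation (ReLU)."""
        return [max(0.0, v) for v in linear(patch, self.feature_layer)]

    def class_probs(self, patch):
        """Return probabilities over poselet types and background."""
        return softmax(linear(self.pdf(patch), self.classifier))


patch_rng = random.Random(1)
patch = [patch_rng.random() for _ in range(INPUT_DIM)]  # fake image patch
net = ToyPoseletNet()
feature = net.pdf(patch)
probs = net.class_probs(patch)
```

In a trained system, `feature` would be passed to the downstream poselet model, while `probs` is only needed during training of the classifier.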