From Superpixel to Human Shape Modelling for Carried Object Detection
The work addresses the need for systems to reason about human-object interactions, but it appears incremental because it builds on existing superpixel and feature-matching techniques.
The paper tackles the problem of detecting carried objects in single video frames by using multi-scale superpixel segmentation and matching against learned human-like features, achieving results that are competitive with or better than state-of-the-art methods on two challenging datasets.
Detecting carried objects is one of the requirements for developing systems to reason about activities involving people and objects. We present an approach to detect carried objects from a single video frame with a novel method that incorporates features from multiple scales. Initially, a foreground mask in a video frame is segmented into multi-scale superpixels. Then the human-like regions in the segmented area are identified by matching a set of extracted features from superpixels against learned features in a codebook. A carried object probability map is generated using the complement of the matching probabilities of superpixels to human-like regions and background information. A group of superpixels with high carried object probability and strong edge support is then merged to obtain the shape of the carried object. We applied our method to two challenging datasets, and results show that our method is competitive with or better than the state-of-the-art.
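The probability-map step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the exact formula, so the sketch assumes that each superpixel's carried-object score is the complement of its human-like matching probability, that background information enters as a binary foreground mask that zeroes out background pixels, and that scales are combined by simple averaging. The function names `carried_object_probability` and `combine_scales` are hypothetical.

```python
import numpy as np

def carried_object_probability(labels, human_prob, foreground_mask):
    """Per-pixel carried-object probability for one segmentation scale.

    labels:          (H, W) int array of superpixel ids in 0..K-1
    human_prob:      (K,) matching probability of each superpixel to
                     human-like regions (e.g. from codebook matching)
    foreground_mask: (H, W) array, nonzero where the pixel is foreground
    """
    labels = np.asarray(labels)
    human_prob = np.asarray(human_prob, dtype=float)
    fg = (np.asarray(foreground_mask) != 0).astype(float)
    # Complement of the human-like matching probability, spread to pixels.
    not_human = 1.0 - human_prob[labels]
    # Assumption: background pixels cannot belong to a carried object.
    return not_human * fg

def combine_scales(maps):
    """Assumption: average the per-scale maps into one probability map."""
    return np.mean(np.stack(maps, axis=0), axis=0)
```

For example, with two superpixels whose human-like probabilities are 0.9 and 0.2, foreground pixels in the second superpixel receive a much higher carried-object score than those in the first. The subsequent merging of high-probability superpixels with strong edge support is not covered by this sketch.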