Parsing Occluded People by Flexible Compositions
This paper addresses the challenge of parsing humans in occluded scenes for computer vision applications, representing an incremental advance over existing methods.
The paper tackles the problem of parsing humans under significant occlusion by modeling them with a graphical tree structure and using flexible compositions of connected subtrees, resulting in significant performance improvements on the 'We Are Family' Stickmen dataset.
This paper presents an approach to parsing humans under significant occlusion. We model humans using a tree-structured graphical model, building on recent work [32, 6], and exploit the connectivity prior that, even in the presence of occlusion, the visible nodes form a connected subtree of the graphical model. We call each connected subtree a flexible composition of object parts. The approach involves a novel method for learning occlusion cues. During inference we need to search over a mixture of different flexible models. By exploiting part sharing, we show that this inference can be done extremely efficiently, requiring only twice as many computations as searching for the entire object (i.e., not modeling occlusion). We evaluate our model on the standard benchmark "We Are Family" Stickmen dataset and obtain significant performance improvements over the best alternative algorithms.
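To make the connectivity prior concrete, the following is a minimal sketch of searching over all connected subtrees of a part tree with one dynamic-programming pass. It is a simplified illustration, not the paper's inference algorithm: it ignores part locations and types, and it assumes a single fixed penalty for every edge cut by occlusion, whereas the abstract describes learned occlusion cues. The function name `best_connected_subtree`, the part names, and all scores are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def best_connected_subtree(
    children: Dict[str, List[str]],
    root: str,
    part_score: Dict[str, float],
    occlusion_penalty: float,
) -> Tuple[float, Set[str]]:
    """Return (score, visible parts) of the best-scoring connected subtree.

    Assumption: every tree edge cut by occlusion costs the same fixed
    penalty. Learned occlusion cues would replace this constant.
    """
    # Pre-order traversal; its reverse visits children before parents.
    order: List[str] = []
    stack = [root]
    while stack:
        v = stack.pop()
        order.append(v)
        stack.extend(children.get(v, []))

    best_down: Dict[str, float] = {}  # best subtree score with v as top part
    kept: Dict[str, List[str]] = {}   # children kept visible below v
    for v in reversed(order):
        total = part_score[v]
        kept[v] = []
        for c in children.get(v, []):
            # Keep the child's subtree, or cut the edge and pay the penalty.
            if best_down[c] > -occlusion_penalty:
                total += best_down[c]
                kept[v].append(c)
            else:
                total -= occlusion_penalty
        best_down[v] = total

    # Every connected subtree has a unique top part. A top part other than
    # the root also cuts the edge to its parent.
    def top_score(v: str) -> float:
        return best_down[v] - (0.0 if v == root else occlusion_penalty)

    top = max(order, key=top_score)

    # Collect the visible parts by following the kept children.
    visible: Set[str] = set()
    stack = [top]
    while stack:
        v = stack.pop()
        visible.add(v)
        stack.extend(kept[v])
    return top_score(top), visible


if __name__ == "__main__":
    tree = {"head": ["torso"], "torso": ["left_arm", "right_arm"]}
    scores = {"head": 2.0, "torso": 3.0, "left_arm": -4.0, "right_arm": 1.0}
    print(best_connected_subtree(tree, "head", scores, occlusion_penalty=0.5))
```

In this sketch the table `best_down` is computed once per part and reused by every candidate subtree that contains that part, which is the sense in which the candidate models share parts. The cost figure quoted in the abstract refers to the authors' full model and is not something this simplified version demonstrates.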