Human-Machine CRFs for Identifying Bottlenecks in Holistic Scene Understanding
This work is aimed at computer vision researchers: it offers insight into where to focus effort when improving scene understanding models. It is incremental in that it builds on existing CRF frameworks.
The paper tackles the problem of identifying which tasks in holistic scene understanding have the most potential for improvement. It does so by comparing hybrid human-machine conditional random field (CRF) models, and finds that certain tasks, such as contextual reasoning, show significant "head room" for gains.
Recent trends in image understanding have pushed for holistic scene understanding models that jointly reason about various tasks such as object detection, scene recognition, shape analysis, contextual reasoning, and local appearance-based classifiers. In this work, we are interested in understanding the roles these different tasks play in improving scene understanding, in particular semantic segmentation, object detection and scene recognition. Towards this goal, we "plug in" human subjects for each of the various components in a state-of-the-art conditional random field model. Comparisons among the resulting hybrid human-machine CRFs indicate how much "head room" there is to improve scene understanding by focusing research efforts on each individual task.
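
The comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the component names, the `head_room` and `toy_evaluate` functions, and every score are hypothetical placeholders chosen only to show the bookkeeping. In a real study, `evaluate` would run CRF inference with the given mix of human and machine components and return a task metric.

```python
from typing import Callable, Dict, List

# Maps a component name to its source: "machine" or "human".
Config = Dict[str, str]


def head_room(components: List[str],
              evaluate: Callable[[Config], int]) -> Dict[str, int]:
    """Score gain from replacing one machine component at a time with humans."""
    all_machine = {name: "machine" for name in components}
    baseline = evaluate(all_machine)
    gains = {}
    for name in components:
        hybrid = dict(all_machine)
        hybrid[name] = "human"  # plug in human responses for this component only
        gains[name] = evaluate(hybrid) - baseline
    return gains


def toy_evaluate(config: Config) -> int:
    """Stand-in for running the CRF; all numbers are made up, not paper results."""
    bonus = {"object_detection": 5, "scene_recognition": 1, "segment_classifier": 3}
    return 70 + sum(bonus[name] for name, src in config.items() if src == "human")


components = ["object_detection", "scene_recognition", "segment_classifier"]
gains = head_room(components, toy_evaluate)

# Rank components by how much a human substitute improves the overall score.
ranking = sorted(gains, key=gains.get, reverse=True)
```

Under this sketch, a component with a large gain is one where better machine models could plausibly help most, while a gain near zero suggests the component is not the bottleneck. Swapping components one at a time ignores interactions between them, which is a simplification of this example.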