CV · Dec 15, 2016

Beyond Holistic Object Recognition: Enriching Image Understanding with Part States

arXiv:1612.07310v1 · 34 citations
Originality: Incremental advance
AI Analysis

This work addresses the need for detailed part-level semantics in high-level vision tasks like human-object interaction and image captioning, representing an incremental advancement over existing part localization methods.

The paper tackles the problem of inferring rich semantic descriptions of object parts in images, proposing a method that tokenizes semantic space into discrete part states and achieves efficient and accurate pixel-wise annotation.

Important high-level vision tasks such as human-object interaction, image captioning, and robotic manipulation require rich semantic descriptions of objects at the part level. Building on previous work on part localization, we address the problem of inferring the rich semantics imparted by an object part in still images. We propose to tokenize the semantic space as a discrete set of part states. Because our modeling of part state is spatially localized, we formulate part-state inference as a pixel-wise annotation problem. An iterative part-state inference neural network is designed specifically for this task and is both time-efficient and accurate. Extensive experiments demonstrate that the proposed method can effectively predict the semantic states of parts and simultaneously correct localization errors, thus benefiting a few visual understanding applications. A further contribution of this paper is our part state dataset, which contains rich part-level semantic annotations.
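To make the pixel-wise formulation concrete, here is a minimal sketch of the final labeling step only: given one score map per part state, each pixel is assigned the state with the highest score. This is an illustration under stated assumptions, not the paper's network. The state names in `PART_STATES`, the function `annotate_pixels`, and the toy scores are all hypothetical, and the iterative inference described in the abstract is not modeled.

```python
# Hypothetical part-state vocabulary; the paper's actual label set is not
# given in this summary, so these names are illustrative assumptions.
PART_STATES = ["background", "hand:holding", "hand:free"]


def annotate_pixels(scores):
    """Assign each pixel the index of its highest-scoring part state.

    scores[s][y][x] is the score of state s at pixel (y, x).
    Returns a label map where label_map[y][x] is a state index.
    """
    num_states = len(scores)
    height = len(scores[0])
    width = len(scores[0][0])
    return [
        [
            # Per-pixel argmax over the discrete set of states.
            max(range(num_states), key=lambda s: scores[s][y][x])
            for x in range(width)
        ]
        for y in range(height)
    ]


# Toy 2x2 image with one score map per state (made-up numbers).
scores = [
    [[0.90, 0.20], [0.10, 0.30]],  # background
    [[0.05, 0.70], [0.20, 0.30]],  # hand:holding
    [[0.05, 0.10], [0.70, 0.40]],  # hand:free
]
label_map = annotate_pixels(scores)
named_map = [[PART_STATES[i] for i in row] for row in label_map]
```

In this sketch each pixel carries exactly one discrete token, which is what lets part-state prediction reuse the machinery of semantic segmentation.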
