CV, AI, LG, RO | Apr 9, 2020

Spatial Priming for Detecting Human-Object Interactions

arXiv:2004.04851v1 | 4 citations
AI Analysis

This work addresses the challenge of accurately identifying interactions between humans and objects in computer vision, representing an incremental advance over prior methods.

The paper detects human-object interactions in images by using spatial layout as a priming cue for a visual module. It reaches 24.79% mAP on the HICO-Det dataset, about 2.8 absolute percentage points above the state of the art at the time.

The relative spatial layout of a human and an object is an important cue for determining how they interact. However, until now, spatial layout has been used only as side information for detecting human-object interactions (HOIs). In this paper, we present a method for exploiting this spatial layout information to detect HOIs in images. The proposed method consists of a layout module that primes a visual module to predict the type of interaction between a human and an object. The visual and layout modules share information through lateral connections at several stages. The model uses predictions from the layout module as a prior for the visual module, and the prediction from the visual module is given as the final output. It also incorporates semantic information about the object using word2vec vectors. The proposed model reaches an mAP of 24.79% on the HICO-Det dataset, which is about 2.8 absolute percentage points higher than the current state of the art.
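The data flow described in the abstract can be illustrated with a small, untrained sketch. It is not the authors' implementation. The box-offset layout encoding, the layer sizes, the single lateral connection, the placement of the word vector in the layout stream, and every name here (`PrimedHOISketch`, `layout_features`, and so on) are assumptions made for illustration. It only shows a layout stream producing a prior, passing its hidden state laterally to a visual stream, and the visual stream producing the final scores.

```python
import math
import random


def linear(x, w, b):
    """Dense layer; w holds one weight row per output unit."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


def relu(x):
    return [max(0.0, v) for v in x]


def sigmoid(x):
    return [1.0 / (1.0 + math.exp(-v)) for v in x]


def init(rng, n_out, n_in):
    """Small random weights and zero biases (untrained, illustration only)."""
    w = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return w, [0.0] * n_out


def layout_features(human_box, object_box):
    """Hypothetical layout encoding: object offset and size relative to the
    human box. Boxes are (x1, y1, x2, y2)."""
    hx1, hy1, hx2, hy2 = human_box
    ox1, oy1, ox2, oy2 = object_box
    hw, hh = hx2 - hx1, hy2 - hy1
    return [(ox1 - hx1) / hw, (oy1 - hy1) / hh, (ox2 - ox1) / hw, (oy2 - oy1) / hh]


class PrimedHOISketch:
    def __init__(self, visual_dim, word_dim, hidden, n_interactions, seed=0):
        rng = random.Random(seed)
        # Layout stream: box geometry plus the object's word vector (assumed placement).
        self.layout_hidden = init(rng, hidden, 4 + word_dim)
        self.layout_out = init(rng, n_interactions, hidden)
        # Visual stream also receives the layout hidden state (lateral connection).
        self.visual_hidden = init(rng, hidden, visual_dim + hidden)
        # Final layer also receives the layout prediction, used as a prior.
        self.visual_out = init(rng, n_interactions, hidden + n_interactions)

    def forward(self, human_box, object_box, visual_feat, word_vec):
        lay_in = layout_features(human_box, object_box) + list(word_vec)
        lay_h = relu(linear(lay_in, *self.layout_hidden))
        prior = sigmoid(linear(lay_h, *self.layout_out))
        vis_h = relu(linear(list(visual_feat) + lay_h, *self.visual_hidden))
        scores = sigmoid(linear(vis_h + prior, *self.visual_out))
        # The visual stream's scores are the final output; the prior is
        # returned only so it can be inspected.
        return prior, scores
```

Calling `PrimedHOISketch(visual_dim=8, word_dim=3, hidden=5, n_interactions=4).forward(...)` with two boxes, an 8-dimensional visual feature list, and a 3-dimensional word vector returns two lists of four per-interaction scores. A real system would use learned weights, CNN features, and several lateral connections rather than one.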

