CV · Mar 11, 2020

GID-Net: Detecting Human-Object Interaction with Global and Instance Dependency

arXiv:2003.05242v1 · 1 citation
Originality: Incremental advance
AI Analysis

The paper addresses the need for better visual understanding in computer vision, though it appears incremental because it builds on existing detection frameworks.

The paper tackles the problem of detecting human-object interactions in images by proposing GID-Net, a multi-stream network with a two-stage reasoning mechanism that captures global and instance dependencies, and it reports outperforming state-of-the-art methods on the V-COCO and HICO-DET benchmarks.

Since detecting and recognizing individual humans or objects is not adequate to understand the visual world, learning how humans interact with surrounding objects becomes a core technology. However, convolution operations are weak at depicting visual interactions between instances, since they only build blocks that process one local neighborhood at a time. To address this problem, we learn from human perception in observing HOIs and introduce a two-stage trainable reasoning mechanism, referred to as the GID block. The GID block breaks through local neighborhoods and captures long-range dependencies between pixels at both the global level and the instance level of the scene to help detect interactions between instances. Furthermore, we construct a multi-stream network called GID-Net, a human-object interaction (HOI) detection framework consisting of a human branch, an object branch and an interaction branch. Semantic information at the global level and the local level is efficiently reasoned over and aggregated in each of the branches. We compare GID-Net with existing state-of-the-art methods on two public benchmarks, V-COCO and HICO-DET. The results show that GID-Net outperforms the existing best-performing methods on both benchmarks, validating its efficacy in detecting human-object interactions.
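
The abstract contrasts convolutions, which see one local neighborhood at a time, with a block that relates every pixel to distant pixels at the global level and the instance level. The abstract does not give the GID block's equations, so the following is only a minimal sketch of the general idea, using a generic dot-product attention over a feature map as a stand-in. The function name `long_range_block`, the `roi` parameter, and the choice to restrict the context to a box for the instance-level case are all assumptions made for illustration, not the paper's design.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def long_range_block(feat, roi=None):
    """Hypothetical stand-in for a long-range dependency block.

    feat: feature map of shape (H, W, C).
    roi:  optional box (y0, x0, y1, x1). If given, every pixel attends only
          to pixels inside the box (an assumed "instance-level" context);
          otherwise every pixel attends to the whole map ("global-level").

    Returns the residual-updated feature map and the attention weights.
    """
    h, w, c = feat.shape
    queries = feat.reshape(h * w, c)
    if roi is None:
        context = queries
    else:
        y0, x0, y1, x1 = roi
        context = feat[y0:y1, x0:x1].reshape(-1, c)

    # Pairwise affinities between every query pixel and every context pixel,
    # regardless of how far apart they are in the image.
    weights = softmax(queries @ context.T / np.sqrt(c), axis=1)

    # Each pixel aggregates context features, then keeps its own via a residual.
    aggregated = weights @ context
    return feat + aggregated.reshape(h, w, c), weights


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feature_map = rng.normal(size=(4, 5, 8))
    global_out, global_w = long_range_block(feature_map)
    inst_out, inst_w = long_range_block(feature_map, roi=(1, 1, 3, 4))
    print(global_out.shape, global_w.shape, inst_w.shape)
```

Unlike a 3x3 convolution, the output at the top-left pixel here changes when the bottom-right pixel changes, which is the property the abstract refers to as breaking through local neighborhoods. A real implementation would add learned projections for queries, keys and values.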
