CV
Apr 30, 2021

RR-Net: Injecting Interactive Semantics in Human-Object Interaction Detection

arXiv:2104.15015v1
4 citations
Originality: Incremental advance
AI Analysis

This work addresses a domain-specific computer vision problem, HOI detection, and offers an incremental advance through novel relation reasoning modules.

The paper tackles Human-Object Interaction (HOI) detection by proposing a relation reasoning approach that injects interactive semantics, setting a new state of the art on the V-COCO and HICO-DET benchmarks with relative improvements of about 5.5% and 9.8% over the baseline, respectively.

Human-Object Interaction (HOI) detection aims to learn how humans interact with surrounding objects. The latest end-to-end HOI detectors lack relation reasoning, which leaves them unable to learn HOI-specific interactive semantics for prediction. In this paper, we therefore propose a novel relation reasoning approach for HOI detection. We first present a progressive Relation-aware Frame, which introduces a new structure and parameter-sharing pattern for interaction inference. On top of this frame, we design an Interaction Intensifier Module and a Correlation Parsing Module, in which: a) interactive semantics from humans are exploited and passed to objects to intensify interactions, and b) interactive correlations among humans, objects and interactions are integrated to improve predictions. Based on these modules, we construct an end-to-end trainable framework named Relation Reasoning Network (abbr. RR-Net). Extensive experiments show that RR-Net sets a new state of the art on both the V-COCO and HICO-DET benchmarks, improving on the baseline by about 5.5% and 9.8% in relative terms, respectively. This validates that this first effort to explore relation reasoning and integrate interactive semantics brings a clear improvement to end-to-end HOI detection.
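The reported gains are relative rather than absolute, which is easy to misread. The short sketch below shows how a relative improvement is computed from a baseline score and an improved score. The function name `relative_improvement` and the numeric values are illustrative assumptions, not figures or code from the paper.

```python
def relative_improvement(baseline: float, improved: float) -> float:
    """Return the gain of `improved` over `baseline` as a percentage of `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (improved - baseline) / baseline * 100.0


# Hypothetical mAP values chosen only to illustrate the arithmetic;
# they are NOT results reported in the paper.
baseline_map = 50.0
improved_map = 52.75

absolute_gain = improved_map - baseline_map                    # in mAP points
relative_gain = relative_improvement(baseline_map, improved_map)  # in percent

print(f"absolute gain: {absolute_gain:.2f} points")
print(f"relative gain: {relative_gain:.1f}%")
```

With these made-up numbers, a gain of a few mAP points corresponds to a relative improvement in the mid single digits, so a relative figure should not be read as the same number of absolute mAP points.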
