CV · Nov 14, 2018

No-Frills Human-Object Interaction Detection: Factorization, Layout Encodings, and Training Techniques

arXiv:1811.05967v2 · 49 citations
AI Analysis

This work addresses the detection of interactions between humans and objects in images, which matters for applications such as robotics and surveillance. The contribution is incremental, since it builds on existing pre-trained detectors and training techniques.

The paper tackles human-object interaction detection with a simple factorized model that uses appearance and layout encodings. The model outperforms more complex methods and achieves state-of-the-art results on the HICO-Det dataset.

We show that for human-object interaction detection a relatively simple factorized model with appearance and layout encodings constructed from pre-trained object detectors outperforms more sophisticated approaches. Our model includes factors for detection scores, human and object appearance, and coarse (box-pair configuration) and optionally fine-grained layout (human pose). We also develop training techniques that improve learning efficiency by: (1) eliminating a train-inference mismatch; (2) rejecting easy negatives during mini-batch training; and (3) using a ratio of negatives to positives that is two orders of magnitude larger than existing approaches. We conduct a thorough ablation study to understand the importance of different factors and training techniques using the challenging HICO-Det dataset.
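
To make the idea of a factorized model concrete, here is a minimal Python sketch. The abstract lists the factors (detection scores, human and object appearance, box-pair layout, optional human pose) but does not say how they are combined, so the combination rule below is an assumption: the per-factor logits are summed, passed through a sigmoid, and multiplied by the two detector confidences. The function name `hoi_score`, the dictionary keys, and all numeric values are hypothetical and chosen only for illustration.

```python
import math


def sigmoid(x: float) -> float:
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def hoi_score(human_det: float, object_det: float, factor_logits: dict) -> float:
    """Score one (human box, object box, interaction) candidate.

    ASSUMPTION: the interaction term is a sigmoid over the sum of
    per-factor logits, multiplied by the detector confidences. The
    abstract names the factors but not this exact combination.
    """
    interaction = sigmoid(sum(factor_logits.values()))
    return human_det * object_det * interaction


# Hypothetical logits for a single candidate pair.
example_logits = {
    "human_appearance": 1.2,
    "object_appearance": 0.4,
    "box_pair_layout": -0.3,  # coarse layout from the two boxes
    "human_pose": 0.7,        # optional fine-grained layout
}
score = hoi_score(0.9, 0.8, example_logits)
print(f"HOI score: {score:.3f}")
```

Because each factor contributes an independent term, dropping one (for example `human_pose`) only removes its entry from the dictionary, which is the kind of change an ablation over factors would make.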

Code Implementations · 3 repos
