CV · Nov 14, 2018

No-Frills Human-Object Interaction Detection: Factorization, Layout Encodings, and Training Techniques

Tanmay Gupta, Alexander Schwing, Derek Hoiem

arXiv:1811.05967v222.749 citations

Originality: Incremental advance

AI Analysis

This work addresses the problem of detecting interactions between humans and objects in images, which matters for applications such as robotics and surveillance. Its contribution is incremental: it builds on existing pre-trained object detectors and refines how such models are trained rather than proposing a fundamentally new approach.

The paper tackles human-object interaction detection with a simple factorized model built on appearance and layout encodings. This model outperforms more complex methods and achieves state-of-the-art results on the HICO-Det dataset.
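The factorized scoring described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes that each factor (appearance, box-pair layout, optionally pose) produces a logit. These logits are summed and passed through a sigmoid to give an interaction probability, which is then multiplied by the human and object detection scores. The layout features in `box_pair_layout` are a hypothetical subset: center offsets normalized by the human box size, log size ratios, and IoU. They are not the paper's exact encoding, and both function names are illustrative.

```python
import math
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed non-degenerate


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def box_pair_layout(h: Box, o: Box) -> List[float]:
    """Hypothetical coarse layout encoding for a human box h and an object box o."""
    hw, hh = h[2] - h[0], h[3] - h[1]
    ow, oh = o[2] - o[0], o[3] - o[1]
    # Box centers.
    hcx, hcy = (h[0] + h[2]) / 2, (h[1] + h[3]) / 2
    ocx, ocy = (o[0] + o[2]) / 2, (o[1] + o[3]) / 2
    # Intersection-over-union of the two boxes.
    ix = max(0.0, min(h[2], o[2]) - max(h[0], o[0]))
    iy = max(0.0, min(h[3], o[3]) - max(h[1], o[1]))
    inter = ix * iy
    iou = inter / (hw * hh + ow * oh - inter)
    return [
        (ocx - hcx) / hw,   # horizontal offset, in human-box widths
        (ocy - hcy) / hh,   # vertical offset, in human-box heights
        math.log(ow / hw),  # relative width (log scale)
        math.log(oh / hh),  # relative height (log scale)
        iou,                # overlap between the two boxes
    ]


def hoi_score(human_det: float, object_det: float,
              factor_logits: Sequence[float]) -> float:
    """Detector confidences times an interaction probability; the interaction
    probability is assumed here to be sigmoid(sum of per-factor logits)."""
    return human_det * object_det * sigmoid(sum(factor_logits))
```

For example, `hoi_score(0.9, 0.8, [0.0])` gives approximately 0.36. Multiplying by the detection scores means that a confident interaction prediction still ranks low if either box is an unlikely detection.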

We show that for human-object interaction detection a relatively simple factorized model with appearance and layout encodings constructed from pre-trained object detectors outperforms more sophisticated approaches. Our model includes factors for detection scores, human and object appearance, and coarse (box-pair configuration) and optionally fine-grained layout (human pose). We also develop training techniques that improve learning efficiency by: (1) eliminating a train-inference mismatch; (2) rejecting easy negatives during mini-batch training; and (3) using a ratio of negatives to positives that is two orders of magnitude larger than existing approaches. We conduct a thorough ablation study to understand the importance of different factors and training techniques using the challenging HICO-Det dataset.
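The three training techniques can be sketched as follows. This is a hypothetical illustration, since the abstract does not specify the details:

- Easy-negative rejection is modeled as dropping candidate pairs whose detected object category cannot take part in the interaction, via a hypothetical `interaction_object` lookup.
- Negatives are subsampled up to a configurable `neg_to_pos` ratio. The abstract only says the ratio is two orders of magnitude larger than in prior work.
- The loss is computed on the same product score that is ranked at inference. This is one way to remove a train-inference mismatch.

All field names and helpers are illustrative.

```python
import math
import random


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def bce(p: float, y: int, eps: float = 1e-7) -> float:
    """Binary cross-entropy for one probability p and a label y in {0, 1}."""
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def reject_easy_negatives(candidates, interaction_object):
    """Keep only pairs whose detected object category matches the object the
    interaction involves (interaction_object is a hypothetical mapping)."""
    return [c for c in candidates
            if interaction_object[c["interaction"]] == c["object_category"]]


def sample_batch(candidates, neg_to_pos, rng):
    """Keep every positive and at most neg_to_pos negatives per positive."""
    pos = [c for c in candidates if c["label"] == 1]
    neg = [c for c in candidates if c["label"] == 0]
    k = min(len(neg), neg_to_pos * len(pos))
    return pos + rng.sample(neg, k)


def training_loss(batch):
    """Mean BCE on the full factorized score human_det * object_det *
    sigmoid(logit), i.e. the same quantity that is ranked at inference."""
    if not batch:
        return 0.0
    losses = [bce(c["human_det"] * c["object_det"] * sigmoid(c["logit"]),
                  c["label"])
              for c in batch]
    return sum(losses) / len(losses)
```

With a large `neg_to_pos` (for example 1000), `sample_batch` keeps nearly every negative that survives filtering. Rejecting easy negatives first keeps those batches affordable.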

