CVAIMar 4, 2024

FreeA: Human-object Interaction Detection using Free Annotation Labels

arXiv:2403.01840v21 citationsh-index: 5
Originality Highly original
AI Analysis

This addresses the need for reducing annotation effort in HOI detection for computer vision applications, representing a novel method for a known bottleneck.

The paper tackles the problem of human-object interaction detection by proposing FreeA, a self-adaptive, language-driven method that generates latent HOI labels without manual annotation, achieving state-of-the-art performance with improvements of up to 159% mAP over weakly supervised models on benchmark datasets.

Recent human-object interaction (HOI) detection methods depend on extensively annotated image datasets, which require a significant amount of manpower. In this paper, we propose a novel self-adaptive, language-driven HOI detection method, termed FreeA. This method leverages the adaptability of the text-image model to generate latent HOI labels without requiring manual annotation. Specifically, FreeA aligns image features of human-object pairs with HOI text templates and employs a knowledge-based masking technique to decrease improbable interactions. Furthermore, FreeA implements a proposed method for matching interaction correlations to increase the probability of actions associated with a particular action, thereby improving the generated HOI labels. Experiments on two benchmark datasets showcase that FreeA achieves state-of-the-art performance among weakly supervised HOI competitors. Our proposal gets +\textbf{13.29} (\textbf{159\%$\uparrow$}) mAP and +\textbf{17.30} (\textbf{98\%$\uparrow$}) mAP than the newest ``Weakly'' supervised model, and +\textbf{7.19} (\textbf{28\%$\uparrow$}) mAP and +\textbf{14.69} (\textbf{34\%$\uparrow$}) mAP than the latest ``Weakly+'' supervised model, respectively, on HICO-DET and V-COCO datasets, more accurate in localizing and classifying the interactive actions. The source code will be made public.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes