CV · Dec 11, 2024

Orchestrating the Symphony of Prompt Distribution Learning for Human-Object Interaction Detection

arXiv:2412.08506v1 · 2 citations · h-index: 2 · AAAI
Originality: Incremental advance
AI Analysis

This paper addresses a specific challenge in human-object interaction (HOI) detection, a computer vision task, and represents an incremental improvement over existing transformer-based methods.

The paper targets two weaknesses of HOI detectors: recognizing uncommon visual patterns and telling ambiguous human-object interactions apart. It introduces Interaction Prompt Distribution Learning (InterProDa), which learns multiple sets of soft prompts and incorporates the category distributions estimated from them into HOI queries. The authors report competitive performance on the HICO-DET and V-COCO benchmarks.

Human-object interaction (HOI) detectors with the popular query-transformer architecture have achieved promising performance. However, accurately identifying uncommon visual patterns and distinguishing between ambiguous HOIs continue to be difficult for them. We observe that these difficulties may arise from the limited capacity of traditional detector queries in representing diverse intra-category patterns and inter-category dependencies. To address this, we introduce the Interaction Prompt Distribution Learning (InterProDa) approach. InterProDa learns multiple sets of soft prompts and estimates category distributions from various prompts. It then incorporates HOI queries with category distributions, making them capable of representing near-infinite intra-category dynamics and universal cross-category relationships. Our InterProDa detector demonstrates competitive performance on HICO-DET and V-COCO benchmarks. Additionally, our method can be integrated into most transformer-based HOI detectors, significantly enhancing their performance with minimal additional parameters.
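The pipeline described in the abstract (several prompts per category, a distribution estimated from them, and queries enriched with that distribution) can be illustrated with a toy sketch. This is not the authors' implementation. The diagonal Gaussian, the additive fusion with a fixed `weight`, and every function name below are assumptions made for illustration only, and plain Python lists stand in for the embeddings a real text encoder would produce.

```python
import random
import statistics


def estimate_category_distribution(prompt_embeddings):
    """Estimate a per-dimension mean and variance for one category.

    prompt_embeddings: list of K vectors, one per soft prompt (hypothetical
    stand-ins for encoded prompts). Assumes a diagonal Gaussian.
    """
    dims = list(zip(*prompt_embeddings))  # transpose to D columns of K values
    mean = [statistics.fmean(d) for d in dims]
    var = [statistics.pvariance(d) for d in dims]
    return mean, var


def sample_from_distribution(mean, var, rng):
    """Draw one vector from the assumed diagonal Gaussian."""
    return [rng.gauss(m, v ** 0.5) for m, v in zip(mean, var)]


def enrich_query(query, category_sample, weight=0.5):
    """Fuse a detector query with a category sample (additive fusion is an assumption)."""
    return [q + weight * s for q, s in zip(query, category_sample)]


if __name__ == "__main__":
    rng = random.Random(0)
    # Four hypothetical prompt embeddings (dimension 3) for a single HOI category.
    prompts = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(4)]
    mean, var = estimate_category_distribution(prompts)
    sample = sample_from_distribution(mean, var, rng)
    query = [0.0, 0.0, 0.0]
    print(enrich_query(query, sample))
```

Because each call to `sample_from_distribution` draws a fresh vector, one query can be paired with many variations of the same category. That is one loose reading of the "near-infinite intra-category dynamics" mentioned in the abstract.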

