CV · Apr 15

Towards Unconstrained Human-Object Interaction

arXiv:2604.14069 · 30.8 · h-index: 13 · Has Code
Predicted impact: top 17% in CV · last 90 days
Originality: Incremental advance
AI Analysis

For computer vision researchers, this work opens a new paradigm for HOI detection that is more flexible and applicable to dynamic environments, though it is an initial exploration without strong quantitative results.

The paper introduces Unconstrained HOI (U-HOI), a new task that removes the need for a predefined interaction vocabulary, and evaluates MLLMs on it, finding that current HOI detectors are limited while MLLMs show promise for this setting.

Human-Object Interaction (HOI) detection is a longstanding computer vision problem concerned with predicting the interaction between humans and objects. Current HOI models rely on a vocabulary of interactions at training and inference time, limiting their applicability to static environments. With the advent of Multimodal Large Language Models (MLLMs), it has become feasible to explore more flexible paradigms for interaction recognition. In this work, we revisit HOI detection through the lens of MLLMs and apply them to in-the-wild HOI detection. We define the Unconstrained HOI (U-HOI) task, a novel HOI domain that removes the requirement for a predefined list of interactions at both training and inference. We evaluate a range of MLLMs on this setting and introduce a pipeline that includes test-time inference and language-to-graph conversion to extract structured interactions from free-form text. Our findings highlight the limitations of current HOI detectors and the value of MLLMs for U-HOI. Code will be available at https://github.com/francescotonini/anyhoi
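
The language-to-graph step mentioned in the abstract can be pictured with a toy example. The sketch below is not the authors' pipeline: it assumes the MLLM output is simple English of the form "a person is riding a bicycle" and uses a hand-written regular expression to pull out (human, verb, object) edges. The names `Interaction`, `_PATTERN`, and `text_to_graph`, and the small list of human nouns, are hypothetical and chosen only for illustration.

```python
import re
from typing import List, NamedTuple


class Interaction(NamedTuple):
    """One edge of the interaction graph: human --verb--> object."""
    subject: str
    verb: str
    obj: str


# Hypothetical toy pattern (an assumption, not the paper's method):
# "<determiner> <human noun> [is] <verb> <determiner> <object>"
_PATTERN = re.compile(
    r"\b(?:a|an|the)\s+(?P<subject>person|man|woman|child)\s+"
    r"(?:is\s+)?(?P<verb>\w+)\s+"
    r"(?:a|an|the)\s+(?P<obj>\w+)",
    re.IGNORECASE,
)


def text_to_graph(text: str) -> List[Interaction]:
    """Extract at most one (human, verb, object) edge per sentence."""
    edges = []
    # Split on sentence-ending punctuation and scan each piece separately.
    for sentence in re.split(r"[.;]\s*", text):
        match = _PATTERN.search(sentence)
        if match:
            edges.append(
                Interaction(
                    match.group("subject").lower(),
                    match.group("verb").lower(),
                    match.group("obj").lower(),
                )
            )
    return edges


if __name__ == "__main__":
    caption = "A person is riding a bicycle. The woman holds an umbrella."
    for edge in text_to_graph(caption):
        print(edge)
```

Because the verb is captured as free text rather than matched against a fixed list, the output is not tied to a predefined interaction vocabulary, which is the property the U-HOI task asks for. A real system would likely need a proper parser or a second model call in place of the regular expression.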

