CV · Mar 31

Omni-MMSI: Toward Identity-attributed Social Interaction Understanding

arXiv:2604.00267 · 93.11 citations
Predicted impact: top 11% in CV · last 90 days
Originality: Incremental advance
AI Analysis

This work addresses the challenge of developing AI assistants that can perceive and respond to human interactions in realistic scenarios, though it appears incremental as it builds on existing datasets and methods.

The authors tackled the problem of comprehensive social interaction understanding from raw audio, vision, and speech input by introducing the Omni-MMSI task and proposing Omni-MMSI-R, a reference-guided pipeline that outperforms advanced LLMs and counterparts on this task.

We introduce Omni-MMSI, a new task that requires comprehensive social interaction understanding from raw audio, vision, and speech input. The task involves perceiving identity-attributed social cues (e.g., who is speaking what) and reasoning about the social interaction (e.g., whom the speaker refers to). This task is essential for developing AI assistants that can perceive and respond to human interactions. Unlike prior studies that operate on oracle-preprocessed social cues, Omni-MMSI reflects realistic scenarios where AI assistants must perceive and reason from raw data. However, existing pipelines and multi-modal LLMs perform poorly on Omni-MMSI because they lack reliable identity attribution capabilities, which leads to inaccurate social interaction understanding. To address this challenge, we propose Omni-MMSI-R, a reference-guided pipeline that produces identity-attributed social cues with tools and conducts chain-of-thought social reasoning. To facilitate this pipeline, we construct participant-level reference pairs and curate reasoning annotations on top of the existing datasets. Experiments demonstrate that Omni-MMSI-R outperforms advanced LLMs and counterparts on Omni-MMSI. Project page: https://sampson-lee.github.io/omni-mmsi-project-page.

