CV · Apr 14

Pi-HOC: Pairwise 3D Human-Object Contact Estimation

arXiv:2604.12923 · 54.6 · h-index: 2
AI Analysis

This work addresses the bottleneck of fine-grained contact estimation in multi-human interactions, enabling efficient and accurate semantic contact prediction for downstream tasks like 3D reconstruction.

Pi-HOC introduces a single-pass framework for dense 3D semantic contact prediction of all human-object pairs in multi-human scenarios, achieving significant accuracy improvements and 20x higher throughput over state-of-the-art methods on the MMHOI and DAMON datasets.

Resolving real-world human-object interactions in images is a many-to-many challenge, in which disentangling fine-grained concurrent physical contact is particularly difficult. Existing semantic contact estimation methods are either limited to single-human settings or require object geometries (e.g., meshes) in addition to the input image. The current state of the art leverages a powerful vision-language model (VLM) for category-level semantics but struggles with multi-human scenarios and scales poorly at inference time. We introduce Pi-HOC, a single-pass, instance-aware framework for dense 3D semantic contact prediction of all human-object pairs. Pi-HOC detects instances, creates a dedicated human-object (HO) token for each pair, and refines these tokens using an InteractionFormer. A SAM-based decoder then predicts dense contact on SMPL human meshes for each human-object pair. On the MMHOI and DAMON datasets, Pi-HOC significantly improves accuracy and localization over state-of-the-art methods while achieving 20x higher throughput. We further demonstrate that the predicted contacts improve SAM-3D image-to-mesh reconstruction via a test-time optimization algorithm and enable referential contact prediction from language queries without additional training.
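The pairing step in the abstract (one dedicated token per human-object pair) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function name `build_ho_tokens`, the toy feature vectors, and the use of plain concatenation to form a token are hypothetical, and the abstract does not specify how Pi-HOC actually constructs or refines its HO tokens.

```python
from itertools import product


def build_ho_tokens(human_feats, object_feats):
    """Return one token per (human, object) pair, keyed by index pair.

    Hypothetical construction: a token is the concatenation of the two
    instance feature vectors. The paper's real token design is not
    described in the abstract.
    """
    tokens = {}
    pairs = product(enumerate(human_feats), enumerate(object_feats))
    for (h_idx, h_feat), (o_idx, o_feat) in pairs:
        tokens[(h_idx, o_idx)] = list(h_feat) + list(o_feat)
    return tokens


# Toy example: 2 detected humans and 3 detected objects (made-up features).
humans = [[0.1, 0.2], [0.3, 0.4]]
objects = [[1.0], [2.0], [3.0]]

ho_tokens = build_ho_tokens(humans, objects)
print(len(ho_tokens))  # one token per pair: 2 * 3 = 6
```

Because every pair gets its own token, the number of tokens grows with the product of the human and object counts. A single-pass design, as the abstract describes it, would handle all of these pairs in one forward pass rather than running the model once per pair.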

