Uni-Inter: Unifying 3D Human Motion Synthesis Across Diverse Interaction Contexts
This work addresses the challenge of scalable motion synthesis in complex environments, with applications in robotics, animation, or VR. The contribution is incremental in that it builds on existing motion generation methods, although its unified approach is novel.
The paper tackles the problem of generating 3D human motion across diverse interaction contexts (human-human, human-object, and human-scene) and introduces Uni-Inter, a unified framework that achieves competitive performance and generalizes to novel entity combinations.
We present Uni-Inter, a unified framework for human motion generation that supports a wide range of interaction scenarios (human-human, human-object, and human-scene) within a single, task-agnostic architecture. In contrast to existing methods that rely on task-specific designs and exhibit limited generalization, Uni-Inter introduces the Unified Interactive Volume (UIV), a volumetric representation that encodes heterogeneous interactive entities into a shared spatial field. This enables consistent relational reasoning and compound interaction modeling. Motion generation is formulated as joint-wise probabilistic prediction over the UIV, allowing the model to capture fine-grained spatial dependencies and produce coherent, context-aware behaviors. Experiments across three representative interaction tasks demonstrate that Uni-Inter achieves competitive performance and generalizes well to novel combinations of entities. These results suggest that unified modeling of compound interactions offers a promising direction for scalable motion synthesis in complex environments.
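To make the two central ideas of the abstract concrete, the toy sketch below places points from different entity types on one shared voxel grid and then normalizes per-joint scores into a probability distribution over voxels. It is an illustration under stated assumptions, not the paper's implementation: the abstract does not specify the grid resolution, the encoding of each entity, or the network that produces the scores, so the function names (`voxel_index`, `build_volume`, `jointwise_softmax`), the 8-cell grid, the [-1, 1) bounds, and the hand-written logits are all hypothetical.

```python
import math

def voxel_index(point, grid_size, lo, hi):
    """Map a 3D point to integer voxel coordinates, or None if out of bounds."""
    idx = []
    for coord in point:
        if not (lo <= coord < hi):
            return None
        i = int((coord - lo) / (hi - lo) * grid_size)
        idx.append(min(i, grid_size - 1))  # guard against rounding at the upper edge
    return tuple(idx)

def build_volume(entities, grid_size=8, lo=-1.0, hi=1.0):
    """Return {entity_type: set of occupied voxels}, all on the same grid.

    Assumption: every entity (human, object, scene) is given as a list of
    3D points already expressed in one common coordinate frame.
    """
    volume = {}
    for name, points in entities.items():
        occupied = set()
        for p in points:
            idx = voxel_index(p, grid_size, lo, hi)
            if idx is not None:  # points outside the grid are dropped
                occupied.add(idx)
        volume[name] = occupied
    return volume

def jointwise_softmax(logits_per_joint):
    """For each joint, turn {voxel: logit} into {voxel: probability}."""
    result = []
    for logits in logits_per_joint:
        m = max(logits.values())  # subtract the max for numerical stability
        exps = {v: math.exp(x - m) for v, x in logits.items()}
        total = sum(exps.values())
        result.append({v: e / total for v, e in exps.items()})
    return result

# Hypothetical input: three entity types sharing one spatial field.
entities = {
    "human": [(0.1, 0.2, 0.3)],
    "object": [(-0.5, -0.5, -0.5), (0.9, 0.9, 0.9)],
    "scene": [(0.0, -0.99, 0.0), (5.0, 0.0, 0.0)],  # last point is out of bounds
}
volume = build_volume(entities)

# Hand-written scores for a single joint; a real model would predict these.
probs = jointwise_softmax([{(4, 4, 5): 2.0, (2, 2, 2): 0.0}])
```

Because every entity type is indexed by the same voxel coordinates, a new combination of entities changes only which cells are occupied, not the shape of the input. That is the property the abstract credits for generalization to novel entity combinations.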