CV | Mar 20, 2025

SceneMI: Motion In-betweening for Modeling Human-Scene Interactions

arXiv:2503.16289v2 | 10 citations | h-index: 5
Originality: Incremental advance
AI Analysis

This work addresses controllability and flexibility issues in human-scene interaction modeling for applications such as character animation and motion enhancement, though it is incremental as it builds on existing generative and diffusion methods.

The paper tackles the problem of modeling human-scene interactions by reformulating it as scene-aware motion in-betweening, resulting in a framework that effectively handles noisy keyframes and generalizes to real-world datasets like GIMO.

Modeling human-scene interactions (HSI) is essential for understanding and simulating everyday human behaviors. Recent approaches utilizing generative modeling have made progress in this domain; however, they are limited in controllability and flexibility for real-world applications. To address these challenges, we propose reformulating the HSI modeling problem as Scene-aware Motion In-betweening, a more tractable and practical task. We introduce SceneMI, a framework that supports several practical applications, including keyframe-guided character animation in 3D scenes and enhancing the motion quality of imperfect HSI data. SceneMI employs dual scene descriptors to comprehensively encode global and local scene context. Furthermore, our framework leverages the inherent denoising nature of diffusion models to generalize on noisy keyframes. Experimental results demonstrate SceneMI's effectiveness in scene-aware keyframe in-betweening and generalization to the real-world GIMO dataset, where motions and scenes are acquired by noisy IMU sensors and smartphones. We further showcase SceneMI's applicability in HSI reconstruction from monocular videos.
