CV, AI | Oct 14, 2025

SceneAdapt: Scene-aware Adaptation of Human Motion Diffusion

arXiv:2510.13044v1 | h-index: 29
Originality: Incremental advance
AI Analysis

This work addresses a specific problem in human motion generation for computer graphics or robotics. It is an incremental advance: it builds on existing text-to-motion models and adds scene awareness through adaptation.

The paper tackles the problem of generating human motion that is both text-conditioned and scene-aware, which is challenging because no large-scale dataset combines both aspects. It introduces SceneAdapt, a framework that uses motion inbetweening as a proxy task to bridge disjoint datasets and thereby inject scene awareness into text-to-motion models.

Human motion is inherently diverse and semantically rich, while also shaped by the surrounding scene. However, existing motion generation approaches address either motion semantics or scene awareness in isolation, since constructing large-scale datasets with both rich text-motion coverage and precise scene interactions is extremely challenging. In this work, we introduce SceneAdapt, a framework that injects scene awareness into text-conditioned motion models by leveraging disjoint scene-motion and text-motion datasets through two adaptation stages: inbetweening and scene-aware inbetweening. The key idea is to use motion inbetweening, which is learnable without text, as a proxy task to bridge the two distinct datasets and thereby inject scene awareness into text-to-motion models. In the first stage, we introduce keyframing layers that modulate motion latents for inbetweening while preserving the latent manifold. In the second stage, we add a scene-conditioning layer that injects scene geometry by adaptively querying local context through cross-attention. Experimental results show that SceneAdapt effectively injects scene awareness into text-to-motion models, and we further analyze the mechanisms through which this awareness emerges. Code and models will be released.
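To make the two ingredients named in the abstract more concrete, here is a minimal NumPy sketch, not the authors' implementation. It shows (a) a keyframe mask of the kind an inbetweening task might use and (b) a single-head cross-attention step in which per-frame motion latents query scene features. The function names, the fixed-stride keyframe choice, the residual connection, and all tensor shapes are assumptions made for illustration; the abstract does not specify them.

```python
import numpy as np


def keyframe_mask(num_frames, stride):
    """Boolean mask marking observed keyframes (assumed: fixed stride plus last frame).

    Frames where the mask is False are the ones an inbetweening model must fill in.
    """
    mask = np.zeros(num_frames, dtype=bool)
    mask[::stride] = True
    mask[-1] = True
    return mask


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def scene_cross_attention(motion_latents, scene_feats, w_q, w_k, w_v):
    """Single-head cross-attention: motion frames query scene features.

    motion_latents: (T, d)   per-frame motion latents (hypothetical layout)
    scene_feats:    (N, d_s) features of N scene elements (hypothetical layout)
    w_q: (d, d_k), w_k: (d_s, d_k), w_v: (d_s, d) projection matrices
    Returns the updated latents (T, d) and the attention weights (T, N).
    """
    queries = motion_latents @ w_q                   # (T, d_k)
    keys = scene_feats @ w_k                         # (N, d_k)
    values = scene_feats @ w_v                       # (N, d)
    scores = queries @ keys.T / np.sqrt(queries.shape[-1])
    attn = softmax(scores, axis=-1)                  # each row sums to 1
    # Residual update (an assumption): scene context is added to the latents.
    return motion_latents + attn @ values, attn
```

Each row of the returned attention matrix indicates which scene elements a given frame draws on, which is one way to read "adaptively querying local context": different frames can attend to different parts of the scene.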

