CV | GR | Feb 13

FlexAM: Flexible Appearance-Motion Decomposition for Versatile Video Generation Control

arXiv:2602.13185v1 | h-index: 5
Originality: Incremental advance
AI Analysis

This work addresses versatile video generation control for AI and multimedia applications, offering a scalable approach built on disentangling appearance from motion.

The paper tackles generalizable control in video generation with FlexAM, a framework that disentangles appearance and motion using a novel 3D control signal represented as a point cloud. The authors report superior performance across tasks such as I2V/V2V editing, camera control, and spatial object editing.

Effective and generalizable control in video generation remains a significant challenge. While many methods rely on ambiguous or task-specific signals, we argue that a fundamental disentanglement of "appearance" and "motion" provides a more robust and scalable pathway. We propose FlexAM, a unified framework built upon a novel 3D control signal. This signal represents video dynamics as a point cloud, introducing three key enhancements: multi-frequency positional encoding to distinguish fine-grained motion, depth-aware positional encoding, and a flexible control signal for balancing precision and generative quality. This representation allows FlexAM to effectively disentangle appearance and motion, enabling a wide range of tasks including I2V/V2V editing, camera control, and spatial object editing. Extensive experiments demonstrate that FlexAM achieves superior performance across all evaluated tasks.
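To make the "multi-frequency positional encoding" idea concrete, here is a minimal sketch of how encoding point coordinates at several frequencies can separate small displacements. It is not the paper's implementation: it assumes a NeRF-style sinusoidal encoding as a stand-in, and the function name `multi_frequency_encoding` and the parameter `num_bands` are hypothetical.

```python
import math


def multi_frequency_encoding(point, num_bands=4):
    """Encode a 3D point with sin/cos features at octave-spaced frequencies.

    Hypothetical helper: a NeRF-style sinusoidal encoding used purely for
    illustration, not the encoding defined in the FlexAM paper.
    """
    features = []
    for band in range(num_bands):
        freq = (2.0 ** band) * math.pi  # frequency doubles with each band
        for coord in point:
            features.append(math.sin(freq * coord))
            features.append(math.cos(freq * coord))
    return features


# Two points that differ by a small shift along x (assumed normalized coords).
a = multi_frequency_encoding((0.10, 0.20, 0.50))
b = multi_frequency_encoding((0.11, 0.20, 0.50))

# Each band contributes 6 features (sin and cos for x, y, z),
# so index 0 is sin(x) in the lowest band and index 18 is sin(x) in band 3.
low_gap = abs(a[0] - b[0])
high_gap = abs(a[18] - b[18])
print(f"lowest band gap: {low_gap:.4f}, highest band gap: {high_gap:.4f}")
```

In this sketch the higher-frequency band separates the two nearby points more strongly than the lowest band, which is the general intuition behind using multiple frequencies to distinguish fine-grained motion.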

