GR · CV · LG · Feb 20, 2025

Dynamic Concepts Personalization from Single Videos

arXiv:2502.14844v1 · 15 citations · h-index: 30 · SIGGRAPH
Originality: Incremental advance
AI Analysis

This work addresses dynamic concept personalization in generative video models. The advance is incremental, since it builds on existing text-to-video personalization methods.

The paper tackles the problem of personalizing text-to-video models to capture dynamic concepts, i.e., entities defined by both appearance and motion. It introduces the Set-and-Sequence framework, which fine-tunes LoRA layers in two stages to embed dynamic concepts into the model's output domain. The authors report that this sets a new benchmark for personalization.

Personalizing generative text-to-image models has seen remarkable progress, but extending this personalization to text-to-video models presents unique challenges. Unlike personalization of static concepts, personalizing text-to-video models has the potential to capture dynamic concepts, i.e., entities defined not only by their appearance but also by their motion. In this paper, we introduce Set-and-Sequence, a novel framework for personalizing Diffusion Transformer (DiT)-based generative video models with dynamic concepts. Our approach imposes a spatio-temporal weight space within an architecture that does not explicitly separate spatial and temporal features. This is achieved in two key stages. First, we fine-tune Low-Rank Adaptation (LoRA) layers using an unordered set of frames from the video to learn an identity LoRA basis that represents the appearance, free from temporal interference. In the second stage, with the identity LoRAs frozen, we augment their coefficients with Motion Residuals and fine-tune them on the full video sequence, capturing motion dynamics. Our Set-and-Sequence framework results in a spatio-temporal weight space that effectively embeds dynamic concepts into the video model's output domain, enabling unprecedented editability and compositionality while setting a new benchmark for personalizing dynamic concepts.
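
The two stages described in the abstract can be illustrated with a toy weight-space sketch. This is not the authors' implementation. It assumes a single linear layer, a LoRA update factored as coefficients times a basis, and that stage 2 trains only an additive residual on the coefficients. All names (`coeff`, `basis`, `motion_residual`, `stage_batch`) and the toy sizes are hypothetical, and no actual training loop is shown.

```python
import numpy as np

def effective_weight(w0, coeff, basis, motion_residual=None):
    """Return w0 + (coeff + residual) @ basis; the residual is omitted in stage 1."""
    c = coeff if motion_residual is None else coeff + motion_residual
    return w0 + c @ basis

def trainable_params(stage):
    """Names of the parameters updated in each stage (an assumed split)."""
    if stage == 1:
        return {"coeff", "basis"}        # identity LoRA learned from frames
    if stage == 2:
        return {"motion_residual"}       # identity LoRA frozen, residual learned
    raise ValueError("stage must be 1 or 2")

def stage_batch(video, stage, rng):
    """Stage 1 sees the frames as an unordered set; stage 2 sees the ordered clip."""
    if stage == 1:
        return video[rng.permutation(len(video))]
    return video

rng = np.random.default_rng(0)
d_out, d_in, rank, n_frames = 6, 4, 2, 5     # toy sizes for illustration only

w0 = rng.normal(size=(d_out, d_in))          # frozen pretrained weight
coeff = rng.normal(size=(d_out, rank))       # stage-1 LoRA coefficients
basis = rng.normal(size=(rank, d_in))        # stage-1 identity basis
motion_residual = np.zeros((d_out, rank))    # stage-2 residual, zero-initialized

# Toy "video": one feature vector per frame, in temporal order.
video = np.arange(n_frames * d_in, dtype=float).reshape(n_frames, d_in)

w_identity = effective_weight(w0, coeff, basis)
w_dynamic = effective_weight(w0, coeff, basis, motion_residual)
```

In this sketch the residual starts at zero, so stage 2 begins from exactly the stage-1 weights, and the total update stays within the span of the frozen basis (rank at most `rank`). Whether the original method initializes or constrains the residual this way is an assumption here.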

