GR | CV | LG | Jul 23, 2025

Zero-Shot Dynamic Concept Personalization with Grid-Based LoRA

Rameen Abdal, Or Patashnik, Ekaterina Deyneka, Hao Chen, Aliaksandr Siarohin, Sergey Tulyakov, Daniel Cohen-Or, Kfir Aberman

arXiv:2507.17963v1 | 5.95 citations | h-index: 30 | SIGGRAPH Asia

Originality: Incremental advance

AI Analysis

The paper addresses scalability issues in video personalization for AI content creation, though the advance is incremental: it builds on existing LoRA and grid-based methods.

The paper tackles the problem of personalizing dynamic concepts in text-to-video generation without per-instance fine-tuning, introducing a zero-shot framework that uses Grid-LoRA adapters and a Grid Fill module to achieve high-quality, temporally coherent outputs for unseen subjects and editing scenarios.

Recent advances in text-to-video generation have enabled high-quality synthesis from text and image prompts. While the personalization of dynamic concepts, which capture subject-specific appearance and motion from a single video, is now feasible, most existing methods require per-instance fine-tuning, limiting scalability. We introduce a fully zero-shot framework for dynamic concept personalization in text-to-video models. Our method leverages structured 2x2 video grids that spatially organize input and output pairs, enabling the training of lightweight Grid-LoRA adapters for editing and composition within these grids. At inference, a dedicated Grid Fill module completes partially observed layouts, producing temporally coherent and identity-preserving outputs. Once trained, the entire system operates in a single forward pass, generalizing to previously unseen dynamic concepts without any test-time optimization. Extensive experiments demonstrate high-quality and consistent results across a wide range of subjects beyond trained concepts and editing scenarios.
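
To make the "2x2 video grid" and "partially observed layout" ideas concrete, here is a minimal sketch of how four clips might be tiled into one grid video, with a mask marking the cells left to be completed. It is an illustration only: the abstract does not specify the cell ordering, the mask convention, or the Grid Fill model itself, so the function name `tile_2x2`, the row-major ordering, the single-channel frames, and the "1 = unobserved" mask are all assumptions. The sketch also assumes every observed clip has the same length and frame size.

```python
from typing import List, Optional, Tuple

Frame = List[List[float]]  # H rows of W pixel values (single channel for brevity)
Clip = List[Frame]         # T frames


def tile_2x2(cells: List[Optional[Clip]], fill: float = 0.0) -> Tuple[Clip, List[List[List[int]]]]:
    """Tile four equally sized clips (row-major order) into one 2x2 grid clip.

    A cell passed as None is treated as unobserved: it is filled with `fill`
    and marked with 1 in the returned mask (hypothetical convention), so a
    separate fill model would know which region to complete.
    """
    if len(cells) != 4:
        raise ValueError("expected exactly four cells")
    ref = next((c for c in cells if c is not None), None)
    if ref is None:
        raise ValueError("at least one cell must be observed")
    t, h, w = len(ref), len(ref[0]), len(ref[0][0])

    grid, mask = [], []
    for ti in range(t):
        g_frame, m_frame = [], []
        for grid_row in range(2):
            for y in range(h):
                g_line, m_line = [], []
                for grid_col in range(2):
                    cell = cells[2 * grid_row + grid_col]
                    if cell is None:
                        g_line.extend([fill] * w)
                        m_line.extend([1] * w)
                    else:
                        g_line.extend(cell[ti][y])
                        m_line.extend([0] * w)
                g_frame.append(g_line)
                m_frame.append(m_line)
        grid.append(g_frame)
        mask.append(m_frame)
    return grid, mask
```

For example, passing two observed clips in the top row and `None` for the bottom row yields a grid video of size T x 2H x 2W whose bottom half is blank and masked, which corresponds to the kind of partially observed layout that a fill module would be asked to complete.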
