CV May 27, 2025

Frame In-N-Out: Unbounded Controllable Image-to-Video Generation

arXiv:2505.21491v2, 8 citations, h-index: 6
Originality: Incremental advance
AI Analysis

The paper addresses controllability and temporal coherence in video generation for cinematic and creative applications, though it appears incremental because it builds on existing techniques.

The paper tackled controllable image-to-video generation by enabling users to control objects entering or leaving scenes via motion trajectories, resulting in a method that significantly outperformed existing baselines.

Controllability, temporal coherence, and detail synthesis remain the most critical challenges in video generation. In this paper, we focus on a commonly used yet underexplored cinematic technique known as Frame In and Frame Out. Specifically, starting from image-to-video generation, users can control the objects in the image to naturally leave the scene, or provide brand-new identity references that enter the scene, guided by a user-specified motion trajectory. To support this task, we introduce a new dataset that is curated semi-automatically, an efficient identity-preserving motion-controllable video Diffusion Transformer architecture, and a comprehensive evaluation protocol targeting this task. Our evaluation shows that our proposed approach significantly outperforms existing baselines.

