CV · Oct 28, 2025

SAGE: Structure-Aware Generative Video Transitions between Diverse Clips

arXiv:2510.24667v1 · 2 citations
Originality: Incremental advance
AI Analysis

This work addresses the challenge of creating smooth, visually coherent video transitions for professional video editing, though the contribution is incremental, as it builds on existing generative inbetweening methods.

The paper tackles the problem of synthesizing intermediate frames between two diverse video clips with large temporal or semantic gaps, proposing SAGE, a zero-shot method that outperforms existing baselines in quantitative metrics and user studies.

Video transitions aim to synthesize intermediate frames between two clips, but naive approaches such as linear blending introduce artifacts that limit professional use or break temporal coherence. Traditional techniques (cross-fades, morphing, frame interpolation) and recent generative inbetweening methods can produce high-quality plausible intermediates, but they struggle with bridging diverse clips involving large temporal gaps or significant semantic differences, leaving a gap for content-aware and visually coherent transitions. We address this challenge by drawing on artistic workflows, distilling strategies such as aligning silhouettes and interpolating salient features to preserve structure and perceptual continuity. Building on this, we propose SAGE (Structure-Aware Generative vidEo transitions) as a zero-shot approach that combines structural guidance, provided via line maps and motion flow, with generative synthesis, enabling smooth, semantically consistent transitions without fine-tuning. Extensive experiments and comparisons with current alternatives, namely [FILM, TVG, DiffMorpher, VACE, GI], demonstrate that SAGE outperforms both classical and generative baselines on quantitative metrics and user studies for producing transitions between diverse clips. Code to be released on acceptance.
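
To make the "naive linear blending" baseline from the abstract concrete, here is a minimal sketch of a per-pixel cross-fade in Python with NumPy. It is not the SAGE method and does not come from the paper; the function name `linear_crossfade`, its parameters, and the toy 4x4 frames are assumptions chosen for illustration.

```python
import numpy as np


def linear_crossfade(last_frame_a, first_frame_b, num_frames):
    """Naive cross-fade between two frames (hypothetical helper).

    This is the kind of baseline the abstract calls artifact-prone,
    NOT the structure-aware method proposed in the paper.
    """
    a = np.asarray(last_frame_a, dtype=np.float32)
    b = np.asarray(first_frame_b, dtype=np.float32)
    if a.shape != b.shape:
        raise ValueError("frames must share the same shape")
    # Use interior blend weights only, so the two endpoint frames
    # are not duplicated inside the transition.
    alphas = np.linspace(0.0, 1.0, num_frames + 2)[1:-1]
    # Each output pixel is a weighted average of the two input pixels.
    return [(1.0 - t) * a + t * b for t in alphas]


# Toy inputs (assumed): an all-black frame and an all-white frame.
frame_a = np.zeros((4, 4, 3), dtype=np.float32)
frame_b = np.ones((4, 4, 3), dtype=np.float32)
transition = linear_crossfade(frame_a, frame_b, num_frames=3)
```

Because every pixel is averaged independently, objects that differ in position or shape between the two clips show up as ghosted double exposures instead of moving or deforming, which is the kind of artifact that motivates adding structural guidance.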

