CV · May 27, 2025

EF-VI: Enhancing End-Frame Injection for Video Inbetweening

arXiv:2505.21205v2 · 1 citation · h-index: 36
Originality: Incremental advance
AI Analysis

This work addresses a specific bottleneck in video synthesis for applications like animation and video editing, representing an incremental improvement over existing methods.

The paper tackles the problem of weak end-frame constraints in video inbetweening by proposing EF-VI, a framework that strengthens end-frame injection for transformer-based image-to-video diffusion models. The authors report that it outperforms the baselines they compare against.

Video inbetweening aims to synthesize intermediate video sequences conditioned on given start and end frames. Current state-of-the-art methods primarily extend large-scale pre-trained Image-to-Video Diffusion Models (I2V-DMs) by incorporating the end-frame condition via direct fine-tuning or temporally bidirectional sampling. However, the former yields a weak end-frame constraint, while the latter inevitably disrupts the input representation of video frames, leading to suboptimal performance. To strengthen the end-frame constraint without disrupting the input representation, we propose EF-VI, a novel video inbetweening framework tailored to recent, more powerful transformer-based I2V-DMs. EF-VI efficiently strengthens the end-frame constraint through an enhanced injection built on a proposed lightweight module, EF-Net, which encodes only the end frame and expands it into temporally adaptive frame-wise features that are injected into the I2V-DM. Extensive experiments demonstrate the superiority of EF-VI over other baselines.
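The core idea of EF-Net, encoding only the end frame and expanding it into frame-wise features that are injected into the I2V-DM, can be illustrated with a toy sketch. This is not the authors' implementation: the function names are hypothetical, the end-frame feature is assumed to be already encoded, a fixed linear ramp stands in for the learned "temporally adaptive" weighting, and injection is assumed to be simple addition to each frame's tokens.

```python
"""Toy sketch of end-frame feature expansion and injection (hypothetical, not EF-Net itself)."""
from typing import List

Vector = List[float]


def expand_end_frame(end_feat: Vector, num_frames: int) -> List[Vector]:
    """Expand one encoded end-frame feature into num_frames frame-wise features.

    Assumption: a fixed linear ramp replaces the learned temporal weighting,
    so frames closer to the end frame receive a stronger end-frame signal.
    """
    if num_frames < 2:
        raise ValueError("need at least a start and an end frame")
    features = []
    for t in range(num_frames):
        weight = t / (num_frames - 1)  # 0.0 at the start frame, 1.0 at the end frame
        features.append([weight * x for x in end_feat])
    return features


def inject(frame_tokens: List[Vector], end_features: List[Vector]) -> List[Vector]:
    """Add each frame-wise end-frame feature to the matching frame's tokens.

    Assumption: additive injection; the real EF-VI injection may differ.
    """
    if len(frame_tokens) != len(end_features):
        raise ValueError("frame count mismatch")
    return [
        [a + b for a, b in zip(tok, feat)]
        for tok, feat in zip(frame_tokens, end_features)
    ]


# Usage: five frames of 2-dim zero tokens, injected with a 2-dim end-frame feature.
tokens = [[0.0, 0.0] for _ in range(5)]
out = inject(tokens, expand_end_frame([2.0, -4.0], 5))
# out[0] carries no end-frame signal; out[-1] carries the full end-frame feature.
```

The point of the per-frame weighting is that, unlike a single global condition, each intermediate frame can receive a different amount of end-frame information; in the actual method this schedule is produced by the learned module rather than fixed.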
