CV, AI | Dec 8, 2025

ContextAnyone: Context-Aware Diffusion for Character-Consistent Text-to-Video Generation

arXiv:2512.07328v1 | 1 citation | h-index: 1 | Has Code
Originality: Incremental advance
AI Analysis

This work addresses visual coherence in character generation for video creation applications. It is assessed as an incremental improvement over prior personalization methods.

The paper tackles the problem of keeping character identities consistent in text-to-video generation. It proposes ContextAnyone, a context-aware diffusion framework that takes a text prompt and a single reference image and, according to the authors, generates videos with better identity consistency and visual quality than existing reference-to-video methods.

Text-to-video (T2V) generation has advanced rapidly, yet maintaining consistent character identities across scenes remains a major challenge. Existing personalization methods often focus on facial identity but fail to preserve broader contextual cues such as hairstyle, outfit, and body shape, which are critical for visual coherence. We propose ContextAnyone, a context-aware diffusion framework that achieves character-consistent video generation from text and a single reference image. Our method jointly reconstructs the reference image and generates new video frames, enabling the model to fully perceive and utilize reference information. Reference information is effectively integrated into a DiT-based diffusion backbone through a novel Emphasize-Attention module that selectively reinforces reference-aware features and prevents identity drift across frames. A dual-guidance loss combines diffusion and reference reconstruction objectives to enhance appearance fidelity, while the proposed Gap-RoPE positional embedding separates reference and video tokens to stabilize temporal modeling. Experiments demonstrate that ContextAnyone outperforms existing reference-to-video methods in identity consistency and visual quality, generating coherent and context-preserving character videos across diverse motions and scenes. Project page: https://github.com/ziyang1106/ContextAnyone
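
The abstract names two mechanisms without giving formulas: a positional scheme that separates reference tokens from video tokens (Gap-RoPE) and a loss that combines a diffusion objective with a reference reconstruction objective. The sketch below is only an illustration of those two ideas under stated assumptions, not the authors' implementation. The function names, the gap size, the use of mean squared error for both terms, and the `ref_weight` parameter are all hypothetical; the project repository is the place to look for the actual definitions.

```python
from typing import List, Sequence


def gap_position_ids(num_ref_frames: int, num_video_frames: int, gap: int) -> List[int]:
    """Assumed reading of "Gap-RoPE": reference frames take the first temporal
    indices, and video frames start `gap` positions later, so the two groups
    are not adjacent on the temporal axis. The gap size is a free parameter here."""
    if num_ref_frames < 0 or num_video_frames < 0 or gap < 0:
        raise ValueError("counts and gap must be non-negative")
    ref_ids = list(range(num_ref_frames))
    video_start = num_ref_frames + gap
    video_ids = list(range(video_start, video_start + num_video_frames))
    return ref_ids + video_ids


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error over two equal-length, non-empty sequences."""
    if len(pred) != len(target) or len(pred) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def dual_guidance_loss(
    noise_pred: Sequence[float],
    noise_target: Sequence[float],
    ref_pred: Sequence[float],
    ref_target: Sequence[float],
    ref_weight: float = 1.0,
) -> float:
    """Assumed form: diffusion term plus a weighted reference reconstruction term.
    Both terms use MSE here purely for illustration."""
    diffusion_term = mse(noise_pred, noise_target)
    reconstruction_term = mse(ref_pred, ref_target)
    return diffusion_term + ref_weight * reconstruction_term


if __name__ == "__main__":
    # One reference frame, four video frames, hypothetical gap of 8 positions.
    print(gap_position_ids(1, 4, 8))
    print(dual_guidance_loss([1.0, 2.0], [1.0, 2.0], [0.0], [2.0], ref_weight=0.5))
```

With a gap of 0 the indices reduce to an ordinary contiguous sequence, which makes the effect of the separation easy to compare in an ablation.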

