CV · Aug 11, 2025

LaVieID: Local Autoregressive Diffusion Transformers for Identity-Preserving Video Creation

arXiv:2508.07603v1 · 2 citations · h-index: 8 · Has Code · MM
Originality: Highly original
AI Analysis

This work addresses the challenge of maintaining consistent facial identity in personalized video generation. Its contribution is incremental, as it builds on existing diffusion transformers.

The paper tackles the problem of identity loss in text-to-video generation by proposing LaVieID, a local autoregressive diffusion framework that enhances identity preservation through spatial and temporal refinements, achieving state-of-the-art performance.

In this paper, we present LaVieID, a novel local autoregressive video diffusion framework designed to tackle the challenging identity-preserving text-to-video task. The key idea of LaVieID is to mitigate, from both spatial and temporal perspectives, the loss of identity information inherent in the stochastic global generation process of diffusion transformers (DiTs). Specifically, unlike existing DiTs, which model facial latent states globally and without explicit structure, LaVieID introduces a local router that explicitly represents latent states as weighted combinations of fine-grained local facial structures. This alleviates undesirable feature interference and encourages DiTs to capture distinctive facial characteristics. Furthermore, LaVieID integrates a temporal autoregressive module that refines denoised latent tokens before video decoding. This module divides the latent tokens temporally into chunks and exploits their long-range temporal dependencies to predict biases that rectify the tokens, thereby significantly enhancing inter-frame identity consistency. Consequently, LaVieID generates high-fidelity personalized videos and achieves state-of-the-art performance. Our code and models are available at https://github.com/ssugarwh/LaVieID.
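The abstract names two mechanisms without giving their equations. The toy NumPy sketch below illustrates the general shape of each under stated assumptions; it is not the LaVieID implementation (the linked repository holds that). `local_route` re-expresses each facial latent token as a softmax-weighted combination of K local facial-structure embeddings, and `temporal_ar_refine` splits latents of shape (T, N, d) into temporal chunks and adds to each chunk a bias computed only from earlier, already-rectified chunks. The function names, the scaled dot-product routing, and the fixed "pull toward the history mean" bias rule (standing in for the paper's learned bias predictor) are all hypothetical.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


def local_route(tokens: np.ndarray, local_parts: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Hypothetical local router: express each token via K local facial structures.

    tokens:      (N, d) facial latent tokens.
    local_parts: (K, d) embeddings of fine-grained local facial structures
                 (e.g. eyes, nose, mouth); assumed given for illustration.
    Returns (routed, weights): routed is (N, d); weights is (N, K), rows sum to 1.
    """
    d = tokens.shape[-1]
    scores = tokens @ local_parts.T / np.sqrt(d)  # (N, K) token-to-part affinity
    weights = softmax(scores, axis=-1)             # routing weights per token
    routed = weights @ local_parts                 # weighted combination of parts
    return routed, weights


def temporal_ar_refine(latents: np.ndarray, chunk: int, alpha: float = 0.5) -> np.ndarray:
    """Hypothetical chunk-wise autoregressive rectification along time.

    latents: (T, N, d) denoised latent tokens.
    chunk:   number of frames per temporal chunk.
    alpha:   strength of the toy bias rule.

    The bias for chunk i depends only on rectified chunks 0..i-1. A fixed rule
    stands in for a learned predictor: shift the chunk's mean token toward the
    mean token of all earlier frames.
    """
    out = np.array(latents, dtype=float)          # work on a copy
    T = out.shape[0]
    for start in range(chunk, T, chunk):          # first chunk has no history
        history = out[:start]                     # already-rectified frames
        current = out[start:start + chunk]
        context = history.mean(axis=(0, 1))       # (d,) long-range summary
        bias = alpha * (context - current.mean(axis=(0, 1)))
        out[start:start + chunk] = current + bias  # broadcast over frames and tokens
    return out
```

With `alpha=1.0`, `temporal_ar_refine(latents, chunk=4, alpha=1.0)` leaves the first chunk untouched and shifts every later chunk so that its mean token equals that of the first chunk. A learned predictor would instead output content-dependent biases; the fixed rule only demonstrates the causal, chunk-by-chunk structure.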
