CV | Sep 19, 2025

Lynx: Towards High-Fidelity Personalized Video Generation

arXiv:2509.15496v1 | 4 citations | h-index: 8 | Has Code
Originality: Incremental advance
AI Analysis

The paper addresses the challenge of high-fidelity identity preservation in personalized video synthesis, with applications such as entertainment and social media, and reports a strong gain on that specific problem.

The paper tackles the problem of personalized video generation from a single input image by introducing Lynx, a model that achieves superior face resemblance and strong video quality, as demonstrated on a benchmark of 800 test cases.

We present Lynx, a high-fidelity model for personalized video synthesis from a single input image. Built on an open-source Diffusion Transformer (DiT) foundation model, Lynx introduces two lightweight adapters to ensure identity fidelity. The ID-adapter employs a Perceiver Resampler to convert ArcFace-derived facial embeddings into compact identity tokens for conditioning, while the Ref-adapter integrates dense VAE features from a frozen reference pathway, injecting fine-grained details across all transformer layers through cross-attention. These modules collectively enable robust identity preservation while maintaining temporal coherence and visual realism. Through evaluation on a curated benchmark of 40 subjects and 20 unbiased prompts, which yielded 800 test cases, Lynx has demonstrated superior face resemblance, competitive prompt following, and strong video quality, thereby advancing the state of personalized video generation.
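The ID-adapter idea in the abstract (a Perceiver Resampler turning a face embedding into a few identity tokens) can be illustrated with a minimal sketch. This is not the Lynx implementation: it shows only a single cross-attention step in which a small set of latent queries attends over the input, using random weights in place of trained ones. The function names, the token count, the hidden size, and the step of splitting the embedding into chunks to form an input sequence are all assumptions made for illustration. The 512-dimensional input matches the usual ArcFace embedding size.

```python
import math
import random


def linear(rows, weight):
    """Multiply each row vector by a weight matrix of shape (in_dim, out_dim)."""
    return [[sum(x * w for x, w in zip(row, col)) for col in zip(*weight)]
            for row in rows]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention: each query mixes the values."""
    scale = 1.0 / math.sqrt(len(queries[0]))
    out = []
    for q in queries:
        weights = softmax([scale * sum(a * b for a, b in zip(q, k)) for k in keys])
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out


def random_matrix(rng, rows, cols):
    """Random stand-in for a trained weight matrix (assumption)."""
    return [[rng.gauss(0.0, 1.0 / math.sqrt(rows)) for _ in range(cols)]
            for _ in range(rows)]


def resample_identity(face_embedding, num_tokens=4, chunk=64, dim=16, seed=0):
    """Hypothetical helper: compress an embedding into num_tokens identity tokens."""
    rng = random.Random(seed)
    # Assumption: split the embedding into chunks to obtain an input sequence.
    inputs = [face_embedding[i:i + chunk]
              for i in range(0, len(face_embedding), chunk)]
    w_key = random_matrix(rng, chunk, dim)
    w_value = random_matrix(rng, chunk, dim)
    # Latent queries would be learned parameters in a trained resampler.
    latents = random_matrix(rng, num_tokens, dim)
    return cross_attention(latents, linear(inputs, w_key), linear(inputs, w_value))


rng = random.Random(1)
embedding = [rng.gauss(0.0, 1.0) for _ in range(512)]  # stand-in for an ArcFace vector
tokens = resample_identity(embedding)
print(len(tokens), len(tokens[0]))
```

The number of output tokens is set by the number of latent queries, not by the input length, which is why this kind of module yields a compact, fixed-size conditioning signal. A real resampler would stack several such layers with feed-forward blocks and trained weights.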

