CV · Mar 5

FaceCam: Portrait Video Camera Control via Scale-Aware Conditioning

arXiv:2603.05506v1
Originality: Incremental advance
AI Analysis

This work addresses the problem of generating high-quality portrait videos along user-controlled camera trajectories. The task is relevant to content creators and video editors, and the paper offers an incremental improvement over existing video generation models.

This paper introduces FaceCam, a system that generates videos along customizable camera trajectories from a monocular human portrait video. It targets the geometric distortions and visual artifacts common in existing methods by proposing a face-tailored, scale-aware representation of camera transformations. The authors report better camera controllability, visual quality, and identity and motion preservation than prior approaches.

We introduce FaceCam, a system that generates video under customizable camera trajectories from a monocular human portrait video. Recent camera control approaches built on large video-generation models have shown promising progress, but they often exhibit geometric distortions and visual artifacts on portrait videos because of scale-ambiguous camera representations or 3D reconstruction errors. To overcome these limitations, we propose a face-tailored, scale-aware representation for camera transformations that provides deterministic conditioning without relying on 3D priors. We train a video generation model on both multi-view studio captures and in-the-wild monocular videos. We also introduce two camera-control data generation strategies, synthetic camera motion and multi-shot stitching, which exploit stationary training cameras while generalizing to dynamic, continuous camera trajectories at inference time. Experiments on the Ava-256 dataset and diverse in-the-wild videos demonstrate that FaceCam achieves superior performance in camera controllability, visual quality, and identity and motion preservation.
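The abstract does not spell out the exact form of the scale-aware representation. As an illustration only, the sketch below assumes one plausible face-tailored choice. It expresses the relative camera translation in units of the source camera's distance to the face center, so that uniformly rescaling the whole scene leaves the conditioning unchanged. That global-scale ambiguity is what the abstract attributes to scale-ambiguous representations. The function names, the world-to-camera convention (x_cam = R(x_world - c)), and the use of the camera-to-face distance as the scale are hypothetical and are not FaceCam's actual formulation.

```python
import math
from typing import List, Sequence, Tuple

Mat3 = List[List[float]]
Vec3 = Tuple[float, float, float]


def transpose(m: Mat3) -> Mat3:
    return [[m[j][i] for j in range(3)] for i in range(3)]


def matmul(a: Mat3, b: Mat3) -> Mat3:
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def matvec(m: Mat3, v: Sequence[float]) -> Vec3:
    return tuple(sum(m[i][k] * v[k] for k in range(3)) for i in range(3))


def rot_y(deg: float) -> Mat3:
    # Rotation about the y axis (yaw), used here only to build example cameras.
    a = math.radians(deg)
    return [[math.cos(a), 0.0, math.sin(a)],
            [0.0, 1.0, 0.0],
            [-math.sin(a), 0.0, math.cos(a)]]


def face_normalized_relative_pose(
    r_src: Mat3, c_src: Vec3, r_tgt: Mat3, c_tgt: Vec3, face_center: Vec3
) -> Tuple[Mat3, Vec3]:
    """Hypothetical face-normalized relative camera transform (illustration only).

    r_src, r_tgt: world-to-camera rotations; c_src, c_tgt: camera centers in world
    coordinates; face_center: 3D face center in the same (arbitrary-scale) world.
    """
    # Relative rotation mapping source-camera coordinates to target-camera coordinates.
    r_rel = matmul(r_tgt, transpose(r_src))
    # Camera displacement expressed in the source camera frame (arbitrary world units).
    disp = matvec(r_src, [t - s for t, s in zip(c_tgt, c_src)])
    # Assumed face-derived scale: distance from the source camera to the face center.
    d = math.dist(c_src, face_center)
    if d == 0:
        raise ValueError("source camera coincides with the face center")
    # Dividing by d removes the world's global scale from the translation.
    t_norm = tuple(x / d for x in disp)
    return r_rel, t_norm


if __name__ == "__main__":
    eye = rot_y(0.0)
    # Dolly in by 0.5 units toward a face 2 units away: a quarter of the face distance.
    _, t = face_normalized_relative_pose(eye, (0.0, 0.0, 0.0), eye, (0.0, 0.0, 0.5), (0.0, 0.0, 2.0))
    print(t)  # (0.0, 0.0, 0.25)
```

Because the same camera move yields the same normalized translation whether the scene is measured in centimeters or meters, a model conditioned on this quantity would not need to guess the scene scale. Any face-derived length (for example, an inter-ocular distance) could replace the camera-to-face distance under the same reasoning.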
