CV · Mar 5

FaceCam: Portrait Video Camera Control via Scale-Aware Conditioning

arXiv:2603.05506v1 · 1.5 · h-index: 18

Originality: Incremental advance

AI Analysis

This work addresses the problem of generating high-quality portrait videos along controlled camera trajectories, a task relevant to content creators and video editors. It is an incremental improvement over existing video generation models.

This paper introduces FaceCam, a system that generates video with customizable camera trajectories from monocular human portrait video input. It addresses geometric distortions and visual artifacts common in existing methods by proposing a face-tailored scale-aware representation for camera transformations, achieving superior performance in camera controllability, visual quality, and identity/motion preservation.

We introduce FaceCam, a system that generates video under customizable camera trajectories from monocular human portrait video input. Recent camera control approaches based on large video-generation models have shown promising progress but often exhibit geometric distortions and visual artifacts on portrait videos due to scale-ambiguous camera representations or 3D reconstruction errors. To overcome these limitations, we propose a face-tailored scale-aware representation for camera transformations that provides deterministic conditioning without relying on 3D priors. We train a video generation model on both multi-view studio captures and in-the-wild monocular videos, and introduce two camera-control data generation strategies, synthetic camera motion and multi-shot stitching, to exploit stationary training cameras while generalizing to dynamic, continuous camera trajectories at inference time. Experiments on the Ava-256 dataset and diverse in-the-wild videos demonstrate that FaceCam achieves superior performance in camera controllability, visual quality, and identity and motion preservation.
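
The scale ambiguity mentioned in the abstract can be illustrated with a small sketch. This is not FaceCam's representation or code; it is a generic pinhole-camera example under stated assumptions (no rotation, unit focal length, hypothetical helper names `project` and `scale`). It shows that scaling the scene and the camera translation by the same factor leaves the image unchanged, so a translation specified without a reference scale does not pin down a unique camera motion.

```python
import math

def project(point, translation, focal=1.0):
    """Pinhole projection of a 3D point for a camera shifted by `translation`.

    Assumption for illustration: the camera is not rotated.
    """
    x, y, z = (p - t for p, t in zip(point, translation))
    return (focal * x / z, focal * y / z)

def scale(vec, s):
    """Multiply every component of a 3-vector by the scalar s."""
    return tuple(s * v for v in vec)

# Hypothetical scene point and camera translation (arbitrary units).
point = (0.5, -0.25, 2.0)
translation = (0.25, 0.0, 0.25)

original = project(point, translation)
# Scale the scene and the camera motion together by the same factor.
rescaled = project(scale(point, 4.0), scale(translation, 4.0))

# The two projections coincide, so the image alone cannot reveal the scale.
same_image = all(math.isclose(a, b) for a, b in zip(original, rescaled))
print(same_image)
```

A scale-aware representation, as the abstract describes it, would tie the translation to a known reference size so that the conditioning is deterministic. The sketch above only demonstrates the ambiguity itself, not how the paper resolves it.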
