CV | May 18

CineMatte: Background Matting for Virtual Production and Beyond

Yuanjian He, Chen Zhang, Fasheng Chen, Jiangbo Cao

arXiv:2605.18328 | 29.8

AI Analysis

For visual effects professionals in virtual production, CineMatte enables robust post-shot background replacement, addressing a key bottleneck in LED VP workflows.
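To make "post-shot background replacement" concrete, here is a minimal sketch of the standard compositing equation I = alpha * F + (1 - alpha) * B, which is what a predicted alpha matte is typically used for. The function name `replace_background` and the toy arrays are illustrative assumptions, not part of CineMatte.

```python
import numpy as np

def replace_background(foreground, alpha, new_background):
    """Composite a foreground over a new background: I = alpha*F + (1-alpha)*B.

    foreground, new_background: float arrays of shape (H, W, 3) in [0, 1].
    alpha: float array of shape (H, W) in [0, 1] (1 = fully foreground).
    """
    alpha = alpha[..., None]  # (H, W) -> (H, W, 1) so it broadcasts over RGB
    return alpha * foreground + (1.0 - alpha) * new_background

# Toy 2x2 example (hypothetical values): a uniform foreground and background,
# with an alpha matte containing opaque, transparent, and half-covered pixels.
foreground = np.full((2, 2, 3), 0.8)
new_background = np.full((2, 2, 3), 0.1)
alpha = np.array([[1.0, 0.0],
                  [1.0, 0.5]])

composite = replace_background(foreground, alpha, new_background)
```

Pixels with alpha 1 keep the foreground color, pixels with alpha 0 take the new background, and fractional alpha (hair, motion blur) blends the two, which is why matte quality at boundaries matters.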

CineMatte introduces a background matting framework for LED Virtual Production, using a cross-attention-conditioned design with a frozen DINOv3 ViT and a pretrained feature upsampler to improve robustness and reduce boundary artifacts. It achieves state-of-the-art results on the new CineMatte-4K dataset and public benchmarks like VideoMatte240K and YouTubeMatte.

LED Virtual Production (VP) uses large LED volumes to render backgrounds in real time, enabling in-camera visual effects but making post-shot changes labor-intensive. We address this with CineMatte, a robust background matting framework for VP and beyond. Instead of concatenating the background with the input, CineMatte uses a cross-attention-conditioned design: a Siamese, frozen DINOv3 Vision Transformer with shared weights encodes the input frame and the captured background separately, and a cross-attention module compares the two streams to predict the foreground, preserving pretrained semantics and improving robustness to background shifts. Previous ViT-based matting models use a parallel convolutional "detail branch" to recover fine details, which can cause boundary artifacts in real-world samples due to semantic misalignment with the backbone. We replace this branch with a pretrained, image-guided feature upsampler, which largely mitigates the problem. We also introduce CineMatte-4K, a 4K HDR image-video dataset captured on a professional LED VP stage. To the best of our knowledge, the image subset is the first dataset for VP matting and is non-synthetic, obtained via green-screen insertion; the video subset includes camera motion with tracked trajectories so that arbitrary backgrounds can be rendered later with correct parallax. Across CineMatte-4K and public benchmarks (VideoMatte240K, YouTubeMatte), CineMatte not only excels in VP but also generalizes robustly to real-world footage.
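The Siamese-encoder-plus-cross-attention idea in the abstract can be sketched as follows. This is a toy illustration under stated assumptions, not the authors' implementation: a fixed linear projection (`frozen_encoder`) stands in for the frozen DINOv3 ViT, the frame tokens are assumed to act as queries over the background tokens, and all names, shapes, and random weights are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def frozen_encoder(patches, weights):
    """Stand-in for a frozen ViT backbone (assumption): a fixed linear map.

    The same `weights` are used for both streams, which is what makes the
    encoder Siamese (shared weights).
    """
    return patches @ weights

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(frame_tokens, background_tokens, w_q, w_k, w_v):
    """Single-head cross-attention: frame tokens query the background tokens."""
    q = frame_tokens @ w_q          # (N, D) queries from the input frame
    k = background_tokens @ w_k     # (N, D) keys from the captured background
    v = background_tokens @ w_v     # (N, D) values from the captured background
    scores = q @ k.T / np.sqrt(q.shape[-1])
    attn = softmax(scores, axis=-1)  # each frame token attends over background tokens
    return attn @ v, attn

# Hypothetical sizes: 16 patch tokens, 48-dim raw patches, 32-dim embeddings.
num_tokens, patch_dim, embed_dim = 16, 48, 32
encoder_weights = rng.standard_normal((patch_dim, embed_dim))

frame_patches = rng.standard_normal((num_tokens, patch_dim))
background_patches = rng.standard_normal((num_tokens, patch_dim))

# Encode both streams separately with the same (shared, frozen) weights.
frame_tokens = frozen_encoder(frame_patches, encoder_weights)
background_tokens = frozen_encoder(background_patches, encoder_weights)

w_q, w_k, w_v = (rng.standard_normal((embed_dim, embed_dim)) / np.sqrt(embed_dim)
                 for _ in range(3))
attended, attn = cross_attention(frame_tokens, background_tokens, w_q, w_k, w_v)
```

In a design like this, only the attention projections (and whatever head follows them) would be trained, while the backbone's pretrained features stay untouched. A real model would add multiple heads, a decoder that maps the attended features to an alpha matte, and the feature upsampler described above.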
