Cinematic Behavior Transfer via NeRF-based Differentiable Filming
This work addresses two limitations relevant to video production and digital media applications: existing SLAM methods struggle in dynamic scenes, and human pose estimation is largely 2D-focused. It represents an incremental advancement.
The paper tackles the problem of precisely manipulating and reproducing visual elements, such as camera movements and character actions, in digital media. It introduces a reverse filming behavior estimation technique that optimizes camera trajectories using NeRF as a differentiable renderer and refines SMPL tracks. The resulting cinematic transfer pipeline achieves higher user study ratings.
In the evolving landscape of digital media and video production, the precise manipulation and reproduction of visual elements such as camera movements and character actions are highly desired. Existing SLAM methods face limitations in dynamic scenes, and human pose estimation often focuses on 2D projections, neglecting the 3D state of the subject. To address these issues, we first introduce a reverse filming behavior estimation technique. It optimizes camera trajectories by leveraging NeRF as a differentiable renderer, and it refines SMPL tracks. We then introduce a cinematic transfer pipeline that can transfer various shot types to a new 2D video or a 3D virtual environment. Incorporating a 3D engine workflow enables superior rendering and control, and the pipeline also achieves a higher rating in the user study.
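To illustrate the idea of optimizing a camera trajectory through a differentiable renderer, the following is a minimal toy sketch, not the method of this work. It makes several simplifying assumptions: a pinhole projection of a few fixed 3D points stands in for NeRF, the camera has translation only (no rotation), the gradient is derived by hand rather than by automatic differentiation, and SMPL refinement is omitted. All names (`SCENE`, `render`, `fit_pose`, `fit_trajectory`) and all numeric values are hypothetical.

```python
# Toy sketch: recover per-frame camera translation by gradient descent
# through a differentiable "renderer". A pinhole point projection is an
# assumed stand-in for NeRF; nothing here comes from the original work.

FOCAL = 1.0  # assumed focal length (hypothetical value)

# Assumed static scene: eight corners of a box in front of the camera.
SCENE = [(x, y, z) for x in (-1.0, 1.0) for y in (-1.0, 1.0) for z in (4.0, 6.0)]


def render(t):
    """Project every scene point for a camera at translation t (no rotation)."""
    image = []
    for X, Y, Z in SCENE:
        px, py, pz = X - t[0], Y - t[1], Z - t[2]
        image.append((FOCAL * px / pz, FOCAL * py / pz))
    return image


def loss_and_grad(t, observed):
    """Squared reprojection error and its analytic gradient with respect to t."""
    loss, grad = 0.0, [0.0, 0.0, 0.0]
    for (X, Y, Z), (u_obs, v_obs) in zip(SCENE, observed):
        px, py, pz = X - t[0], Y - t[1], Z - t[2]
        ru = FOCAL * px / pz - u_obs  # residual in u
        rv = FOCAL * py / pz - v_obs  # residual in v
        loss += ru * ru + rv * rv
        # Chain rule: d(px)/d(tx) = -1, d(py)/d(ty) = -1, d(pz)/d(tz) = -1.
        grad[0] += 2.0 * ru * (-FOCAL / pz)
        grad[1] += 2.0 * rv * (-FOCAL / pz)
        grad[2] += 2.0 * FOCAL * (ru * px + rv * py) / (pz * pz)
    return loss, grad


def fit_pose(observed, init, lr=0.5, steps=2000):
    """Plain gradient descent on the camera translation for one frame."""
    t = list(init)
    for _ in range(steps):
        _, g = loss_and_grad(t, observed)
        t = [ti - lr * gi for ti, gi in zip(t, g)]
    return t


def fit_trajectory(frames, init=(0.0, 0.0, 0.0)):
    """Estimate one pose per frame, warm-starting from the previous estimate."""
    poses, t = [], list(init)
    for observed in frames:
        t = fit_pose(observed, t)
        poses.append(t)
    return poses


# Synthetic "video": observations rendered from a known camera path.
true_path = [(0.1, 0.0, 0.2), (0.2, -0.1, 0.35), (0.3, -0.2, 0.5)]
frames = [render(t) for t in true_path]
estimated = fit_trajectory(frames)
```

In this sketch, warm-starting each frame from the previous estimate is a design choice that exploits the smoothness of camera motion. A real pipeline would presumably replace `render` with a NeRF forward pass, compare rendered and observed pixels with a photometric loss, and obtain gradients by automatic differentiation.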