CV · Mar 10, 2023

You Only Train Once: Multi-Identity Free-Viewpoint Neural Human Rendering from Monocular Videos

arXiv:2303.05835v1 · 6 citations · h-index: 42
Originality: Incremental advance
AI Analysis

This work addresses scalability issues in virtual reality and other applications by enabling one-time training for multiple identities, though it is an incremental advance over prior NeRF-based methods.

The paper tackles the need for individualized optimization of each human identity in free-viewpoint neural rendering from monocular videos. It proposes YOTO, a framework that renders multiple identities after a single training run and reports state-of-the-art results on the ZJU-MoCap and PeopleSnapshot datasets, along with improved training and inference efficiency.

We introduce You Only Train Once (YOTO), a dynamic human generation framework, which performs free-viewpoint rendering of different human identities with distinct motions, via only one-time training from monocular videos. Most prior works for the task require individualized optimization for each input video that contains a distinct human identity, leading to a significant amount of time and resources for the deployment, thereby impeding the scalability and the overall application potential of the system. In this paper, we tackle this problem by proposing a set of learnable identity codes to expand the capability of the framework for multi-identity free-viewpoint rendering, and an effective pose-conditioned code query mechanism to finely model the pose-dependent non-rigid motions. YOTO optimizes neural radiance fields (NeRF) by utilizing designed identity codes to condition the model for learning various canonical T-pose appearances in a single shared volumetric representation. Besides, our joint learning of multiple identities within a unified model incidentally enables flexible motion transfer in high-quality photo-realistic renderings for all learned appearances. This capability expands its potential use in important applications, including Virtual Reality. We present extensive experimental results on ZJU-MoCap and PeopleSnapshot to clearly demonstrate the effectiveness of our proposed model. YOTO shows state-of-the-art performance on all evaluation metrics while showing significant benefits in training and inference efficiency as well as rendering quality. The code and model will be made publicly available soon.
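
The identity-code idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a tiny NumPy MLP with random weights, and every name and size in it (`identity_codes`, `query_canonical_field`, `CODE_DIM`, and so on) is hypothetical. It only shows how one shared network can return different canonical-space colour and density values depending on which identity code is concatenated to the query point. Positional encoding, the pose-conditioned code query, and volume rendering are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

NUM_IDENTITIES = 3  # hypothetical: one entry per training video / subject
CODE_DIM = 8        # hypothetical identity-code length
HIDDEN = 32         # hypothetical hidden-layer width

# One code per identity. Random here; in training these would be
# optimised jointly with the shared network weights.
identity_codes = rng.normal(scale=0.1, size=(NUM_IDENTITIES, CODE_DIM))

# Weights of a tiny shared MLP: (xyz + identity code) -> (r, g, b, density).
W1 = rng.normal(scale=0.5, size=(3 + CODE_DIM, HIDDEN))
b1 = np.zeros(HIDDEN)
W2 = rng.normal(scale=0.5, size=(HIDDEN, 4))
b2 = np.zeros(4)


def query_canonical_field(points, identity_id):
    """Return (rgb, density) at canonical-space points for one identity."""
    # Repeat the selected identity code for every query point.
    code = np.broadcast_to(identity_codes[identity_id],
                           (points.shape[0], CODE_DIM))
    x = np.concatenate([points, code], axis=1)
    h = np.maximum(x @ W1 + b1, 0.0)         # ReLU hidden layer
    out = h @ W2 + b2
    rgb = 1.0 / (1.0 + np.exp(-out[:, :3]))  # sigmoid keeps colours in [0, 1]
    density = np.maximum(out[:, 3], 0.0)     # density must be non-negative
    return rgb, density


# Same 3D points, same network weights, two different identities.
points = rng.uniform(-1.0, 1.0, size=(5, 3))
rgb_a, sigma_a = query_canonical_field(points, identity_id=0)
rgb_b, sigma_b = query_canonical_field(points, identity_id=1)
```

Because the network weights are shared and only the code changes, adding an identity in this sketch costs one extra row in `identity_codes` rather than a separately trained model, which is the scalability argument the abstract makes.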
