CV · May 7, 2020

Self-Supervised Human Depth Estimation from Monocular Videos

Feitong Tan, Hao Zhu, Zhaopeng Cui, Siyu Zhu, Marc Pollefeys, Ping Tan

arXiv:2005.03358v1 · 13.233 citations

Originality: Incremental advance

AI Analysis

This work addresses the difficulty of collecting supervised depth data for human depth estimation, which simplifies training and improves generalization for computer vision applications.

The paper estimates detailed human depth from monocular videos without requiring ground truth depth data. Its self-supervised method minimizes a photo-consistency loss computed with estimated non-rigid body motion, which improves generalization and performance on in-the-wild data.

Previous methods for estimating detailed human depth often require supervised training with "ground truth" depth data. This paper presents a self-supervised method that can be trained on YouTube videos without known depth, which makes training data collection simple and improves the generalization of the learned network. The self-supervised learning is achieved by minimizing a photo-consistency loss, which is evaluated between a video frame and its neighboring frames warped according to the estimated depth and the 3D non-rigid motion of the human body. To solve for this non-rigid motion, we first estimate a rough SMPL model at each video frame and compute the non-rigid body motion accordingly, which enables self-supervised learning on estimating the shape details. Experiments demonstrate that our method enjoys better generalization and performs much better on data in the wild.
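The photo-consistency idea in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `photo_consistency_loss`, the pinhole intrinsics `K`, the per-pixel 3D displacement array `motion` (standing in for motion derived from the SMPL fits), the L1 error, and the nearest-neighbor sampling are all assumptions made for illustration. A trainable version would more likely use differentiable bilinear sampling in an autodiff framework.

```python
import numpy as np

def photo_consistency_loss(frame, neighbor, depth, motion, K):
    """Mean L1 error between `frame` and `neighbor` sampled at warped pixels.

    frame, neighbor: (H, W) grayscale images (hypothetical inputs)
    depth:  (H, W) estimated depth for `frame`
    motion: (H, W, 3) assumed per-pixel 3D displacement, frame -> neighbor
    K:      (3, 3) assumed pinhole intrinsics shared by both frames
    """
    H, W = depth.shape
    v, u = np.mgrid[0:H, 0:W]
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).astype(np.float64)

    # Back-project every pixel to a 3D point using the estimated depth.
    points = (pix @ np.linalg.inv(K).T) * depth[..., None]

    # Apply the non-rigid motion, then project into the neighboring frame.
    proj = (points + motion) @ K.T
    z = proj[..., 2]
    safe_z = np.where(z > 1e-8, z, 1.0)  # avoid division by zero
    u2 = np.rint(proj[..., 0] / safe_z).astype(int)
    v2 = np.rint(proj[..., 1] / safe_z).astype(int)

    # Keep only pixels that land inside the neighboring image.
    valid = (z > 1e-8) & (u2 >= 0) & (u2 < W) & (v2 >= 0) & (v2 < H)
    if not valid.any():
        return 0.0

    # Nearest-neighbor lookup (a simplification; not differentiable).
    warped = neighbor[v2[valid], u2[valid]]
    return float(np.abs(frame[valid] - warped).mean())
```

When the depth and motion explain the change between the two frames, the warped neighbor matches the frame and the loss approaches zero; a wrong depth or motion estimate misaligns the images and raises the loss, which is the signal a network could be trained on.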
