KD-MVS: Knowledge Distillation Based Self-supervised Learning for Multi-view Stereo
This work addresses the scarcity of ground-truth depth data in 3D reconstruction for computer vision applications, but it appears incremental because it builds on existing self-supervised and knowledge distillation techniques.
The paper tackles the challenge of collecting large-scale ground-truth depth for multi-view stereo by proposing a self-supervised training pipeline using knowledge distillation, where the student model outperforms the teacher and even supervised methods on multiple datasets.
Supervised multi-view stereo (MVS) methods have achieved remarkable progress in terms of reconstruction quality, but suffer from the challenge of collecting large-scale ground-truth depth. In this paper, we propose a novel self-supervised training pipeline for MVS based on knowledge distillation, termed KD-MVS, which mainly consists of self-supervised teacher training and distillation-based student training. Specifically, the teacher model is trained in a self-supervised fashion using both photometric and featuremetric consistency. Then we distill the knowledge of the teacher model to the student model through probabilistic knowledge transferring. With the supervision of validated knowledge, the student model is able to outperform its teacher by a large margin. Extensive experiments performed on multiple datasets show that our method can even outperform supervised methods.
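The distillation step described in the abstract can be illustrated with a minimal per-pixel sketch. It is not the authors' implementation: the abstract does not specify how the teacher's knowledge is made probabilistic or how it is validated. The sketch assumes the teacher's depth is turned into a Gaussian-shaped distribution over a set of depth hypotheses, that a boolean flag marks pixels whose teacher depth passed some validation check, and that the student is trained with a KL divergence against the teacher distribution. The names `soft_depth_label`, `kl_divergence`, and `masked_distillation_loss` are hypothetical.

```python
import math


def soft_depth_label(depth, sigma, hypotheses):
    """Spread one teacher depth value into a distribution over depth hypotheses.

    Assumption: a Gaussian-shaped soft label centred on the teacher depth.
    """
    logits = [-0.5 * ((h - depth) / sigma) ** 2 for h in hypotheses]
    peak = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(l - peak) for l in logits]
    total = sum(weights)
    return [w / total for w in weights]


def kl_divergence(teacher, student, eps=1e-8):
    """KL(teacher || student) for two distributions over the same hypotheses."""
    return sum(
        t * (math.log(t + eps) - math.log(s + eps))
        for t, s in zip(teacher, student)
    )


def masked_distillation_loss(student_probs, teacher_probs, valid):
    """Average the per-pixel KL over pixels flagged as valid.

    Pixels whose teacher depth failed validation contribute nothing.
    """
    losses = [
        kl_divergence(t, s)
        for s, t, v in zip(student_probs, teacher_probs, valid)
        if v
    ]
    return sum(losses) / len(losses) if losses else 0.0


# Toy usage: two pixels, eight depth hypotheses between 1.0 and 2.0.
hypotheses = [1.0 + i / 7 for i in range(8)]
teacher = [soft_depth_label(d, 0.1, hypotheses) for d in (1.3, 1.8)]
uniform_student = [[1.0 / 8] * 8 for _ in teacher]
loss = masked_distillation_loss(uniform_student, teacher, [True, False])
```

In this toy setup only the first pixel contributes to the loss, because the second is marked invalid. A real pipeline would operate on full probability volumes with a tensor library rather than on Python lists.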