Sparse Depth-Guided Attention for Accurate Depth Completion: A Stereo-Assisted Monitored Distillation Approach
This work addresses depth completion for computer vision applications, offering an incremental advancement by refining existing distillation methods with stereo guidance and self-supervised constraints.
The paper tackles depth completion by proposing a stereo-assisted monitored distillation approach that enhances accuracy through a novel Attention-based Sparse-to-Dense module and multi-view consistency techniques, achieving significant improvements over the baseline method.
This paper proposes a depth completion method that uses multi-view improved monitored distillation to generate more precise depth maps. Our approach builds upon the state-of-the-art ensemble distillation method, to which we add a stereo-based model as a teacher to improve the accuracy of the student model. By minimizing the reconstruction error of a target image during ensemble distillation, we avoid learning the inherent error modes of completion-based teachers. We introduce an Attention-based Sparse-to-Dense (AS2D) module at the front layer of the student model to enhance its ability to extract global features from sparse depth. To provide self-supervised information, we also employ multi-view depth consistency and multi-scale minimum reprojection. These techniques exploit existing structural constraints to yield supervisory signals for student model training without requiring costly ground-truth depth. Our extensive experimental evaluation demonstrates that the proposed method significantly improves the accuracy of the baseline monitored distillation method.
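The two selection steps mentioned in the abstract, choosing among teachers by reconstruction error and taking a minimum reprojection error across views, can be illustrated with a minimal NumPy sketch. It assumes that the per-teacher and per-view photometric errors have already been computed as arrays; the function names `select_teacher_depth` and `minimum_reprojection` and the array layouts are hypothetical and are not taken from the paper, whose actual losses and weighting may differ.

```python
import numpy as np


def select_teacher_depth(teacher_depths, recon_errors):
    """Per pixel, keep the teacher depth with the lowest reconstruction error.

    teacher_depths: (K, H, W) depth maps predicted by K teachers (assumed layout).
    recon_errors:   (K, H, W) error of the target image reconstructed with
                    each teacher's depth (assumed to be precomputed).
    Returns the distilled depth (H, W) and its per-pixel error (H, W).
    """
    teacher_depths = np.asarray(teacher_depths, dtype=np.float64)
    recon_errors = np.asarray(recon_errors, dtype=np.float64)
    # Index of the best teacher at every pixel, shape (H, W).
    best = np.argmin(recon_errors, axis=0)
    # Gather the chosen depth and error along the teacher axis.
    distilled = np.take_along_axis(teacher_depths, best[None], axis=0)[0]
    min_error = np.take_along_axis(recon_errors, best[None], axis=0)[0]
    return distilled, min_error


def minimum_reprojection(errors_per_view):
    """Per pixel, take the minimum photometric error over source views.

    errors_per_view: (V, H, W) reprojection errors, one map per source view.
    """
    return np.min(np.asarray(errors_per_view, dtype=np.float64), axis=0)
```

In a multi-scale setting, one plausible use is to call `minimum_reprojection` once per scale and average the resulting maps; this is an assumption about usage, not a description of the authors' training loop.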