CV · AI · Mar 25, 2025

Vanishing Depth: A Depth Adapter with Positional Depth Encoding for Generalized Image Encoders

arXiv:2503.19947v1 · h-index: 4
Originality: Incremental advance
AI Analysis

The work addresses the need for precise depth perception in robotics by extending existing pretrained encoders. Its originality is incremental, since it builds on prior self-supervised training and encoding methods.

The paper tackles the problem of giving pretrained RGB encoders generalized metric depth understanding for vision-guided robotics. It reports state-of-the-art results without finetuning the encoder, such as 56.05 mIoU on SUN-RGBD segmentation and 88.3 RMSE on Void depth completion.
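The headline numbers use standard metrics. As a reminder of what they measure, here is a minimal stdlib-only sketch of RMSE over valid depth pixels and class-averaged mIoU. The function names, the `gt > 0` validity convention, and the rule of skipping classes absent from both prediction and ground truth are illustrative assumptions, not the benchmarks' official evaluation code. Segmentation papers usually report mIoU as a percentage, hence the factor of 100.

```python
import math
from typing import Optional, Sequence


def depth_rmse(pred: Sequence[float], gt: Sequence[float]) -> float:
    """Root-mean-square error over pixels with valid ground truth (gt > 0, assumed)."""
    sq_errors = [(p - g) ** 2 for p, g in zip(pred, gt) if g > 0]
    if not sq_errors:
        raise ValueError("no valid ground-truth pixels")
    return math.sqrt(sum(sq_errors) / len(sq_errors))


def mean_iou(
    pred: Sequence[int],
    gt: Sequence[int],
    num_classes: int,
    ignore_index: Optional[int] = None,
) -> float:
    """Class-averaged intersection-over-union over flattened label maps."""
    intersection = [0] * num_classes
    union = [0] * num_classes
    for p, g in zip(pred, gt):
        if ignore_index is not None and g == ignore_index:
            continue  # unlabeled ground-truth pixel is excluded
        if p == g:
            intersection[p] += 1
            union[p] += 1
        else:
            union[p] += 1
            union[g] += 1
    # Skip classes that appear in neither prediction nor ground truth (assumed rule).
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    if not ious:
        raise ValueError("no labeled pixels")
    return sum(ious) / len(ious)


print(depth_rmse([1.0, 2.0, 3.0], [1.0, 0.0, 5.0]))  # 1.4142135623730951
print(100 * mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2))  # about 58.33
```

Benchmark toolkits typically accumulate the intersection and union counts over the whole test set before averaging across classes, which is what the single flattened pass above mimics.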

Generalized metric depth understanding is critical for precise vision-guided robotics, yet current state-of-the-art (SOTA) vision encoders do not support it. To address this, we propose Vanishing Depth, a self-supervised training approach that extends pretrained RGB encoders to incorporate and align metric depth into their feature embeddings. Based on our novel positional depth encoding, we enable stable feature extraction that is invariant to depth density and depth distribution. We achieve performance improvements and SOTA results across a spectrum of relevant RGBD downstream tasks, without the need to finetune the encoder. Most notably, we achieve 56.05 mIoU on SUN-RGBD segmentation, 88.3 RMSE on Void's depth completion, and 83.8 top-1 accuracy on NYUv2 scene classification. In 6D object pose estimation, we outperform our predecessors DinoV2, EVA-02, and Omnivore, and we achieve SOTA results for non-finetuned encoders in several related RGBD downstream tasks.
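The abstract does not spell out how the positional depth encoding works. As a rough illustration only, the sketch below assumes a sinusoidal, per-pixel encoding of metric depth in the style of transformer positional encodings. The function names, the octave-spaced frequencies, the fixed `max_depth_m` normalization, and the zero-vector-plus-validity-flag treatment of missing depth are all hypothetical, not the paper's method. It shows one way per-pixel codes can be made independent of how sparse the depth map is and of the per-image depth distribution: each pixel is encoded on its own, with a fixed rather than per-image normalization.

```python
import math
from typing import List, Optional, Sequence

# Hypothetical sketch: NOT the Vanishing Depth formulation, just a generic
# sinusoidal encoding of per-pixel metric depth.


def positional_depth_encoding(
    depth_m: Optional[float], num_freqs: int = 8, max_depth_m: float = 10.0
) -> List[float]:
    """Encode one metric depth value (metres) as 2 * num_freqs + 1 features.

    Missing depth (None or <= 0) maps to all zeros, including the final
    validity flag, so sparse and dense maps share the same per-pixel code.
    """
    dim = 2 * num_freqs + 1
    if depth_m is None or depth_m <= 0:
        return [0.0] * dim
    # Fixed, scene-independent normalization (assumed), clamped to [0, 1].
    x = min(depth_m / max_depth_m, 1.0)
    features: List[float] = []
    for k in range(num_freqs):
        angle = (2.0 ** k) * math.pi * x  # octave-spaced frequencies (assumed)
        features.append(math.sin(angle))
        features.append(math.cos(angle))
    features.append(1.0)  # validity flag: this pixel has a depth measurement
    return features


def encode_depth_map(
    depth_map: Sequence[Sequence[Optional[float]]],
    num_freqs: int = 8,
    max_depth_m: float = 10.0,
) -> List[List[List[float]]]:
    """Encode every pixel independently; no per-image statistics are used."""
    return [
        [positional_depth_encoding(d, num_freqs, max_depth_m) for d in row]
        for row in depth_map
    ]


# Same metric depth gives the same code, whether the map is sparse or dense.
sparse = encode_depth_map([[2.5, None], [None, None]], num_freqs=4)
dense = encode_depth_map([[2.5, 7.0], [1.0, 3.0]], num_freqs=4)
print(sparse[0][0] == dense[0][0])  # True
```

A learned encoding or a different normalization scheme would be equally plausible here; the property the sketch demonstrates is only that nothing in a pixel's code depends on the other pixels or on image-level depth statistics.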

Code Implementations: 1 repo
