PromptMono: Cross Prompting Attention for Self-Supervised Monocular Depth Estimation in Challenging Environments
This work addresses depth estimation under diverse conditions for applications such as autonomous driving. The contribution appears incremental, however: it extends existing self-supervised methods with a single new module.
The paper tackles monocular depth estimation in challenging environments. It introduces PromptMono, a self-supervised learning framework that combines visual prompt learning with a gated cross prompting attention (GCPA) module, and it reports superior performance on the Oxford Robotcar and nuScenes datasets.
Considerable efforts have been made to improve monocular depth estimation under ideal conditions. In challenging environments, however, monocular depth estimation remains difficult. In this paper, we introduce visual prompt learning for predicting depth across different environments within a unified model, and present a self-supervised learning framework called PromptMono. It employs a set of learnable parameters as visual prompts to capture domain-specific knowledge. To integrate prompting information into image representations, we propose a novel gated cross prompting attention (GCPA) module, which enhances depth estimation in diverse conditions. We evaluate PromptMono on the Oxford Robotcar dataset and the nuScenes dataset. Experimental results demonstrate the superior performance of the proposed method.
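The abstract does not give the equations of the GCPA module, so the following is only a minimal sketch of one plausible reading: image tokens act as queries over a small set of prompt vectors, and a sigmoid gate controls how much of the attended prompt context is added back to each token. The function name, the scalar per-token gate, the residual connection, and the parameters `gate_weights` and `gate_bias` are all assumptions made for illustration, not the authors' design.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def gated_cross_prompt_attention(tokens, prompts, gate_weights, gate_bias=0.0):
    """Hypothetical gated cross-attention from image tokens to prompts.

    tokens:       list of N image feature vectors, each of length D.
    prompts:      list of P prompt vectors of length D (learnable in a
                  real model, fixed numbers in this sketch).
    gate_weights: length-D vector yielding one scalar gate per token
                  (an assumed gating form).
    """
    dim = len(tokens[0])
    scale = 1.0 / math.sqrt(dim)
    fused = []
    for token in tokens:
        # Attention weights of this token over every prompt.
        weights = softmax([dot(token, p) * scale for p in prompts])
        # Prompt context: attention-weighted sum of the prompt vectors.
        context = [
            sum(w * p[d] for w, p in zip(weights, prompts)) for d in range(dim)
        ]
        # Sigmoid gate in (0, 1) sets how much prompt context is injected.
        gate = 1.0 / (1.0 + math.exp(-(dot(token, gate_weights) + gate_bias)))
        # Residual update: the token keeps its own features.
        fused.append([t + gate * c for t, c in zip(token, context)])
    return fused


tokens = [[1.0, 0.0], [0.0, 1.0]]
prompts = [[2.0, 2.0]]
out = gated_cross_prompt_attention(tokens, prompts, gate_weights=[0.0, 0.0])
# With one prompt and a zero gate projection the gate is exactly 0.5,
# so out == [[2.0, 1.0], [1.0, 2.0]].
```

In this reading, a strongly negative gate leaves the image features almost unchanged, so a model could learn to ignore the prompts where they do not help. A real implementation would use learned query, key, and value projections and operate on batched tensors.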