LiDAR-Anchored Collaborative Distillation for Robust 2D Representations
The work addresses robustness issues in vision-based systems for real-world scenarios, but it appears incremental because it builds on existing self-supervised learning with LiDAR integration.
The paper tackles the lack of robustness of 2D image encoders under noisy and adverse weather conditions. It proposes a self-supervised method that uses 3D LiDAR as supervision, which improves performance in downstream tasks and enhances 3D awareness.
As deep learning continues to advance, self-supervised learning has made considerable strides. It allows 2D image encoders to extract useful features for various downstream tasks, including those related to vision-based systems. Nevertheless, pre-trained 2D image encoders fall short under noisy and adverse weather conditions beyond clear daytime scenes, where robust visual perception is required. To address these issues, we propose a novel self-supervised approach, \textbf{Collaborative Distillation}, which leverages 3D LiDAR as self-supervision to make 2D image encoders more robust to noisy and adverse weather conditions while retaining their original capabilities. Our method outperforms competing methods in various downstream tasks across diverse conditions and exhibits strong generalization ability. In addition, our method improves 3D awareness, which stems from LiDAR's characteristics. These results highlight our method's practicality and adaptability in real-world scenarios.
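The abstract does not spell out the training objective, so the following is only an illustrative sketch of how "LiDAR as supervision" and "retaining original capabilities" could be combined in one loss. It is not the authors' formulation. The function names, the use of cosine distance, the `retain_weight` parameter, and the assumption that image and LiDAR features are already matched pixel-to-point are all hypothetical.

```python
import math


def cosine_distance(a, b, eps=1e-12):
    """Return 1 - cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # eps guards against division by zero for all-zero vectors.
    return 1.0 - dot / max(norm_a * norm_b, eps)


def combined_distillation_loss(student, lidar, frozen, retain_weight=1.0):
    """Hypothetical objective with two terms (an assumption, not from the paper).

    student: features from the 2D image encoder being trained
    lidar:   features from a 3D LiDAR encoder at the matching points
    frozen:  features from a frozen copy of the original 2D encoder
    Each argument is a list of feature vectors of the same length.
    """
    n = len(student)
    # Alignment term: pull image features toward the LiDAR features.
    align = sum(cosine_distance(s, l) for s, l in zip(student, lidar)) / n
    # Retention term: stay close to the original encoder's features.
    retain = sum(cosine_distance(s, f) for s, f in zip(student, frozen)) / n
    return align + retain_weight * retain
```

In this sketch, a larger `retain_weight` favors preserving the pre-trained features, while a smaller one favors agreement with the LiDAR branch. How the actual method balances the two goals is not stated in the abstract.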