CV · Mar 10, 2024

On depth prediction for autonomous driving using self-supervised learning

arXiv:2403.06194v1 · 2.0 · h-index: 2

Originality: Incremental advance

AI Analysis

This work addresses the critical problem of accurate depth perception for autonomous vehicles, though it appears incremental with specific technical contributions rather than a paradigm shift.

The paper tackles depth prediction for autonomous driving using self-supervised learning, proposing novel methods like acontrario cGANs and transformer-based approaches for dynamic objects and future depth forecasting, achieving improved accuracy in depth estimation.

Perception of the environment is a critical component of autonomous driving. It gives the vehicle the ability to comprehend its surroundings and make informed decisions. Depth prediction plays a pivotal role in this process, as it supports understanding of the geometry and motion of the environment. This thesis focuses on the challenge of depth prediction using monocular self-supervised learning techniques. The problem is first approached from a broader perspective by exploring conditional generative adversarial networks (cGANs) as a potential technique for achieving better generalization. In doing so, a fundamental contribution to conditional GANs, the acontrario cGAN, is proposed. The second contribution is a single image-to-depth self-supervised method that addresses the rigid-scene assumption with a novel transformer-based method that outputs a pose for each dynamic object. The third contribution is a video-to-depth map forecasting approach, which extends self-supervised techniques to predict future depths through a novel transformer model capable of predicting the future depth of a given scene. Finally, the limitations of the aforementioned methods are addressed and a video-to-video depth maps model is proposed. This model leverages the spatio-temporal consistency of the input and output sequences to predict a more accurate depth sequence. These methods have significant applications in autonomous driving (AD) and advanced driver assistance systems (ADAS).
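To make the rigid-scene assumption concrete, the following is a minimal sketch of the pixel warp commonly used in self-supervised monocular depth training. It is not the thesis's transformer model: the function names, the toy intrinsics `K`, and the hypothetical object motion `T_obj` are all assumptions chosen for illustration. Under the rigid-scene assumption every pixel is warped with the camera's ego-motion alone; a pixel on a moving object needs an additional per-object pose to land in the right place.

```python
import numpy as np

def reproject(uv, depth, K, T):
    """Warp target-view pixels into the source view.

    uv:    (N, 2) pixel coordinates in the target image
    depth: (N,)   predicted depth of each pixel
    K:     (3, 3) camera intrinsics
    T:     (4, 4) rigid transform from target to source camera frame
    """
    ones = np.ones((uv.shape[0], 1))
    pix_h = np.hstack([uv, ones])            # homogeneous pixels, (N, 3)
    rays = pix_h @ np.linalg.inv(K).T        # back-project pixels to rays
    pts = rays * depth[:, None]              # 3D points in the target frame
    pts_h = np.hstack([pts, ones])           # homogeneous 3D points, (N, 4)
    pts_src = (pts_h @ T.T)[:, :3]           # 3D points in the source frame
    proj = pts_src @ K.T                     # project with the intrinsics
    return proj[:, :2] / proj[:, 2:3]        # perspective divide

def translation(tx, ty, tz):
    """Build a 4x4 transform that only translates (illustrative helper)."""
    T = np.eye(4)
    T[:3, 3] = [tx, ty, tz]
    return T

# Toy intrinsics (assumed values, not taken from any dataset).
K = np.array([[100.0, 0.0, 64.0],
              [0.0, 100.0, 48.0],
              [0.0, 0.0, 1.0]])

uv = np.array([[64.0, 48.0]])   # a pixel at the principal point
depth = np.array([10.0])        # predicted depth in metres

T_ego = translation(0.0, 0.0, -1.0)  # camera advances 1 m, so points move 1 m closer
T_obj = translation(0.5, 0.0, 0.0)   # hypothetical sideways motion of an object

# Rigid-scene assumption: ego-motion alone explains the pixel's displacement.
static_uv = reproject(uv, depth, K, T_ego)

# Dynamic object: compose the object's own pose with the ego-motion.
dynamic_uv = reproject(uv, depth, K, T_ego @ T_obj)
```

In this toy setup the static pixel stays at the principal point while the pixel on the moving object shifts by roughly 5.6 pixels horizontally. A photometric loss computed with the ego-motion warp alone would therefore compare the wrong pixels on that object, which is the failure a per-object pose is meant to correct.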
