CV · Feb 17, 2023

Self-Supervised Representation Learning from Temporal Ordering of Automated Driving Sequences

Christopher Lang, Alexander Braun, Lars Schillingmann, Karsten Haug, Abhinav Valada

arXiv:2302.09043v39.115 citations · h-index: 34

Originality: Incremental advance

AI Analysis

This work addresses the need for better self-supervised learning methods in automated driving perception, offering incremental gains over existing approaches.

The paper addresses the problem of learning dense representations from sequential driving data for perception tasks. It proposes TempO, a temporal ordering pretext task, and reports improvements of +0.7% in mAP for object detection and +2.0% in HOTA score for multi-object tracking.

Self-supervised feature learning enables perception systems to benefit from the vast raw data recorded by vehicle fleets worldwide. While video-level self-supervised learning approaches have shown strong generalizability on classification tasks, the potential to learn dense representations from sequential data has been relatively unexplored. In this work, we propose TempO, a temporal ordering pretext task for pre-training region-level feature representations for perception tasks. We embed each frame by an unordered set of proposal feature vectors, a representation that is natural for object detection or tracking systems, and formulate the sequential ordering by predicting frame transition probabilities in a transformer-based multi-frame architecture whose complexity scales less than quadratically with respect to the sequence length. Extensive evaluations on the BDD100K, nuImages, and MOT17 datasets show that our TempO pre-training approach outperforms single-frame self-supervised learning methods as well as supervised transfer learning initialization strategies, achieving an improvement of +0.7% in mAP for object detection and +2.0% in the HOTA score for multi-object tracking.
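
To make the idea of "predicting frame transition probabilities" over unordered proposal sets more concrete, here is a minimal NumPy sketch. It is not the TempO architecture: the mean pooling, the bilinear scoring with the hypothetical projections `w_query` and `w_key`, and the function names are all assumptions chosen for illustration. This naive version builds a full frame-by-frame score matrix, so it is quadratic in the sequence length and does not reproduce the sub-quadratic scaling described in the abstract.

```python
import numpy as np


def embed_frame(proposals: np.ndarray) -> np.ndarray:
    """Pool an unordered set of proposal features (N, D) into one vector (D,).

    Mean pooling is permutation-invariant, so the proposal order is irrelevant.
    (Assumption: the paper uses a transformer, not plain pooling.)
    """
    return proposals.mean(axis=0)


def transition_log_probs(emb: np.ndarray, w_query: np.ndarray,
                         w_key: np.ndarray) -> np.ndarray:
    """Return a (T, T) matrix whose entry [i, j] is log P(frame j follows frame i)."""
    if emb.shape[0] < 2:
        raise ValueError("need at least two frames to define a transition")
    # Separate query/key projections make the score direction-dependent.
    logits = (emb @ w_query) @ (emb @ w_key).T
    np.fill_diagonal(logits, -np.inf)  # a frame cannot follow itself
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    return logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))


def ordering_loss(frames, w_query, w_key) -> float:
    """Mean negative log-likelihood of the true successor of each frame.

    `frames` is a list of (N_i, D) arrays given in their true temporal order.
    """
    emb = np.stack([embed_frame(f) for f in frames])
    log_p = transition_log_probs(emb, w_query, w_key)
    t = np.arange(len(frames) - 1)
    return float(-log_p[t, t + 1].mean())
```

In a pre-training setup along these lines, the loss would be minimized with respect to the feature extractor that produces the proposal vectors, so that region-level features become predictive of temporal order.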
