CV · Feb 17, 2023

Self-Supervised Representation Learning from Temporal Ordering of Automated Driving Sequences

arXiv:2302.09043v3 · 15 citations · h-index: 34
AI Analysis

This work addresses the need for better self-supervised learning methods in automated driving perception, offering incremental gains over existing approaches.

The paper addresses the problem of learning dense representations from sequential driving data for perception tasks. It proposes TempO, a temporal ordering pretext task, and reports improvements of +0.7% in mAP for object detection and +2.0% in HOTA score for multi-object tracking.

Self-supervised feature learning enables perception systems to benefit from the vast raw data recorded by vehicle fleets worldwide. While video-level self-supervised learning approaches have shown strong generalizability on classification tasks, the potential to learn dense representations from sequential data has been relatively unexplored. In this work, we propose TempO, a temporal ordering pretext task for pre-training region-level feature representations for perception tasks. We embed each frame by an unordered set of proposal feature vectors, a representation that is natural for object detection or tracking systems, and formulate the sequential ordering by predicting frame transition probabilities in a transformer-based multi-frame architecture whose complexity scales less than quadratically with respect to the sequence length. Extensive evaluations on the BDD100K, nuImages, and MOT17 datasets show that our TempO pre-training approach outperforms single-frame self-supervised learning methods as well as supervised transfer learning initialization strategies, achieving an improvement of +0.7% in mAP for object detection and +2.0% in the HOTA score for multi-object tracking.
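The ordering objective described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation. It assumes each frame is a small set of proposal feature vectors, replaces the transformer-based multi-frame architecture with mean pooling and a plain dot-product score, and uses hypothetical function names (`pool_frame`, `transition_probabilities`, `ordering_loss`). It only shows the general shape of the idea: a softmax over candidate successor frames, trained with a cross-entropy loss on the true temporal order.

```python
import math
import random


def pool_frame(proposals):
    """Mean-pool a set of proposal vectors; the result ignores proposal order."""
    dim = len(proposals[0])
    return [sum(p[d] for p in proposals) / len(proposals) for d in range(dim)]


def transition_probabilities(frames):
    """Row i is a softmax over candidate successor frames j != i.

    Assumption: the dot-product score is a stand-in for a learned head.
    Requires at least two frames.
    """
    vecs = [pool_frame(f) for f in frames]
    n = len(vecs)
    probs = []
    for i in range(n):
        scores = [
            sum(a * b for a, b in zip(vecs[i], vecs[j])) if j != i else -math.inf
            for j in range(n)
        ]
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]  # exp(-inf) == 0.0
        total = sum(exps)
        probs.append([e / total for e in exps])
    return probs


def ordering_loss(probs, next_index):
    """Mean negative log-likelihood of each frame's true successor.

    next_index[i] is the index of the frame that follows frame i,
    or None for the last frame of the sequence.
    """
    terms = [-math.log(probs[i][j]) for i, j in enumerate(next_index) if j is not None]
    return sum(terms) / len(terms)


# Toy data: 5 frames, 3 proposals per frame, 4-dimensional features.
random.seed(0)
frames = [[[random.gauss(0, 1) for _ in range(4)] for _ in range(3)] for _ in range(5)]
probs = transition_probabilities(frames)
loss = ordering_loss(probs, [1, 2, 3, 4, None])
```

A symmetric dot product cannot tell which of two frames comes first, so a real model needs a learned, direction-aware scoring head; the sketch keeps only the probabilistic framing. Note also that this sketch scores all frame pairs, so it is quadratic in sequence length and does not reproduce the sub-quadratic scaling the abstract reports.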

