ContrastMotion: Self-supervised Scene Motion Learning for Large-Scale LiDAR Point Clouds
This work addresses motion estimation for autonomous driving systems, presenting an incremental improvement built on novel components such as Soft Discriminative Loss and Gated Multi-frame Fusion.
The paper tackles self-supervised motion estimation for LiDAR point clouds in autonomous driving by predicting scene motion through feature-level consistency, and demonstrates its effectiveness and superiority on scene flow and motion prediction tasks.
In this paper, we propose a novel self-supervised motion estimator for LiDAR-based autonomous driving using a bird's-eye-view (BEV) representation. Unlike commonly adopted self-supervised strategies that enforce data-level structural consistency, we predict scene motion via feature-level consistency between pillars in consecutive frames, which can eliminate the effects of noisy points and of point cloud changes caused by viewpoint shifts in dynamic scenes. Specifically, we propose a \textit{Soft Discriminative Loss} that provides the network with more pseudo-supervised signals to learn discriminative and robust features in a contrastive learning manner. We also propose a \textit{Gated Multi-frame Fusion} block that automatically learns valid compensation between point cloud frames to enhance feature extraction. Finally, \textit{pillar association} is proposed to predict pillar correspondence probabilities based on feature distance, from which scene motion is then predicted. Extensive experiments show the effectiveness and superiority of our \textbf{ContrastMotion} on both scene flow and motion prediction tasks. The code will be made available soon.
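The idea of predicting correspondence probabilities from feature distance and deriving motion from them can be illustrated with a minimal sketch. It assumes that each BEV pillar has a feature vector and a 2D center, that correspondence probabilities are a softmax over negative squared feature distances scaled by a temperature, and that a pillar's motion is its expected displacement under those probabilities. These choices, the function names, and the `temperature` value are illustrative assumptions, not the paper's exact formulation. `soft_contrastive_loss` is a generic soft-target cross-entropy used as a hypothetical stand-in for the Soft Discriminative Loss, whose exact form is not given here.

```python
import math
from typing import List, Sequence, Tuple


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two pillar feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax (subtract the max before exponentiating)."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def associate_pillars(
    feats_t: Sequence[Sequence[float]],
    feats_t1: Sequence[Sequence[float]],
    temperature: float = 0.1,  # hypothetical value
) -> List[List[float]]:
    """Assumed pillar association: P[i][j] = prob. that pillar i (frame t)
    matches pillar j (frame t+1). Smaller feature distance -> higher prob."""
    return [
        softmax([-sq_dist(f, g) / temperature for g in feats_t1])
        for f in feats_t
    ]


def predict_motion(
    probs: Sequence[Sequence[float]],
    centers_t: Sequence[Tuple[float, float]],
    centers_t1: Sequence[Tuple[float, float]],
) -> List[Tuple[float, float]]:
    """BEV motion per pillar as the expected displacement under P[i]."""
    motions = []
    for p_row, (x0, y0) in zip(probs, centers_t):
        ex = sum(p * cx for p, (cx, _) in zip(p_row, centers_t1))
        ey = sum(p * cy for p, (_, cy) in zip(p_row, centers_t1))
        motions.append((ex - x0, ey - y0))
    return motions


def soft_contrastive_loss(
    probs: Sequence[Sequence[float]],
    targets: Sequence[Sequence[float]],
    eps: float = 1e-12,
) -> float:
    """Hypothetical stand-in loss: mean cross-entropy between soft target
    distributions and predicted association probabilities. `eps` guards
    against log(0) when a probability underflows."""
    per_row = [
        -sum(t * math.log(max(p, eps)) for t, p in zip(t_row, p_row))
        for t_row, p_row in zip(targets, probs)
    ]
    return sum(per_row) / len(per_row)


if __name__ == "__main__":
    # Two pillars whose features swap order between frames.
    probs = associate_pillars([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
    print(predict_motion(probs, [(0.0, 0.0), (2.0, 0.0)], [(2.0, 1.0), (0.5, 0.0)]))
    print(soft_contrastive_loss(probs, [[0.0, 1.0], [1.0, 0.0]]))
```

In this toy run each pillar matches the next-frame pillar with identical features, so the predicted motions are approximately (0.5, 0.0) and (0.0, 1.0). For brevity the sketch scores every pillar in the next frame; a practical implementation would more plausibly restrict candidates to a local window around each pillar.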