BEVContrast: Self-Supervision in BEV Space for Automotive Lidar Point Clouds
This work addresses the problem of efficient self-supervised learning for automotive Lidar data, offering a trade-off between simplicity and performance, though it is incremental relative to existing methods.
The paper tackles self-supervision for 3D backbones on automotive Lidar point clouds by proposing BEVContrast, a method that applies a contrastive loss at the level of 2D cells in the Bird's Eye View plane and surpasses the state-of-the-art TARL in downstream semantic segmentation while retaining simplicity.
We present a surprisingly simple and efficient method for self-supervision of 3D backbones on automotive Lidar point clouds. We design a contrastive loss between features of Lidar scans captured in the same scene. Several such approaches have been proposed in the literature, from PointContrast, which uses a contrast at the level of points, to the state-of-the-art TARL, which uses a contrast at the level of segments, roughly corresponding to objects. While the former enjoys great simplicity of implementation, it is surpassed by the latter, which however requires costly pre-processing. In BEVContrast, we define our contrast at the level of 2D cells in the Bird's Eye View plane. The resulting cell-level representations offer a good trade-off between the point-level representations exploited in PointContrast and the segment-level representations exploited in TARL: we retain the simplicity of PointContrast (cell representations are cheap to compute) while surpassing the performance of TARL in downstream semantic segmentation.
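To make the idea of a cell-level contrast concrete, the following is a minimal sketch, not the paper's implementation. It assumes that both scans are already expressed in a common frame, that per-point features are averaged inside each 2D cell, and that an InfoNCE-style loss treats the same cell in the two scans as the positive pair and all other shared cells as negatives. The function names `bev_cell_features` and `cell_contrastive_loss` and the parameters `cell_size` and `temperature` are hypothetical and chosen only for illustration.

```python
import math


def bev_cell_features(points, features, cell_size=1.0):
    """Average per-point features inside each 2D BEV cell.

    points   : list of (x, y, z) tuples; z is ignored (assumption: the BEV
               plane is the x-y plane of a frame shared by both scans).
    features : list of per-point feature vectors (lists of floats).
    Returns a dict mapping integer cell indices (i, j) to the mean feature.
    """
    sums, counts = {}, {}
    for (x, y, _z), feat in zip(points, features):
        key = (math.floor(x / cell_size), math.floor(y / cell_size))
        if key not in sums:
            sums[key] = [0.0] * len(feat)
            counts[key] = 0
        sums[key] = [s + f for s, f in zip(sums[key], feat)]
        counts[key] += 1
    return {k: [s / counts[k] for s in v] for k, v in sums.items()}


def _normalize(vec):
    """Scale a vector to unit L2 norm (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(a * a for a in vec)) or 1.0
    return [a / norm for a in vec]


def cell_contrastive_loss(cells_a, cells_b, temperature=0.07):
    """InfoNCE-style loss over cells that are occupied in both scans.

    Assumption: the positive for a cell of scan A is the cell with the same
    index in scan B; every other shared cell of scan B acts as a negative.
    """
    keys = sorted(set(cells_a) & set(cells_b))
    if not keys:
        return 0.0
    a = [_normalize(cells_a[k]) for k in keys]
    b = [_normalize(cells_b[k]) for k in keys]
    total = 0.0
    for i, a_i in enumerate(a):
        logits = [sum(p * q for p, q in zip(a_i, b_j)) / temperature for b_j in b]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - logits[i]
    return total / len(keys)
```

In this sketch the only pre-processing is a floor division of the x and y coordinates, which illustrates why cell representations are cheap to compute compared with extracting object-like segments. In practice the features would come from the 3D backbone being pre-trained, and the loss would be computed with a differentiable tensor library rather than plain Python lists.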