CV · Oct 14, 2022

Instance Segmentation with Cross-Modal Consistency

arXiv:2210.08113v1 · 1 citation · h-index: 20
Originality: Incremental advance
AI Analysis

This paper addresses instance segmentation, a task with safety-critical applications in robotics and autonomous driving. It is rated an incremental advance, built on enforcing cross-modal consistency.

The paper tackles instance segmentation by jointly leveraging multiple sensor modalities, such as cameras and LiDAR. It applies contrastive learning, across modalities and over time, to learn per-pixel and per-point embeddings that are invariant to viewpoint and consistent across modalities. The resulting instance masks are stable, and the method is evaluated on the Cityscapes and KITTI-360 datasets.

Segmenting object instances is a key task in machine perception, with safety-critical applications in robotics and autonomous driving. We introduce a novel approach to instance segmentation that jointly leverages measurements from multiple sensor modalities, such as cameras and LiDAR. Our method learns to predict embeddings for each pixel or point that give rise to a dense segmentation of the scene. Specifically, our technique applies contrastive learning to points in the scene both across sensor modalities and the temporal domain. We demonstrate that this formulation encourages the models to learn embeddings that are invariant to viewpoint variations and consistent across sensor modalities. We further demonstrate that the embeddings are stable over time as objects move around the scene. This not only provides stable instance masks, but can also provide valuable signals to downstream tasks, such as object tracking. We evaluate our method on the Cityscapes and KITTI-360 datasets. We further conduct a number of ablation studies, demonstrating benefits when applying additional inputs for the contrastive loss.
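The cross-modal contrastive idea described in the abstract can be illustrated with a minimal NumPy sketch. This is not the paper's loss: the abstract does not give the exact formulation, so the sketch assumes an InfoNCE-style objective in which a camera-pixel embedding and a LiDAR-point embedding count as a positive pair when they share an instance ID. The function names, the `temperature` value, and the array layout are all hypothetical.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)

def cross_modal_contrastive_loss(cam_emb, lidar_emb, cam_ids, lidar_ids,
                                 temperature=0.1):
    """Illustrative InfoNCE-style loss (an assumption, not the paper's exact loss).

    cam_emb:   (Nc, D) embeddings of sampled camera pixels
    lidar_emb: (Nl, D) embeddings of sampled LiDAR points
    cam_ids, lidar_ids: instance ID of each sample
    """
    cam = l2_normalize(np.asarray(cam_emb, dtype=float))
    lidar = l2_normalize(np.asarray(lidar_emb, dtype=float))
    cam_ids = np.asarray(cam_ids)
    lidar_ids = np.asarray(lidar_ids)

    # Similarity of every camera sample to every LiDAR sample.
    logits = cam @ lidar.T / temperature              # (Nc, Nl)
    logits -= logits.max(axis=1, keepdims=True)       # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # Positives: cross-modal pairs that belong to the same instance.
    pos = cam_ids[:, None] == lidar_ids[None, :]      # (Nc, Nl)
    has_pos = pos.any(axis=1)
    if not has_pos.any():
        return 0.0

    # Average log-likelihood of the positives for each camera anchor.
    per_anchor = -(log_prob * pos).sum(axis=1)[has_pos] / pos.sum(axis=1)[has_pos]
    return float(per_anchor.mean())
```

Under the same assumptions, the temporal term mentioned in the abstract could reuse this function by passing embeddings from two different frames in place of the two modalities. The loss is low when embeddings of the same instance agree across the two inputs and high when they do not.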
