AMVNet: Assertion-based Multi-View Fusion Network for LiDAR Semantic Segmentation
This work provides a computationally efficient and modular approach for improving LiDAR semantic segmentation accuracy, which is beneficial for resource-constrained robotic systems like autonomous vehicles.
This paper introduces AMVNet, a late fusion network for LiDAR semantic segmentation that refines predictions at points where the class scores of multiple projection-based networks disagree. It achieves state-of-the-art results on both the SemanticKITTI and nuScenes benchmark datasets, outperforming the baseline of directly combining the networks' class scores.
In this paper, we present an Assertion-based Multi-View Fusion network (AMVNet) for LiDAR semantic segmentation, which aggregates the semantic features of individual projection-based networks using late fusion. Given class scores from different projection-based networks, we perform assertion-guided point sampling on score disagreements and pass a set of point-level features for each sampled point to a simple point head, which refines the predictions. This modular and hierarchical late fusion approach provides the flexibility of having two independent networks with only a minor overhead from a lightweight network. Such approaches are desirable for robotic systems, e.g., autonomous vehicles, for which computational and memory resources are often limited. Extensive experiments show that AMVNet achieves state-of-the-art results on both the SemanticKITTI and nuScenes benchmark datasets and that our approach outperforms the baseline method of combining the class scores of the projection-based networks.
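The late fusion idea described above can be illustrated with a minimal NumPy sketch. It is an illustration under stated assumptions, not the AMVNet implementation: the abstract does not specify the point-level features or the point head architecture, so here the features are assumed to be simply the two networks' concatenated class scores, and `point_head` is a hypothetical callable supplied by the caller. The names `baseline_fusion`, `disagreement_indices`, `late_fusion`, and `max_points` are likewise illustrative.

```python
import numpy as np


def baseline_fusion(scores_a, scores_b):
    """Baseline: average the two (N, C) class-score arrays, then take the argmax."""
    return np.argmax((scores_a + scores_b) / 2.0, axis=1)


def disagreement_indices(scores_a, scores_b, max_points=None, rng=None):
    """Return indices of points whose predicted class differs between the networks.

    If max_points is given and more points disagree, a random subset is kept
    (an assumption; the sampling budget is not described in the abstract).
    """
    pred_a = np.argmax(scores_a, axis=1)
    pred_b = np.argmax(scores_b, axis=1)
    idx = np.flatnonzero(pred_a != pred_b)
    if max_points is not None and idx.size > max_points:
        rng = np.random.default_rng() if rng is None else rng
        idx = np.sort(rng.choice(idx, size=max_points, replace=False))
    return idx


def late_fusion(scores_a, scores_b, point_head, max_points=None, rng=None):
    """Keep the shared label where the networks agree; refine sampled disagreements.

    point_head is a hypothetical callable mapping (M, 2C) features to (M, C) scores.
    """
    labels = np.argmax(scores_a, axis=1)  # agreeing points already share this label
    idx = disagreement_indices(scores_a, scores_b, max_points, rng)
    if idx.size:
        # Assumed point-level features: both networks' class scores, concatenated.
        feats = np.concatenate([scores_a[idx], scores_b[idx]], axis=1)
        labels[idx] = np.argmax(point_head(feats), axis=1)
    return labels
```

In this sketch the two large networks stay independent, and the only extra work is one `point_head` call on the disagreeing points, which mirrors the "minor overhead" argument in the abstract. A trained point head would replace the toy callable used for testing.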