CV · LG · RO · Nov 10, 2024

Few-shot Semantic Learning for Robust Multi-Biome 3D Semantic Mapping in Off-Road Environments

arXiv:2411.06632v1 · 4 citations · h-index: 27
Originality: Incremental advance
AI Analysis

This work addresses perception challenges for autonomous navigation in unstructured off-road terrain, but it is an incremental advance because it builds on existing pre-trained models and fusion techniques.

The paper tackles robust 3D semantic mapping in off-road environments. It fine-tunes a pre-trained Vision Transformer on a small, coarsely labeled dataset for 2D semantic segmentation, then fuses the predictions over time into a 3D semantic voxel map. Reported mIoU ranges from 52.9 (zero-shot on Yamaha) to 67.2 (few-shot on Rellis).
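The mIoU scores quoted here average a per-class intersection-over-union, IoU = TP / (TP + FP + FN). A minimal pure-Python sketch, assuming predictions and ground truth are flat lists of integer class IDs and that unlabeled pixels carry a hypothetical `IGNORE = 255` value (a common convention, not confirmed by the paper or the Yamaha/Rellis evaluation protocols). Classes absent from both prediction and ground truth are skipped, which is one of several conventions in use.

```python
IGNORE = 255  # hypothetical label value for unlabeled / void pixels


def mean_iou(pred, target, num_classes, ignore_index=IGNORE):
    """Mean IoU (in percent) over classes that appear in pred or target.

    pred, target: equal-length sequences of integer class IDs per pixel.
    Pixels whose target equals ignore_index are excluded entirely.
    """
    inter = [0] * num_classes  # true positives per class
    union = [0] * num_classes  # TP + FP + FN per class
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue  # sparse labels: skip pixels with no ground truth
        if p == t:
            inter[t] += 1
            union[t] += 1
        else:
            union[p] += 1  # false positive for the predicted class
            union[t] += 1  # false negative for the true class
    ious = [i / u for i, u in zip(inter, union) if u > 0]
    return 100.0 * sum(ious) / len(ious) if ious else 0.0
```

Skipping ignored pixels is what makes evaluation (or a training loss) usable with sparse labels such as the <30%-of-pixels annotations described in the abstract.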

Off-road environments pose significant perception challenges for high-speed autonomous navigation due to unstructured terrain, degraded sensing conditions, and domain shifts across biomes. Learning semantic information across these conditions and biomes is challenging when it requires large amounts of ground-truth data. In this work, we propose an approach that leverages a pre-trained Vision Transformer (ViT) fine-tuned on a small (<500 images), sparsely and coarsely labeled (<30% of pixels) multi-biome dataset to predict 2D semantic segmentation classes. These classes are fused over time via a novel range-based metric and aggregated into a 3D semantic voxel map. We demonstrate zero-shot out-of-biome 2D semantic segmentation on the Yamaha (52.9 mIoU) and Rellis (55.5 mIoU) datasets, and show that few-shot coarse, sparse labeling combined with existing data improves segmentation performance on Yamaha (66.6 mIoU) and Rellis (67.2 mIoU). We further illustrate the feasibility of using a voxel map with a range-based semantic fusion approach to handle common off-road hazards such as pop-up hazards, overhangs, and water features.
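The abstract does not define its "novel range-based metric", so the sketch below substitutes an assumed exponential decay, exp(-r / RANGE_SCALE), purely to illustrate the general idea of range-weighted semantic fusion: each labeled 3D point adds range-weighted evidence to its voxel's per-class scores, and a voxel's label is the class with the most accumulated evidence. `SemanticVoxelMap`, `range_weight`, `VOXEL_SIZE`, and `RANGE_SCALE` are hypothetical names and values, not the authors' implementation.

```python
import math
from collections import defaultdict

VOXEL_SIZE = 0.2    # metres; hypothetical map resolution
RANGE_SCALE = 10.0  # metres; hypothetical decay constant


def range_weight(r, scale=RANGE_SCALE):
    """Assumed confidence that decays with sensor range (NOT the paper's metric)."""
    return math.exp(-r / scale)


class SemanticVoxelMap:
    """Sparse voxel grid accumulating range-weighted per-class evidence."""

    def __init__(self, num_classes, voxel_size=VOXEL_SIZE):
        self.num_classes = num_classes
        self.voxel_size = voxel_size
        # voxel index (i, j, k) -> list of accumulated class scores
        self.scores = defaultdict(lambda: [0.0] * num_classes)

    def key(self, point):
        """Integer voxel index containing a 3D point."""
        return tuple(math.floor(c / self.voxel_size) for c in point)

    def integrate(self, sensor_origin, points, labels):
        """Fuse one frame of labeled 3D points observed from sensor_origin."""
        for p, c in zip(points, labels):
            r = math.dist(sensor_origin, p)  # range from sensor to point
            self.scores[self.key(p)][c] += range_weight(r)

    def label(self, point):
        """Most-supported class for the voxel at point, or None if unobserved."""
        s = self.scores.get(self.key(point))  # .get avoids creating empty voxels
        if s is None:
            return None
        return max(range(self.num_classes), key=s.__getitem__)
```

For example, if a voxel is first labeled class 0 from about 20 m away and later labeled class 1 from about 2 m away, the near observation dominates under this assumed weighting and the voxel ends up as class 1. Favoring close-range evidence is one plausible motivation for a range-based metric; the paper's actual formulation may differ.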
