Temporally Consistent Unsupervised Segmentation for Mobile Robot Perception
This work addresses the need for robust perception in unstructured environments for autonomous systems. The contribution is incremental, however: it builds on existing zero-shot approaches by adding temporal consistency.
The paper tackles temporally consistent unsupervised segmentation for mobile robot perception in unstructured environments. The resulting method, Frontier-Seg, is shown to perform unsupervised segmentation across diverse benchmark datasets, including RUGD and RELLIS-3D.
Rapid progress in terrain-aware autonomous ground navigation has been driven by advances in supervised semantic segmentation. However, these methods rely on costly data collection and labor-intensive ground truth labeling to train deep models. Furthermore, autonomous systems are increasingly deployed in unrehearsed, unstructured environments where no labeled data exists and semantic categories may be ambiguous or domain-specific. Recent zero-shot approaches to unsupervised segmentation have shown promise in such settings but typically operate on individual frames, lacking temporal consistency, a critical property for robust perception in unstructured environments. To address this gap, we introduce Frontier-Seg, a method for temporally consistent unsupervised segmentation of terrain from mobile robot video streams. Frontier-Seg clusters superpixel-level features extracted from foundation model backbones (specifically DINOv2) and enforces temporal consistency across frames to identify persistent terrain boundaries, or frontiers, without human supervision. We evaluate Frontier-Seg on a diverse set of benchmark datasets, including RUGD and RELLIS-3D, demonstrating its ability to perform unsupervised segmentation across unstructured off-road environments.
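
The abstract does not say how Frontier-Seg enforces temporal consistency, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' implementation. It assumes per-patch features (for example from a DINOv2 backbone) and a superpixel map are already available, pools the features per superpixel, and clusters them with k-means. As a stand-in for the paper's consistency mechanism, each frame's clustering is warm-started from the previous frame's centroids so that cluster IDs tend to persist over time. All function names here are hypothetical.

```python
import numpy as np

def pool_superpixel_features(patch_feats, superpixel_ids, n_superpixels):
    """Average per-patch features (P, D) within each superpixel -> (S, D)."""
    sums = np.zeros((n_superpixels, patch_feats.shape[1]))
    np.add.at(sums, superpixel_ids, patch_feats)
    counts = np.bincount(superpixel_ids, minlength=n_superpixels)
    return sums / np.maximum(counts, 1)[:, None]

def assign(feats, centroids):
    """Label each feature vector with the index of its nearest centroid."""
    dists = ((feats[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
    return dists.argmin(1)

def farthest_point_init(feats, k):
    """Deterministic initialisation: repeatedly pick the farthest point."""
    chosen = [0]
    dists = ((feats - feats[0]) ** 2).sum(1)
    for _ in range(k - 1):
        nxt = int(dists.argmax())
        chosen.append(nxt)
        dists = np.minimum(dists, ((feats - feats[nxt]) ** 2).sum(1))
    return feats[chosen].astype(float)

def kmeans(feats, init_centroids, n_iters=10):
    """Plain Lloyd iterations starting from the given centroids."""
    centroids = init_centroids.copy()
    for _ in range(n_iters):
        labels = assign(feats, centroids)
        for j in range(len(centroids)):
            members = feats[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(0)
    return assign(feats, centroids), centroids

def segment_stream(frame_feats, k):
    """Cluster each frame, warm-starting from the previous frame's centroids.

    Assumption: warm-starting is a simple proxy for temporal consistency;
    it is not claimed to be the mechanism used by Frontier-Seg.
    """
    centroids = None
    all_labels = []
    for feats in frame_feats:
        if centroids is None:
            centroids = farthest_point_init(feats, k)
        labels, centroids = kmeans(feats, centroids)
        all_labels.append(labels)
    return all_labels
```

In this sketch, each element of `frame_feats` would be the output of `pool_superpixel_features` for one video frame. Because the centroids carry over between frames, a terrain region that drifts slowly in feature space keeps the same cluster ID, whereas independent per-frame clustering could permute the IDs arbitrarily.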