RO · May 31

A Sonar-Visual Dataset for Cross-Modal Underwater Robot Perception

arXiv:2606.01398 · 10.4
Predicted impact: top 86% in RO · last 90 days
Originality: Synthesis-oriented
AI Analysis

This dataset addresses the lack of paired sonar-visual data for underwater robots, enabling cross-modal perception research.

The authors present SOVIS, a sonar-visual dataset with over 76,000 paired frames for cross-modal underwater perception, and demonstrate a cross-modal fish detection task achieving a 7x improvement in mAP@0.10 over a monocular camera baseline.

Underwater robots typically use both cameras and sonar for perception to leverage the rich semantic details of vision and the robust range measurements of acoustics. However, learning to map between these modalities via cross-modal prediction remains underexplored due to limited sonar-visual paired datasets. We present SOVIS, a sonar-visual dataset for cross-modal underwater perception. SOVIS comprises over 76,000 paired frames collected across 17 dives at six sites in the Trondheimfjord, supported by an end-to-end pipeline that cleans and synchronizes the cross-modal sensor data. We also introduce an interactive annotation tool designed to accelerate the labeling process for this paired data. Finally, we demonstrate a proof-of-concept cross-modal fish detection task using a small subset of labeled data, achieving a 7x improvement in mAP@0.10 over a monocular camera baseline. SOVIS serves as the first step toward advancing cross-modal underwater perception research, enabling research directions such as dense sonar prediction from monocular images.
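
For readers unfamiliar with the metric, mAP@0.10 is conventionally read as mean average precision where a predicted box counts as a match if its intersection over union (IoU) with a ground-truth box is at least 0.10. The sketch below illustrates only that matching criterion. It is not the authors' evaluation code; the function names `iou` and `is_match` and the `(x_min, y_min, x_max, y_max)` box format are assumptions made for illustration.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes.

    Boxes are (x_min, y_min, x_max, y_max); this format is an assumption.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap region, clamped at zero when disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_match(pred_box, truth_box, threshold=0.10):
    """True if the prediction overlaps the ground truth at the given IoU threshold."""
    return iou(pred_box, truth_box) >= threshold


# Two 10x10 boxes offset by 5 in each axis: overlap 25, union 175.
pred = (0, 0, 10, 10)
truth = (5, 5, 15, 15)
overlap = iou(pred, truth)
loose = is_match(pred, truth)                  # threshold 0.10
strict = is_match(pred, truth, threshold=0.5)  # common stricter threshold
```

With these two boxes the IoU is 25/175, so the pair counts as a match at 0.10 but not at the stricter 0.5 threshold often used for detection in ordinary images. A loose threshold rewards coarse localization, which is a plausible fit for a proof-of-concept task, though the abstract does not state why 0.10 was chosen.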

