CV · MM · RO · Mar 19, 2024

WaterVG: Waterway Visual Grounding based on Text-Guided Vision and mmWave Radar

arXiv:2403.12686v3 · 22 citations · IEEE Transactions on Intelligent Transportation Systems (Print)
Originality: Incremental advance
AI Analysis

This work addresses autonomous navigation for USVs, but it is an incremental advance: it adapts visual grounding to a new domain and adds sensor fusion.

The authors tackle waterway perception for Unmanned Surface Vehicles (USVs) by introducing WaterVG, the first visual grounding dataset for USV-based waterway perception, with 11,568 samples and 34,987 referred targets. They also propose Potamoi, a low-power model that achieves state-of-the-art performance on this dataset.
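The two counts imply that a WaterVG prompt refers to about three targets on average (34,987 / 11,568 ≈ 3.02), so a single sample carries a list of targets rather than a single box. The sketch below shows one possible in-memory record for such a multi-target sample. All class and field names, the box convention, and the example prompt are hypothetical and do not describe WaterVG's actual file format.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class ReferredTarget:
    # Hypothetical layout: (x_min, y_min, x_max, y_max) in pixels.
    box: Tuple[float, float, float, float]
    # Hypothetical instance mask stored as polygon vertices.
    mask_polygon: List[Tuple[float, float]] = field(default_factory=list)


@dataclass
class GroundingSample:
    image_id: str
    # A prompt may mention visual cues (color, type) and radar cues (distance, speed).
    prompt: str
    # One prompt can refer to several targets.
    targets: List[ReferredTarget]


def mean_targets_per_sample(num_samples: int, num_targets: int) -> float:
    """Average number of referred targets per sample."""
    return num_targets / num_samples


sample = GroundingSample(
    image_id="frame_000001",  # hypothetical ID
    prompt="the two moving boats on the left",  # hypothetical prompt
    targets=[
        ReferredTarget(box=(10, 20, 80, 60)),
        ReferredTarget(box=(100, 30, 160, 70)),
    ],
)

print(len(sample.targets))  # 2
print(round(mean_targets_per_sample(11_568, 34_987), 2))  # 3.02
```

Keeping targets as a list lets one record serve both box prediction and mask prediction for every referent, matching the multi-target, instance-level annotation the dataset describes.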

The perception of waterways based on human intent is significant for the autonomous navigation and operation of Unmanned Surface Vehicles (USVs) in water environments. Inspired by visual grounding, we introduce WaterVG, the first visual grounding dataset designed for USV-based waterway perception driven by human prompts. WaterVG contains prompts that describe multiple targets, with instance-level annotations including bounding boxes and masks. Notably, WaterVG includes 11,568 samples with 34,987 referred targets, whose prompts integrate both visual and radar characteristics. Guiding two sensors with text allows finer-grained prompts, because a prompt can reference both the visual and the radar features of the referred targets. Moreover, we propose a low-power visual grounding model, Potamoi, a multi-task model with a Phased Heterogeneous Modality Fusion (PHMF) mode comprising Adaptive Radar Weighting (ARW) and Multi-Head Slim Cross Attention (MHSCA). Specifically, ARW extracts the required radar features and fuses them with vision for prompt alignment. MHSCA is an efficient fusion module with a very small parameter count and FLOPs; it fuses the scene context captured by both sensors with linguistic features and performs well on visual grounding tasks. Comprehensive experiments and evaluations on WaterVG show that Potamoi achieves state-of-the-art performance compared with its counterparts.
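The abstract names ARW and MHSCA but does not specify their internals. The following is a minimal, hypothetical sketch of the data flow it describes: radar features are weighted and added to vision features, and the fused scene tokens then attend to prompt tokens through multi-head cross attention. The per-channel sigmoid gate standing in for ARW, the projection-free attention standing in for MHSCA, the choice of scene tokens as queries and text tokens as keys/values, and every function name are assumptions for illustration only. None of them is Potamoi's actual implementation.

```python
import math
from typing import List

Matrix = List[List[float]]  # rows are tokens, columns are channels


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def gated_radar_fusion(vision: Matrix, radar: Matrix, gate_logits: List[float]) -> Matrix:
    """Hypothetical ARW stand-in: scale each radar channel by a sigmoid gate,
    then add it to the matching vision channel, token by token."""
    gates = [sigmoid(g) for g in gate_logits]
    return [
        [v + g * r for v, r, g in zip(v_tok, r_tok, gates)]
        for v_tok, r_tok in zip(vision, radar)
    ]


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def multi_head_cross_attention(queries: Matrix, keys: Matrix, values: Matrix,
                               num_heads: int) -> Matrix:
    """Plain scaled dot-product cross attention split into heads.
    Assumption: no learned Q/K/V projections, to keep the sketch short."""
    dim = len(queries[0])
    assert dim % num_heads == 0, "channel dim must be divisible by num_heads"
    head_dim = dim // num_heads
    out: Matrix = []
    for q in queries:
        fused: List[float] = []
        for h in range(num_heads):
            lo, hi = h * head_dim, (h + 1) * head_dim
            # Similarity of this query slice to every key slice in the same head.
            scores = [
                sum(a * b for a, b in zip(q[lo:hi], k[lo:hi])) / math.sqrt(head_dim)
                for k in keys
            ]
            weights = softmax(scores)
            # Weighted sum of the value channels belonging to this head.
            for c in range(lo, hi):
                fused.append(sum(w * v[c] for w, v in zip(weights, values)))
        out.append(fused)
    return out


# Toy example: 2 scene tokens and 3 prompt tokens, all with 4 channels.
vision = [[1.0, 0.0, 0.5, 0.5], [0.0, 1.0, 0.5, -0.5]]
radar = [[0.2, 0.0, 0.0, 0.4], [0.0, 0.2, 0.4, 0.0]]
scene = gated_radar_fusion(vision, radar, gate_logits=[0.0] * 4)  # every gate = 0.5
text = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0], [0.5, 0.5, 0.5, 0.5]]
fused = multi_head_cross_attention(scene, text, text, num_heads=2)
print(len(fused), len(fused[0]))  # 2 4
```

Splitting channels across heads lets each head compare a different slice of the scene features with the prompt, while the total attention cost stays the same as a single head over all channels. A real low-parameter design would add its own projections and reductions on top of this skeleton.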
