CV · AI · Sep 25, 2024

Underwater Camouflaged Object Tracking Meets Vision-Language SAM2

arXiv:2409.16902v5 · 9 citations · h-index: 6 · Has Code
Originality: Incremental advance
AI Analysis

This work addresses the problem of tracking camouflaged marine animals for underwater robotics or marine biology, but it is incremental as it builds on existing SAM2 technology.

The authors tackled the lack of datasets for underwater camouflaged object tracking by introducing UW-COT220, a large-scale multi-modal dataset, and proposed VL-SAM2, a vision-language framework based on SAM2, which achieved state-of-the-art performance on underwater and open-air tracking datasets.

Over the past decade, significant progress has been made in visual object tracking, largely due to the availability of large-scale datasets. However, these datasets have primarily focused on open-air scenarios and have largely overlooked underwater animal tracking, especially the complex challenges posed by camouflaged marine animals. To bridge this gap, we take a step forward by proposing the first large-scale multi-modal underwater camouflaged object tracking dataset, namely UW-COT220. Based on the proposed dataset, this work first comprehensively evaluates current advanced visual object tracking methods, including SAM- and SAM2-based trackers, in challenging underwater environments, e.g., coral reefs. Our findings highlight the improvements of SAM2 over SAM, demonstrating its enhanced ability to handle the complexities of underwater camouflaged objects. Furthermore, we propose a novel vision-language tracking framework called VL-SAM2, based on the video foundation model SAM2. Extensive experimental results demonstrate that the proposed VL-SAM2 achieves state-of-the-art performance across underwater and open-air object tracking datasets. The dataset and code are available at https://github.com/983632847/Awesome-Multimodal-Object-Tracking.
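The abstract does not say how the trackers are scored. As a rough illustration only, the sketch below shows one common way tracking benchmarks compare predicted and ground-truth boxes: per-frame intersection-over-union (IoU), then the fraction of frames above a threshold. The `(x, y, width, height)` box convention, the function names `iou` and `success_rate`, and the 0.5 threshold are all assumptions made for this example, not details taken from the paper or its code.

```python
from typing import List, Tuple

# Assumed box convention for this sketch: (x, y, width, height).
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    # Overlap extent along each axis, clamped at zero when boxes are disjoint.
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def success_rate(pred: List[Box], truth: List[Box], threshold: float = 0.5) -> float:
    """Fraction of frames whose IoU meets the (hypothetical) threshold."""
    if len(pred) != len(truth):
        raise ValueError("pred and truth must have one box per frame")
    if not pred:
        return 0.0
    hits = sum(1 for p, t in zip(pred, truth) if iou(p, t) >= threshold)
    return hits / len(pred)
```

Benchmarks often sweep the threshold from 0 to 1 and report the area under the resulting curve rather than a single value; the fixed threshold here only keeps the example short.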

Code Implementations: 2 repos
