AquaVLM: Improving Underwater Situation Awareness with Mobile Vision Language Models
AquaVLM addresses communication challenges for underwater divers, offering a more accessible and context-specific alternative to traditional systems, though the contribution appears incremental because it builds on existing mobile and VLM technologies.
The paper tackles the problem of limited underwater communication by developing AquaVLM, a tap-and-send system that uses a mobile vision-language model to generate context-aware messages automatically and transmit them via smartphones. Its evaluations indicate effectiveness for diver safety and point to broader mobile VLM applications.
Underwater activities like scuba diving enable millions annually to explore marine environments for recreation and scientific research. Maintaining situational awareness and effective communication are essential for diver safety. Traditional underwater communication systems are often bulky and expensive, limiting their accessibility to divers of all levels. While recent systems leverage lightweight smartphones and support text messaging, the messages are predefined and thus restrict context-specific communication. In this paper, we present AquaVLM, a tap-and-send underwater communication system that automatically generates context-aware messages and transmits them using ubiquitous smartphones. Our system features a mobile vision-language model (VLM) fine-tuned on an auto-generated underwater conversation dataset and employs a hierarchical message generation pipeline. We co-design the VLM and transmission, incorporating error-resilient fine-tuning to improve the system's robustness to transmission errors. We develop a VR simulator to enable users to experience AquaVLM in a realistic underwater environment and create a fully functional prototype on the iOS platform for real-world experiments. Both subjective and objective evaluations validate the effectiveness of AquaVLM and highlight its potential for personal underwater communication as well as broader mobile VLM applications.
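The abstract does not say how error-resilient fine-tuning is carried out. One plausible reading is that training text is corrupted with simulated channel errors so the model learns to cope with imperfect messages. The sketch below illustrates only that assumption and is not the authors' method: `corrupt_message`, `make_training_pair`, the lowercase-plus-space alphabet, and the independent per-symbol error model are all hypothetical choices made for illustration.

```python
import random
import string
from typing import Tuple

# Hypothetical symbol set; the real system's message encoding is not given in the abstract.
ALPHABET = string.ascii_lowercase + " "


def corrupt_message(text: str, symbol_error_rate: float, rng: random.Random) -> str:
    """Replace each character, independently with the given probability,
    by a different symbol drawn uniformly from ALPHABET."""
    out = []
    for ch in text:
        if rng.random() < symbol_error_rate:
            # Exclude the original character so an "error" always changes the symbol.
            out.append(rng.choice([c for c in ALPHABET if c != ch]))
        else:
            out.append(ch)
    return "".join(out)


def make_training_pair(clean: str, symbol_error_rate: float,
                       rng: random.Random) -> Tuple[str, str]:
    """Return (noisy_input, clean_target) for a denoising-style fine-tuning example."""
    return corrupt_message(clean, symbol_error_rate, rng), clean


rng = random.Random(0)  # fixed seed so the corruption is reproducible
noisy, target = make_training_pair("low on air ascending now", 0.1, rng)
```

Pairs like `(noisy, target)` could then be mixed into a fine-tuning set at several error rates. Whether AquaVLM applies corruption to text, to encoded symbols, or somewhere else in the pipeline is not specified here.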