Knowledge-Augmented Vision Language Models for Underwater Bioacoustic Spectrogram Analysis
This work addresses domain-specific bioacoustic analysis for marine biologists. Its contribution is incremental, however: it adapts existing models rather than introducing a new paradigm.
The researchers study the analysis of marine mammal vocalizations from bioacoustic spectrograms by asking whether Vision Language Models (VLMs) can visually extract meaningful patterns from them. The result is a framework that pairs VLM interpretation with LLM-based validation, letting the system adapt to acoustic data without manual annotation or retraining.
Marine mammal vocalization analysis depends on interpreting bioacoustic spectrograms, yet Vision Language Models (VLMs) are not trained on these domain-specific visualizations. We investigate whether VLMs can visually extract meaningful patterns from spectrograms. Our framework integrates VLM interpretation with LLM-based validation to build domain knowledge, which enables adaptation to acoustic data without manual annotation or model retraining.
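The summary does not describe how interpretation, validation, and knowledge building are wired together. The sketch below is one hypothetical reading, not the authors' method. A VLM describes a spectrogram, an LLM validator accepts or rejects that description, and accepted descriptions are stored and fed back as prompt context for later clips, so the system adapts without any weight updates. The names `KnowledgeBase`, `analyze`, `fake_vlm`, and `fake_validator` are illustrative assumptions. The two model functions are deterministic stand-ins for real model calls.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class KnowledgeBase:
    # Hypothetical store of validated descriptions, reused as prompt context
    entries: list[str] = field(default_factory=list)

    def as_context(self, k: int = 5) -> str:
        # Return the k most recent validated entries as plain text
        return "\n".join(self.entries[-k:])


def analyze(
    spectrogram_id: str,
    vlm: Callable[[str], str],
    llm_validate: Callable[[str, str], bool],
    kb: KnowledgeBase,
) -> tuple[str, bool]:
    # 1. Ask the (stand-in) VLM to describe the spectrogram, primed with prior knowledge
    prompt = (
        f"Known patterns:\n{kb.as_context()}\n"
        f"Describe vocalizations in {spectrogram_id}."
    )
    description = vlm(prompt)
    # 2. Ask the (stand-in) LLM whether the description is plausible given that knowledge
    accepted = llm_validate(description, kb.as_context())
    # 3. Keep only validated descriptions; no model weights are changed
    if accepted:
        kb.entries.append(description)
    return description, accepted


def fake_vlm(prompt: str) -> str:
    # Hypothetical stand-in for a real VLM call; returns canned descriptions
    if "clip_001" in prompt:
        return "tonal upsweep, 8-12 kHz, ~0.5 s"
    return "broadband noise, no clear contour"


def fake_validator(description: str, context: str) -> bool:
    # Hypothetical stand-in for an LLM check: accept descriptions naming a frequency band
    return "kHz" in description


kb = KnowledgeBase()
for clip in ["clip_001", "clip_002"]:
    desc, ok = analyze(clip, fake_vlm, fake_validator, kb)
    print(clip, ok)
```

With these stand-ins the loop prints `clip_001 True` and then `clip_002 False`, and only the first description enters the knowledge base. Keeping adaptation in the prompt context rather than in the weights is one way to match the "no retraining" property the abstract describes.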