SD · AI · LG · AS · Feb 28, 2025

Weakly Supervised Multiple Instance Learning for Whale Call Detection and Temporal Localization in Long-Duration Passive Acoustic Monitoring

arXiv:2502.20838v2 · 1 citation · h-index: 6 · Has Code
Originality: Incremental advance
AI Analysis

This work supports scalable marine monitoring by reducing annotation requirements, though it is an incremental advance that builds on existing Multiple Instance Learning (MIL) methods.

The paper tackles whale call detection and temporal localization in long-duration audio using only bag-level labels, reporting classification F1 scores of 0.8-0.9 and localization precision of 0.65-0.70.

Marine ecosystem monitoring via Passive Acoustic Monitoring (PAM) generates vast amounts of data, but deep learning approaches often require precise annotations and short audio segments. We introduce DSMIL-LocNet, a Multiple Instance Learning framework for whale call detection and localization that uses only bag-level labels. Our dual-stream model processes 2-30 minute audio segments, leveraging spectral and temporal features with attention-based instance selection. Tests on Antarctic whale data show that longer contexts improve classification (F1: 0.8-0.9), while medium-length instances maintain localization precision (0.65-0.70). These results suggest that MIL can enhance scalable marine monitoring. Code: https://github.com/Ragib-Amin-Nihal/DSMIL-Loc
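The abstract compresses the core MIL mechanism into one sentence: a long recording is a bag of shorter instances, only the bag carries a label, and attention over the instances both drives the bag-level prediction and indicates where calls occur. The Python sketch below illustrates that general idea under explicit assumptions. It uses the attention-based MIL pooling of Ilse et al. (2018), not DSMIL-LocNet's actual dual-stream architecture (the linked repository holds that); the toy log-FFT features stand in for the paper's spectral and temporal encoders; the weights are random and untrained; and the above-average-attention threshold used for localization is a hypothetical rule. All function names are illustrative.

```python
import numpy as np


def split_into_instances(audio: np.ndarray, sr: int, instance_sec: float) -> np.ndarray:
    """Split a 1-D recording (the bag) into non-overlapping fixed-length instances.

    Trailing samples that do not fill a whole instance are dropped.
    """
    n = int(round(instance_sec * sr))
    k = len(audio) // n
    return audio[: k * n].reshape(k, n)


def attention_mil_pool(H: np.ndarray, V: np.ndarray, w: np.ndarray):
    """Attention-based MIL pooling in the style of Ilse et al. (2018).

    H: (K, d) instance embeddings, V: (L, d), w: (L,).
    Returns the bag embedding z, shape (d,), and attention weights a, shape (K,).
    """
    scores = np.tanh(H @ V.T) @ w               # one unnormalized score per instance
    scores = scores - scores.max()              # shift for numerical stability
    a = np.exp(scores) / np.exp(scores).sum()   # softmax: weights sum to 1
    z = a @ H                                   # attention-weighted average of instances
    return z, a


def bag_probability(z: np.ndarray, u: np.ndarray, b: float) -> float:
    """Logistic bag-level head: probability that the bag contains a call."""
    return float(1.0 / (1.0 + np.exp(-(z @ u + b))))


def attention_to_intervals(a: np.ndarray, instance_sec: float, threshold: float):
    """Flag instances whose attention exceeds `threshold` (hypothetical rule)
    and merge runs of adjacent flagged instances into (start_s, end_s) intervals."""
    runs = []
    for i in np.flatnonzero(a > threshold):
        if runs and runs[-1][1] == i:           # instance i extends the previous run
            runs[-1][1] = i + 1
        else:
            runs.append([i, i + 1])
    return [(float(s * instance_sec), float(e * instance_sec)) for s, e in runs]


# Toy usage with random, untrained weights (all values hypothetical).
rng = np.random.default_rng(0)
sr = 2000                                        # hypothetical sample rate (Hz)
audio = rng.standard_normal(sr * 120)            # 2-minute synthetic "bag"
bag = split_into_instances(audio, sr, instance_sec=10.0)   # 12 instances
# Stand-in features: log magnitudes of the first 64 FFT bins of each instance.
H = np.log1p(np.abs(np.fft.rfft(bag, axis=1))[:, :64])
V = 0.01 * rng.standard_normal((16, 64))
w = rng.standard_normal(16)
u = 0.01 * rng.standard_normal(64)

z, a = attention_mil_pool(H, V, w)
p = bag_probability(z, u, b=0.0)
# Hypothetical localization rule: keep instances with above-average attention (1/K).
intervals = attention_to_intervals(a, instance_sec=10.0, threshold=1.0 / len(a))
print(f"bag probability={p:.3f}, flagged intervals (s)={intervals}")
```

In this sketch the instance length fixes the temporal resolution of the recovered intervals: shorter instances give finer boundaries but less context per instance, which is one way to see the trade-off the abstract describes between long contexts for classification and medium-length instances for localization.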

Code Implementations: 1 repo
