Roee Diamant

SD
h-index78
7papers
73citations
Novelty40%
AI Score42

7 Papers

LGDec 1, 2025Code
WhAM: Towards A Translative Model of Sperm Whale Vocalization

Orr Paradise, Pranav Muralikrishnan, Liangyuan Chen et al.

Sperm whales communicate in short sequences of clicks known as codas. We present WhAM (Whale Acoustics Model), the first transformer-based model capable of generating synthetic sperm whale codas from any audio prompt. WhAM is built by finetuning VampNet, a masked acoustic token model pretrained on musical audio, using 10k coda recordings collected over the past two decades. Through iterative masked token prediction, WhAM generates high-fidelity synthetic codas that preserve key acoustic features of the source recordings. We evaluate WhAM's synthetic codas using Fréchet Audio Distance and through perceptual studies with expert marine biologists. On downstream classification tasks including rhythm, social unit, and vowel classification, WhAM's learned representations achieve strong performance, despite being trained for generation rather than classification. Our code is available at https://github.com/Project-CETI/wham

SDNov 28, 2022
Automated Detection of Dolphin Whistles with Convolutional Networks and Transfer Learning

Burla Nur Korkmaz, Roee Diamant, Gil Danino et al.

Effective conservation of maritime environments and wildlife management of endangered species require the implementation of efficient, accurate and scalable solutions for environmental monitoring. Ecoacoustics offers the advantages of non-invasive, long-duration sampling of environmental sounds and has the potential to become the reference tool for biodiversity surveying. However, the analysis and interpretation of acoustic data is a time-consuming process that often requires a great amount of human supervision. This issue might be tackled by exploiting modern techniques for automatic audio signal analysis, which have recently achieved impressive performance thanks to the advances in deep learning research. In this paper we show that convolutional neural networks can indeed significantly outperform traditional automatic methods in a challenging detection task: identification of dolphin whistles from underwater audio recordings. The proposed system can detect signals even in the presence of ambient noise, at the same time consistently reducing the likelihood of producing false positives and false negatives. Our results further support the adoption of artificial intelligence technology to improve the automatic monitoring of marine ecosystems.

LGJan 10, 2023
An Efficient Drifters Deployment Strategy to Evaluate Water Current Velocity Fields

Murad Tukan, Eli Biton, Roee Diamant

Water current prediction is essential for understanding ecosystems, and to shed light on the role of the ocean in the global climate context. Solutions vary from physical modeling, and long-term observations, to short-term measurements. In this paper, we consider a common approach for water current prediction that uses Lagrangian floaters for water current prediction by interpolating the trajectory of the elements to reflect the velocity field. Here, an important aspect that has not been addressed before is where to initially deploy the drifting elements such that the acquired velocity field would efficiently represent the water current. To that end, we use a clustering approach that relies on a physical model of the velocity field. Our method segments the modeled map and determines the deployment locations as those that will lead the floaters to 'visit' the center of the different segments. This way, we validate that the area covered by the floaters will capture the in-homogeneously in the velocity field. Exploration over a dataset of velocity field maps that span over a year demonstrates the applicability of our approach, and shows a considerable improvement over the common approach of uniformly randomly choosing the initial deployment sites. Finally, our implementation code can be found in [1].

CVApr 24, 2023
Underwater object classification combining SAS and transferred optical-to-SAS Imagery

Avi Abu, Roee Diamant

Combining synthetic aperture sonar (SAS) imagery with optical images for underwater object classification has the potential to overcome challenges such as water clarity, the stability of the optical image analysis platform, and strong reflections from the seabed for sonar-based classification. In this work, we propose this type of multi-modal combination to discriminate between man-made targets and objects such as rocks or litter. We offer a novel classification algorithm that overcomes the problem of intensity and object formation differences between the two modalities. To this end, we develop a novel set of geometrical shape descriptors that takes into account the geometrical relation between the objects shadow and highlight. Results from 7,052 pairs of SAS and optical images collected during several sea experiments show improved classification performance compared to the state-of-the-art for better discrimination between different types of underwater objects. For reproducibility, we share our database.

21.5SDMay 5
ShipEcho -- An Interactive Tool for Global Mapping of Underwater Radiated Noise from Vessels

Mark Shipton, Valentino Denona, Đula Nađ et al.

Underwater radiated noise from vessels (V-URN) is a recognized environmental stressor that negatively impacts marine ecosystems. Significant resources are invested in the development of V-URN monitoring indicators, regulatory frameworks, and management-oriented assessments. One approach with high potential for impact is V-URN mapping, which can provide actionable spatiotemporal information for environmental assessment and mitigation planning. Producing management-scale maps remains challenging as passive acoustic measurements are spatially sparse and many operational systems depend on specialist workflows and costly access to wide-area vessel activity data. To address these constraints, we introduce ShipEcho, a freely accessible web-based Geographic Information System (GIS) that provides near-real-time V-URN mapping using vessel data acquired through a community-based AIS exchange. Using established vessel SL models and propagation modeling informed by bathymetric data, ShipEcho produces near-real-time and cumulative noise maps across regions worldwide. These include sound pressure levels and sound exposure levels using standard indicators, including the 63~Hz and 125~Hz one-third octave bands and a 20--2000~Hz broadband level. We describe the system architecture, data pipeline, modeling workflow, and key assumptions, and evaluate map accuracy through comparison with acoustic recordings. We then demonstrate how ShipEcho can support management-level assessment, decision-making, and policy initiatives through practical use cases.

ASDec 31, 2023
Detecting the presence of sperm whales echolocation clicks in noisy environments

Guy Gubnitsky, Roee Diamant

Sperm whales (Physeter macrocephalus) navigate underwater with a series of impulsive, click-like sounds known as echolocation clicks. These clicks are characterized by a multipulse structure (MPS) that serves as a distinctive pattern. In this work, we use the stability of the MPS as a detection metric for recognizing and classifying the presence of clicks in noisy environments. To distinguish between noise transients and to handle simultaneous emissions from multiple sperm whales, our approach clusters a time series of MPS measures while removing potential clicks that do not fulfil the limits of inter-click interval, duration and spectrum. As a result, our approach can handle high noise transients and low signal-to-noise ratio. The performance of our detection approach is examined using three datasets: seven months of recordings from the Mediterranean Sea containing manually verified ambient noise; several days of manually labelled data collected from the Dominica Island containing approximately 40,000 clicks from multiple sperm whales; and a dataset from the Bahamas containing 1,203 labelled clicks from a single sperm whale. Comparing with the results of two benchmark detectors, a better trade-off between precision and recall is observed as well as a significant reduction in false detection rates, especially in noisy environments. To ensure reproducibility, we provide our database of labelled clicks along with our implementation code.

SDApr 17, 2021
Cetacean Translation Initiative: a roadmap to deciphering the communication of sperm whales

Jacob Andreas, Gašper Beguš, Michael M. Bronstein et al.

The past decade has witnessed a groundbreaking rise of machine learning for human language analysis, with current methods capable of automatically accurately recovering various aspects of syntax and semantics - including sentence structure and grounded word meaning - from large data collections. Recent research showed the promise of such tools for analyzing acoustic communication in nonhuman species. We posit that machine learning will be the cornerstone of future collection, processing, and analysis of multimodal streams of data in animal communication studies, including bioacoustic, behavioral, biological, and environmental data. Cetaceans are unique non-human model species as they possess sophisticated acoustic communications, but utilize a very different encoding system that evolved in an aquatic rather than terrestrial medium. Sperm whales, in particular, with their highly-developed neuroanatomical features, cognitive abilities, social structures, and discrete click-based encoding make for an excellent starting point for advanced machine learning tools that can be applied to other animals in the future. This paper details a roadmap toward this goal based on currently existing technology and multidisciplinary scientific community effort. We outline the key elements required for the collection and processing of massive bioacoustic data of sperm whales, detecting their basic communication units and language-like higher-level structures, and validating these models through interactive playback experiments. The technological capabilities developed by such an undertaking are likely to yield cross-applications and advancements in broader communities investigating non-human communication and animal behavioral research.