Climent Nadeu

SD
3 papers
5 citations
Novelty: 45%
AI Score: 20

3 Papers

SD · Jun 2, 2021
Sound-to-Imagination: An Exploratory Study on Unsupervised Crossmodal Translation Using Diverse Audiovisual Data

Leonardo A. Fanzeres, Climent Nadeu

The motivation of our research is to explore the possibilities of automatic sound-to-image (S2I) translation for enabling a human receiver to visually infer the occurrence of sound-related events. We expect the computer to 'imagine' the scene from the captured sound, generating original images that picture the sound-emitting source. Previous studies on similar topics opted for simplified approaches using data with low content diversity and/or sound class supervision. In contrast, we propose to perform unsupervised S2I translation using thousands of distinct and unknown scenes, with slightly pre-cleaned data, just enough to guarantee aural-visual semantic coherence. To that end, we employ conditional generative adversarial networks (GANs) with a deep densely connected generator. Additionally, we present a solution using informativity classifiers to perform quantitative evaluation of the generated images. This enabled us to analyze the influence of network bottleneck variation on the translation, observing a potential trade-off between informativity and pixel space convergence. Despite the complexity of the specified S2I translation task, we were able to generalize the model enough to obtain, on average, more than 14% interpretable and semantically coherent images translated from unknown sounds.

SD · Dec 19, 2017
Joint model-based recognition and localization of overlapped acoustic events using a set of distributed small microphone arrays

Rupayan Chakraborty, Climent Nadeu

In the analysis of acoustic scenes, the occurring sounds often have to be detected in time, recognized, and localized in space. Usually, each of these tasks is done separately. In this paper, a model-based approach to jointly carry them out for the case of multiple simultaneous sources is presented and tested. The recognized event classes and their respective room positions are obtained with a single system that maximizes the combination of a large set of scores, each one resulting from a different acoustic event model and a different beamformer output signal, which comes from one of several arbitrarily located small microphone arrays. Experimental work using a two-step method is reported for a specific scenario consisting of meeting-room acoustic events, either isolated or overlapped with speech. Tests carried out with two datasets show the advantage of the proposed approach over some usual techniques, and show that the inclusion of estimated priors brings a further performance improvement.
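
The idea of choosing a class and a position by maximizing a combination of scores can be illustrated with a minimal sketch. It is not the authors' system: it assumes a single event (not overlapped sources), a plain sum as the combination rule, and log-domain scores. The function name `joint_decode`, the `scores[c][p][a]` layout (class, candidate position, array), and the optional `log_prior` are hypothetical choices made only for this illustration.

```python
def joint_decode(scores, log_prior=None):
    """Pick the (class, position) pair with the highest combined score.

    scores[c][p][a] is an assumed log-score for event class c at candidate
    position p, computed from the beamformer output of microphone array a.
    log_prior[c] is an optional assumed log-prior for class c.
    """
    best_pair, best_score = None, float("-inf")
    for c, per_position in enumerate(scores):
        for p, per_array in enumerate(per_position):
            # Combine the evidence from all arrays (here: a simple sum).
            total = sum(per_array)
            if log_prior is not None:
                total += log_prior[c]
            if total > best_score:
                best_pair, best_score = (c, p), total
    return best_pair, best_score


# Two classes, two candidate positions, two arrays (made-up numbers).
example_scores = [
    [[-5.0, -4.0], [-6.0, -7.0]],  # class 0
    [[-3.0, -2.5], [-8.0, -9.0]],  # class 1
]
pair, score = joint_decode(example_scores)
```

In this toy setting the decoder searches classes and positions together, so evidence about where a sound comes from can influence which class is chosen, and vice versa.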

AS · Nov 12, 2017
Automatic detection of alarm sounds in a noisy hospital environment using model and non-model based approaches

Ganna Raboshchuk, Sergi Gómez Quintana, Alex Peiró Lilja et al.

In the noisy acoustic environment of a Neonatal Intensive Care Unit (NICU) there is a variety of alarms, which are frequently triggered by the biomedical equipment. In this paper, different approaches for automatic detection of those alarm sounds are presented and compared: 1) a non-model-based approach that employs signal processing techniques; 2) a model-based approach based on neural networks; and 3) an approach that combines the non-model-based and model-based approaches. The performance of the detection systems developed under each of those approaches is assessed, analysed and compared at both the frame level and the event level, using an audio database recorded in a real-world hospital environment.
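
The difference between frame-level and event-level assessment can be shown with a small sketch. It is an illustration under stated assumptions, not the paper's evaluation protocol: detections are binary per-frame decisions, an event is a run of consecutive active frames, and a reference event counts as detected if any detected event overlaps it. All function names are hypothetical.

```python
def frames_to_events(frames):
    """Group consecutive active frames into (start, end) events, end exclusive."""
    events, start = [], None
    for i, active in enumerate(frames):
        if active and start is None:
            start = i
        elif not active and start is not None:
            events.append((start, i))
            start = None
    if start is not None:
        events.append((start, len(frames)))
    return events


def frame_recall(ref, hyp):
    """Fraction of reference-active frames that are also detected."""
    hits = sum(1 for r, h in zip(ref, hyp) if r and h)
    total = sum(1 for r in ref if r)
    return hits / total if total else 0.0


def event_recall(ref, hyp):
    """Fraction of reference events overlapped by at least one detected event."""
    ref_events = frames_to_events(ref)
    hyp_events = frames_to_events(hyp)
    detected = sum(
        1 for rs, re_ in ref_events
        if any(hs < re_ and rs < he for hs, he in hyp_events)
    )
    return detected / len(ref_events) if ref_events else 0.0


reference = [0, 1, 1, 1, 0, 0, 1, 1, 0]  # two alarm events
detection = [0, 0, 1, 0, 0, 0, 0, 0, 0]  # one short detection
```

With these made-up labels, only one of five alarm frames is detected, yet one of the two alarm events is found, which shows why the two levels can give quite different pictures of the same detector.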