SDJun 24, 2021Code
SofaMyRoom: a fast and multiplatform "shoebox" room simulator for binaural room impulse response dataset generationRoberto Barumerli, Daniele Bianchi, Michele Geronazzo et al.
This paper introduces a shoebox room simulator able to systematically generate synthetic datasets of binaural room impulse responses (BRIRs) given an arbitrary set of head-related transfer functions (HRTFs). The evaluation of machine hearing algorithms frequently requires BRIR datasets in order to simulate the acoustics of any environment. However, currently available solutions typically consider only HRTFs measured on dummy heads, which poorly characterize the high variability in spatial sound perception. Our solution allows to integrate a room impulse response (RIR) simulator with different HRTF sets represented in Spatially Oriented Format for Acoustics (SOFA). The source code and the compiled binaries for different operating systems allow to both advanced and non-expert users to benefit from our toolbox, see https://github.com/spatialaudiotools/sofamyroom/ .
MMMar 4, 2020Code
ASMD: an automatic framework for compiling multimodal datasets with audio and scoresFederico Simonetta, Stavros Ntalampiras, Federico Avanzini
This paper describes an open-source Python framework for handling datasets for music processing tasks, built with the aim of improving the reproducibility of research projects in music computing and assessing the generalization abilities of machine learning models. The framework enables the automatic download and installation of several commonly used datasets for multimodal music processing. Specifically, we provide a Python API to access the datasets through Boolean set operations based on particular attributes, such as intersections and unions of composers, instruments, and so on. The framework is designed to ease the inclusion of new datasets and the respective ground-truth annotations so that one can build, convert, and extend one's own collection as well as distribute it by means of a compliant format to take advantage of the API. All code and ground-truth are released under suitable open licenses.
SDMar 30, 2022
Acoustics-specific Piano Velocity EstimationFederico Simonetta, Stavros Ntalampiras, Federico Avanzini
Motivated by the state-of-art psychological research, we note that a piano performance transcribed with existing Automatic Music Transcription (AMT) methods cannot be successfully resynthesized without affecting the artistic content of the performance. This is due to 1) the different mappings between MIDI parameters used by different instruments, and 2) the fact that musicians adapt their way of playing to the surrounding acoustic environment. To face this issue, we propose a methodology to build acoustics-specific AMT systems that are able to model the adaptations that musicians apply to convey their interpretation. Specifically, we train models tailored for virtual instruments in a modular architecture that takes as input an audio recording and the relative aligned music score, and outputs the acoustics-specific velocities of each note. We test different model shapes and show that the proposed methodology generally outperforms the usual AMT pipeline which does not consider specificities of the instrument and of the acoustic environment. Interestingly, such a methodology is extensible in a straightforward way since only slight efforts are required to train models for the inference of other piano parameters, such as pedaling.
SDFeb 24, 2022
A Perceptual Measure for Evaluating the Resynthesis of Automatic Music TranscriptionsFederico Simonetta, Federico Avanzini, Stavros Ntalampiras
This study focuses on the perception of music performances when contextual factors, such as room acoustics and instrument, change. We propose to distinguish the concept of "performance" from the one of "interpretation", which expresses the "artistic intention". Towards assessing this distinction, we carried out an experimental evaluation where 91 subjects were invited to listen to various audio recordings created by resynthesizing MIDI data obtained through Automatic Music Transcription (AMT) systems and a sensorized acoustic piano. During the resynthesis, we simulated different contexts and asked listeners to evaluate how much the interpretation changes when the context changes. Results show that: (1) MIDI format alone is not able to completely grasp the artistic intention of a music performance; (2) usual objective evaluation measures based on MIDI data present low correlations with the average subjective evaluation. To bridge this gap, we propose a novel measure which is meaningfully correlated with the outcome of the tests. In addition, we investigate multimodal machine learning by providing a new score-informed AMT method and propose an approximation algorithm for the $p$-dispersion problem.
SDJul 27, 2021
Audio-to-Score Alignment Using Deep Automatic Music TranscriptionFederico Simonetta, Stavros Ntalampiras, Federico Avanzini
Audio-to-score alignment (A2SA) is a multimodal task consisting in the alignment of audio signals to music scores. Recent literature confirms the benefits of Automatic Music Transcription (AMT) for A2SA at the frame-level. In this work, we aim to elaborate on the exploitation of AMT Deep Learning (DL) models for achieving alignment at the note-level. We propose a method which benefits from HMM-based score-to-score alignment and AMT, showing a remarkable advancement beyond the state-of-the-art. We design a systematic procedure to take advantage of large datasets which do not offer an aligned score. Finally, we perform a thorough comparison and extensive tests on multiple datasets.
MMFeb 14, 2019
Multimodal music information processing and retrieval: survey and future challengesFederico Simonetta, Stavros Ntalampiras, Federico Avanzini
Towards improving the performance in various music information processing tasks, recent studies exploit different modalities able to capture diverse aspects of music. Such modalities include audio recordings, symbolic music scores, mid-level representations, motion, and gestural data, video recordings, editorial or cultural tags, lyrics and album cover arts. This paper critically reviews the various approaches adopted in Music Information Processing and Retrieval and highlights how multimodal algorithms can help Music Computing applications. First, we categorize the related literature based on the application they address. Subsequently, we analyze existing information fusion approaches, and we conclude with the set of challenges that Music Information Retrieval and Sound and Music Computing research communities should focus in the next years.