SD · IR · LG · AS · Oct 20, 2023

Music Augmentation and Denoising For Peak-Based Audio Fingerprinting

arXiv:2310.13388v2 · 1 citation · h-index: 6
Originality: Incremental advance
AI Analysis

This work addresses the robustness of audio identification in real-world applications. The advance is incremental, since it builds on existing peak-based fingerprinting methods.

The authors tackled the problem of audio fingerprinting failure in noisy environments by introducing a realistic audio augmentation pipeline and a deep learning model for spectrogram denoising, resulting in improved identification performance for peak-based systems under noisy conditions.

Audio fingerprinting is a well-established solution for song identification from short recording excerpts. Popular methods rely on the extraction of sparse representations, generally spectral peaks, and have proven to be accurate, fast, and scalable to large collections. However, real-world applications of audio identification often happen in noisy environments, which can cause these systems to fail. In this work, we tackle this problem by introducing and releasing a new audio augmentation pipeline that adds noise to music snippets in a realistic way, by stochastically mimicking real-world scenarios. We then propose and release a deep learning model that removes noisy components from spectrograms in order to improve peak-based fingerprinting systems' accuracy. We show that the addition of our model improves the identification performance of commonly used audio fingerprinting systems, even under noisy conditions.
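The two ideas in the abstract, adding noise to a music snippet and extracting spectral peaks from its spectrogram, can be illustrated with a minimal sketch. This is not the authors' released pipeline or model: a single additive mix at a fixed signal-to-noise ratio stands in for their stochastic augmentation, the denoising network is not reproduced, and the function names (`mix_at_snr`, `magnitude_spectrogram`, `spectral_peaks`) and all parameter defaults are hypothetical choices made for illustration.

```python
import numpy as np


def mix_at_snr(music, noise, snr_db):
    """Add `noise` to `music`, scaled so the mixture has the given SNR in dB.

    Assumes both inputs are 1-D float arrays and that `noise` is not silent.
    """
    noise = np.resize(noise, music.shape)  # repeat or trim noise to the music length
    music_power = np.mean(music ** 2)
    noise_power = np.mean(noise ** 2)
    # Solve music_power / (gain**2 * noise_power) = 10**(snr_db / 10) for gain.
    gain = np.sqrt(music_power / (noise_power * 10 ** (snr_db / 10)))
    return music + gain * noise


def magnitude_spectrogram(signal, n_fft=1024, hop=512):
    """Hann-windowed STFT magnitude, shaped (frequency_bins, frames)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=1)).T


def spectral_peaks(spec, neighborhood=5, min_db=-40.0):
    """Return the set of (freq_bin, frame) local maxima above `min_db`.

    A bin is a peak if it equals the maximum of its
    (2 * neighborhood + 1)-wide square neighborhood in the log spectrogram.
    """
    log_spec = 20 * np.log10(spec / spec.max() + 1e-10)
    pad = neighborhood
    padded = np.pad(log_spec, pad, mode="constant", constant_values=-np.inf)
    n_freq, n_time = log_spec.shape
    local_max = np.full(log_spec.shape, -np.inf)
    # Running maximum over every shift inside the neighborhood.
    for df in range(2 * pad + 1):
        for dt in range(2 * pad + 1):
            shifted = padded[df : df + n_freq, dt : dt + n_time]
            local_max = np.maximum(local_max, shifted)
    mask = (log_spec == local_max) & (log_spec > min_db)
    return {(int(f), int(t)) for f, t in zip(*np.nonzero(mask))}
```

Comparing `spectral_peaks(magnitude_spectrogram(clean))` with the peaks of `mix_at_snr(clean, noise, snr_db)` at decreasing `snr_db` gives a rough picture of how many peaks survive the added noise, which is the kind of degradation a spectrogram denoiser would aim to undo.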

Code Implementations: 1 repo
