SD · AS · ML · Dec 1, 2017

Utilizing Domain Knowledge in End-to-End Audio Processing

arXiv:1712.00254v1 · 2 citations
Originality: Synthesis-oriented
AI Analysis

This work targets end-to-end audio classification from raw waveforms. It is incremental: it matches the performance of existing feature-based models without surpassing it.

The authors tackled the performance gap between end-to-end neural networks and models using high-level audio representations by training initial CNN layers to learn the log-scaled mel-spectrogram transformation, showing similar convergence and performance on the ESC-50 dataset.

End-to-end neural network based approaches to audio modelling are generally outperformed by models trained on high-level data representations. In this paper we present preliminary work that shows the feasibility of training the first layers of a deep convolutional neural network (CNN) model to learn the commonly-used log-scaled mel-spectrogram transformation. Secondly, we demonstrate that upon initializing the first layers of an end-to-end CNN classifier with the learned transformation, convergence and performance on the ESC-50 environmental sound classification dataset are similar to a CNN-based model trained on the highly pre-processed log-scaled mel-spectrogram features.
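For readers unfamiliar with the target transformation, the following is a minimal NumPy sketch of a log-scaled mel-spectrogram, i.e. the fixed pre-processing step that the first CNN layers are trained to approximate. It is not the authors' code. The function names and all parameter values (`sr`, `n_fft`, `hop`, `n_mels`, `eps`) are illustrative assumptions rather than settings taken from the paper, and the input is random noise standing in for an audio clip.

```python
import numpy as np


def hz_to_mel(f):
    # Common (HTK-style) mel-scale formula.
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_mels, n_fft, sr):
    # Triangular filters whose centres are evenly spaced on the mel scale.
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    hz_points = mel_to_hz(mel_points)
    fft_freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)  # (n_fft // 2 + 1,)
    fb = np.zeros((n_mels, fft_freqs.size))
    for i in range(n_mels):
        lo, mid, hi = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (fft_freqs - lo) / (mid - lo)
        falling = (hi - fft_freqs) / (hi - mid)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb


def log_mel_spectrogram(wave, sr=44100, n_fft=1024, hop=512, n_mels=60, eps=1e-10):
    # All defaults here are assumed values chosen for illustration only.
    window = np.hanning(n_fft)
    n_frames = 1 + (len(wave) - n_fft) // hop
    frames = np.stack(
        [wave[t * hop : t * hop + n_fft] * window for t in range(n_frames)]
    )
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2  # (n_frames, n_fft // 2 + 1)
    mel = power @ mel_filterbank(n_mels, n_fft, sr).T  # (n_frames, n_mels)
    return 10.0 * np.log10(mel + eps)  # log scaling, in dB


# Random noise stands in for one second of audio at the assumed sample rate.
rng = np.random.default_rng(0)
wave = rng.standard_normal(44100)
features = log_mel_spectrogram(wave)
print(features.shape)  # (85, 60): frames x mel bands
```

Each stage here (windowed framing, a linear frequency transform, squaring, a linear mel projection, and a logarithm) is a fixed, hand-designed operation. The abstract's proposal is to have convolutional layers learn an approximation of this mapping and then use those weights to initialize an end-to-end classifier.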

Code Implementations: 1 repo
