Nov 29, 2016

Understanding Audio Pattern Using Convolutional Neural Network From Raw Waveforms

arXiv:1611.09524v1 · 2 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses audio signal processing for researchers by proposing a data-driven approach, but it is incremental as it applies an existing method to a new domain.

The authors tackled the problem of understanding audio patterns by using a convolutional neural network directly on raw waveforms, discovering that salient patterns can be efficiently extracted through learned nonlinear filters.

One key step in audio signal processing is to transform the raw signal into representations that efficiently encode the original information. Traditionally, audio is transformed into spectral representations, expressed in terms of frequency, amplitude and phase. In this work, we take a purely data-driven approach to understand the temporal dynamics of audio at the raw signal level. We maximize the information extracted from the raw signal through a deep convolutional neural network (CNN) model. Our CNN model is trained on the UrbanSound8K dataset. We discover that salient audio patterns embedded in the raw waveforms can be efficiently extracted through a combination of nonlinear filters learned by the CNN model.
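To make the idea of filtering a raw waveform concrete, here is a minimal pure-Python sketch of a single convolution, ReLU and max-pooling stage applied directly to audio samples. It is not the authors' architecture: the function names, the toy sine input, the 8 kHz sample rate and the hand-picked kernel (standing in for a filter that a CNN would learn from data) are all assumptions made for illustration.

```python
import math

def conv1d(signal, kernel, stride=1):
    """Valid-mode 1D cross-correlation of a raw waveform with one filter."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(0, len(signal) - k + 1, stride)
    ]

def relu(xs):
    """Elementwise nonlinearity: keep positive responses, zero out the rest."""
    return [max(0.0, x) for x in xs]

def max_pool(xs, size):
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    return [max(xs[i:i + size]) for i in range(0, len(xs) - size + 1, size)]

def feature_map(signal, kernel, stride=1, pool=4):
    """One conv -> ReLU -> max-pool stage applied to raw samples."""
    return max_pool(relu(conv1d(signal, kernel, stride)), pool)

# Toy input (assumption): 64 samples of a 1 kHz sine sampled at 8 kHz.
sample_rate = 8000
wave = [math.sin(2 * math.pi * 1000 * n / sample_rate) for n in range(64)]

# Hypothetical kernel: one period of the same sine. In a trained CNN the
# filter weights would be learned rather than chosen by hand.
kernel = [math.sin(2 * math.pi * 1000 * n / sample_rate) for n in range(8)]

features = feature_map(wave, kernel)
```

The filter responds most strongly where the waveform lines up with the kernel, and pooling keeps that peak while shortening the sequence. A deep model stacks many such stages with many filters per stage, which is the sense in which patterns are extracted through a combination of nonlinear filters.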
