SD · Nov 29, 2016

Understanding Audio Pattern Using Convolutional Neural Network From Raw Waveforms

Shuhui Qu, Juncheng Li, Wei Dai, Samarjit Das

arXiv:1611.09524v1 · 6.02 citations

Originality: Synthesis-oriented

AI Analysis

This work takes a data-driven approach to audio signal processing, but its contribution is incremental: it applies an existing method to a new domain.

The authors tackled the problem of understanding audio patterns by using a convolutional neural network directly on raw waveforms, discovering that salient patterns can be efficiently extracted through learned nonlinear filters.

One key step in audio signal processing is to transform the raw signal into a representation that efficiently encodes the original information. Traditionally, audio is transformed into spectral representations, expressed as a function of frequency, amplitude, and phase. In this work, we take a purely data-driven approach to understanding the temporal dynamics of audio at the raw signal level. We maximize the information extracted from the raw signal through a deep convolutional neural network (CNN) model. Our CNN model is trained on the UrbanSound8K dataset. We find that salient audio patterns embedded in the raw waveforms can be efficiently extracted through a combination of nonlinear filters learned by the CNN model.
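The idea of a nonlinear filter applied directly to a raw waveform can be illustrated with a minimal sketch. This is not the authors' model: the filters below are hand-set sinusoids rather than learned weights, and the sample rate, kernel length, stride, and pooling size are all assumptions chosen only to keep the example small. It shows a single convolution, ReLU, and max-pooling stage responding strongly when the filter matches the pattern in the signal and weakly when it does not.

```python
import math


def conv1d(signal, kernel, stride=1):
    """Valid-mode 1D cross-correlation of a raw signal with one filter."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(0, len(signal) - k + 1, stride)
    ]


def relu(xs):
    """Elementwise nonlinearity: negative responses are clipped to zero."""
    return [max(0.0, x) for x in xs]


def max_pool(xs, size):
    """Non-overlapping max-pooling; a trailing partial window is dropped."""
    return [max(xs[i:i + size]) for i in range(0, len(xs) - size + 1, size)]


def sine(freq_hz, sample_rate, n_samples):
    """Synthetic raw waveform: a pure tone sampled at sample_rate."""
    return [
        math.sin(2 * math.pi * freq_hz * t / sample_rate)
        for t in range(n_samples)
    ]


def feature(signal, kernel):
    """One conv -> ReLU -> max-pool stage (hypothetical hyperparameters)."""
    return max_pool(relu(conv1d(signal, kernel, stride=4)), size=8)


SAMPLE_RATE = 8000  # assumed for illustration only
KERNEL_LEN = 32     # assumed for illustration only

# A 0.1 s raw waveform containing a 440 Hz tone.
wave = sine(440.0, SAMPLE_RATE, 800)

# Hand-set stand-ins for learned filters: one tuned to the tone, one not.
matched = sine(440.0, SAMPLE_RATE, KERNEL_LEN)
mismatched = sine(2000.0, SAMPLE_RATE, KERNEL_LEN)

feat_matched = feature(wave, matched)
feat_mismatched = feature(wave, mismatched)

print("matched filter peak response:   ", max(feat_matched))
print("mismatched filter peak response:", max(feat_mismatched))
```

In a trained CNN the kernel values would be learned from data such as UrbanSound8K, and many such filters would be stacked across several layers; the sketch only shows why a single filter stage can act as a pattern detector on raw samples.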
