SDAI Mar 25, 2021

SubSpectral Normalization for Neural Audio Data Processing

arXiv:2103.13620v1, 35 citations
AI Analysis

The paper addresses a domain-specific bottleneck in audio processing that matters to researchers and practitioners, though the contribution appears incremental: it adapts existing normalization techniques to one characteristic of the data, the frequency dimension.

The paper tackles the problem that convolutional neural networks treat the frequency dimension of audio input like any other spatial dimension, even though it has unique characteristics. It introduces SubSpectral Normalization (SSN), which splits the frequency dimension into groups and applies a separate normalization and affine transformation to each group. The authors report improved network performance in their audio experiments.

Convolutional Neural Networks are widely used in various machine learning domains. In image processing, the features can be obtained by applying 2D convolution to all spatial dimensions of the input. However, in the audio case, frequency domain input like Mel-Spectrogram has different and unique characteristics in the frequency dimension. Thus, there is a need for a method that allows the 2D convolution layer to handle the frequency dimension differently. In this work, we introduce SubSpectral Normalization (SSN), which splits the input frequency dimension into several groups (sub-bands) and performs a different normalization for each group. SSN also includes an affine transformation that can be applied to each group. Our method removes the inter-frequency deflection while the network learns a frequency-aware characteristic. In the experiments with audio data, we observed that SSN can efficiently improve the network's performance.
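The sub-band splitting described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `subspectral_norm`, the parameters `num_groups`, `gamma`, `beta`, and `eps`, and the (batch, channels, frequency, time) layout are all hypothetical choices made here. It also assumes equal-sized sub-bands and batch-norm-style statistics computed over the batch, the frequency bins within a sub-band, and time; the abstract does not specify either detail.

```python
import numpy as np


def subspectral_norm(x, num_groups, gamma=None, beta=None, eps=1e-5):
    """Normalize each frequency sub-band of x separately.

    x is assumed to have shape (batch, channels, freq, time).
    gamma and beta, if given, hold one scale and one shift per
    (channel, sub-band) pair, i.e. shape (channels * num_groups,).
    """
    n, c, f, t = x.shape
    if f % num_groups != 0:
        raise ValueError("freq bins must be divisible by num_groups")

    # Fold the sub-band index into the channel axis, so that every
    # (channel, sub-band) pair becomes its own pseudo-channel.
    xg = x.reshape(n, c * num_groups, f // num_groups, t)

    # Assumed statistics: mean and variance over batch, the frequency
    # bins inside the sub-band, and time (batch-norm style).
    mean = xg.mean(axis=(0, 2, 3), keepdims=True)
    var = xg.var(axis=(0, 2, 3), keepdims=True)
    x_hat = (xg - mean) / np.sqrt(var + eps)

    # Per-group affine transformation; defaults to the identity.
    if gamma is None:
        gamma = np.ones(c * num_groups)
    if beta is None:
        beta = np.zeros(c * num_groups)
    out = x_hat * gamma.reshape(1, -1, 1, 1) + beta.reshape(1, -1, 1, 1)

    # Restore the original (batch, channels, freq, time) layout.
    return out.reshape(n, c, f, t)


# Usage: the upper half of the frequency axis carries a large offset.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3, 8, 5))
x[:, :, 4:, :] += 10.0
y = subspectral_norm(x, num_groups=2)
```

Under these assumptions, setting `num_groups=1` reduces the function to ordinary per-channel normalization over the whole frequency axis, which makes the effect of the sub-band split easy to compare: with two groups, the offset in the upper band is removed independently of the lower band.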

