Deep Scattering Spectrum
This work provides improved signal processing tools for audio classification tasks, though it reads as an incremental extension of existing MFCC methods.
The paper tackles the problem of creating translation-invariant and deformation-stable signal representations by introducing a scattering transform that extends MFCCs through wavelet convolutions and modulus operators, achieving state-of-the-art classification results on GTZAN (musical genre) and TIMIT (phone) databases.
A scattering transform defines a locally translation invariant representation which is stable to time-warping deformations. It extends MFCC representations by computing modulation spectrum coefficients of multiple orders, through cascades of wavelet convolutions and modulus operators. Second-order scattering coefficients characterize transient phenomena such as attacks and amplitude modulation. A frequency transposition invariant representation is obtained by applying a scattering transform along log-frequency. State-of-the-art classification results are obtained for musical genre and phone classification on GTZAN and TIMIT databases, respectively.
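The cascade of wavelet convolutions and modulus operators described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the Gaussian band-pass filters stand in for the paper's wavelets, and the function names (`bandpass_bank`, `lowpass`, `scattering`) and parameters (`num_filters`, `xi_max`, `T`) are assumptions chosen for illustration. Convolutions are circular, computed through the FFT.

```python
import numpy as np

def bandpass_bank(n, num_filters, xi_max=0.25):
    """Hypothetical filter bank: Gaussian band-pass filters defined in the
    Fourier domain, with center frequencies halving at each step."""
    freqs = np.fft.fftfreq(n)              # cycles per sample, in [-0.5, 0.5)
    bank = []
    for j in range(num_filters):
        xi = xi_max * 2.0 ** (-j)          # center frequency of filter j
        sigma = xi / 4.0                   # bandwidth proportional to xi
        bank.append(np.exp(-0.5 * ((freqs - xi) / sigma) ** 2))
    return np.array(bank)

def lowpass(n, T):
    """Gaussian low-pass filter (Fourier domain) averaging over roughly T samples."""
    freqs = np.fft.fftfreq(n)
    return np.exp(-0.5 * (freqs * T) ** 2)

def scattering(x, num_filters=6, T=256):
    """Return zeroth-, first- and second-order coefficients of a 1-D signal."""
    n = len(x)
    psi = bandpass_bank(n, num_filters)
    phi = lowpass(n, T)
    X = np.fft.fft(x)

    # Order 0: local average of the signal, x * phi.
    s0 = np.real(np.fft.ifft(X * phi))

    # Order 1: modulus of band-pass outputs, then averaging: |x * psi_i| * phi.
    u1 = np.abs(np.fft.ifft(X[None, :] * psi, axis=1))
    s1 = np.real(np.fft.ifft(np.fft.fft(u1, axis=1) * phi, axis=1))

    # Order 2: filter each first-order envelope again, keeping only second
    # filters with a lower center frequency: ||x * psi_i| * psi_j| * phi.
    s2 = {}
    for i in range(num_filters):
        U1_hat = np.fft.fft(u1[i])
        for j in range(i + 1, num_filters):
            u2 = np.abs(np.fft.ifft(U1_hat * psi[j]))
            s2[(i, j)] = np.real(np.fft.ifft(np.fft.fft(u2) * phi))
    return s0, s1, s2

# Example input: an amplitude-modulated tone, the kind of signal whose
# modulation is captured by second-order coefficients.
n = 1024
t = np.arange(n)
x = (1 + 0.5 * np.cos(2 * np.pi * 4 * t / n)) * np.cos(2 * np.pi * 0.1 * t)
s0, s1, s2 = scattering(x, num_filters=5, T=128)
```

In this sketch `s1` plays the role of a mel-spectrum-like average, while the entries of `s2` hold the envelope modulations that the averaging in `s1` smooths out. Applying a second transform along the log-frequency axis, as the abstract describes for transposition invariance, is left out.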