SD LG AS · Apr 18, 2022

Differentiable Time-Frequency Scattering on GPU

John Muradeli, Cyrus Vahidi, Changhong Wang, Han Han, Vincent Lostanlen, Mathieu Lagrange, George Fazekas

arXiv:2204.08269v44.18 citations · h-index: 24 · Has Code

Originality: Incremental advance

AI Analysis

This work addresses the problem of integrating biologically plausible audio models into the standard evaluation tools used by researchers in audio generation and perception. It is an incremental advance: it improves on existing implementations rather than introducing a new model.

The paper addresses the limitations of prior joint time-frequency scattering (JTFS) implementations. It provides a differentiable, fast, and flexible Python version that runs on CPU and GPU through multiple backends. It then demonstrates the utility of this implementation in unsupervised manifold learning, supervised classification, and texture resynthesis.

Joint time-frequency scattering (JTFS) is a convolutional operator in the time-frequency domain which extracts spectrotemporal modulations at various rates and scales. It offers an idealized model of spectrotemporal receptive fields (STRF) in the primary auditory cortex, and thus may serve as a biologically plausible surrogate for human perceptual judgments at the scale of isolated audio events. Yet, prior implementations of JTFS and STRF have remained outside of the standard toolkit of perceptual similarity measures and evaluation methods for audio generation. We trace this issue down to three limitations: differentiability, speed, and flexibility. In this paper, we present an implementation of time-frequency scattering in Python. Unlike prior implementations, ours accommodates NumPy, PyTorch, and TensorFlow as backends and is thus portable on both CPU and GPU. We demonstrate the usefulness of JTFS via three applications: unsupervised manifold learning of spectrotemporal modulations, supervised classification of musical instruments, and texture resynthesis of bioacoustic sounds.
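The toy NumPy sketch below makes the operator described above more concrete. It follows the pipeline in four steps:

1. A constant-Q scalogram serves as the first layer.
2. Separable 2-D band-pass filters run over (log-frequency, time). Each filter is tuned to a temporal modulation "rate", a frequential modulation "scale", and one of two spins.
3. A modulus is applied to the filtered output.
4. The result is averaged.

This is not the authors' implementation or its API. Several elements are illustrative assumptions:

- the helper names (`gaussian_bank`, `scalogram`, `toy_jtfs`, `log_chirp`);
- the Gaussian, Morlet-like filter shapes;
- all parameter values;
- the global mean used in place of the finite low-pass filters of real JTFS.

```python
import numpy as np

# Toy sketch (assumption): NOT the paper's implementation. Gaussian filters,
# parameter values and global averaging are simplifications for illustration.

def gaussian_bank(grid, centers, q):
    """Gaussian band-pass responses on an FFT frequency grid (cycles/sample).

    Each response is a bump at `center` with width |center| / q (constant-Q,
    Morlet-like). The DC bin is zeroed so every filter has zero mean.
    """
    centers = np.asarray(centers, dtype=float)[:, None]
    widths = np.abs(centers) / q
    bank = np.exp(-0.5 * ((grid[None, :] - centers) / widths) ** 2)
    bank[:, grid == 0] = 0.0
    return bank

def scalogram(x, n_filters=24, f_lo=0.01, f_hi=0.35, q=4):
    """First layer U1 = |x * psi_lambda|, shape (n_filters, len(x)).

    Rows are log-spaced centre frequencies, from high (row 0) to low.
    """
    grid = np.fft.fftfreq(len(x))
    bank = gaussian_bank(grid, np.geomspace(f_hi, f_lo, n_filters), q)
    return np.abs(np.fft.ifft(np.fft.fft(x)[None, :] * bank, axis=-1))

def toy_jtfs(x, rates=(1 / 1024, 1 / 512, 1 / 256), scales=(0.05, 0.1, 0.2), q=2):
    """Toy joint time-frequency scattering coefficients.

    Filters the scalogram with separable 2-D wavelets tuned to a temporal
    modulation `rate` (cycles/sample) and a frequential modulation
    `spin * scale` (cycles/row), takes the modulus, then averages globally.
    Returns shape (len(rates), len(scales), 2); the last axis holds the two
    spins (+1, -1), which respond to oppositely oriented scalogram ridges.
    """
    u1 = scalogram(x)
    U1 = np.fft.fft2(u1)
    grid_f = np.fft.fftfreq(u1.shape[0])  # modulation frequencies along rows
    grid_t = np.fft.fftfreq(u1.shape[1])  # modulation frequencies along time
    psi_t = gaussian_bank(grid_t, rates, q)  # (n_rates, n_t)
    out = np.empty((len(rates), len(scales), 2))
    for k, spin in enumerate((1, -1)):
        psi_f = gaussian_bank(grid_f, [spin * s for s in scales], q)  # (n_scales, n_f)
        for i in range(len(rates)):
            for j in range(len(scales)):
                h = psi_f[j][:, None] * psi_t[i][None, :]  # separable 2-D filter
                out[i, j, k] = np.abs(np.fft.ifft2(U1 * h)).mean()
    return out

def log_chirp(n, f_lo=0.01, f_hi=0.35):
    """Hypothetical test signal: exponential sweep from f_lo to f_hi (cycles/sample)."""
    inst_freq = f_lo * (f_hi / f_lo) ** (np.arange(n) / n)
    return np.cos(2 * np.pi * np.cumsum(inst_freq))

if __name__ == "__main__":
    up = toy_jtfs(log_chirp(2048))          # rising sweep
    down = toy_jtfs(log_chirp(2048)[::-1])  # time-reversed, i.e. falling sweep
    print(up.shape)  # (3, 3, 2)
    print(up[..., 0].sum(), up[..., 1].sum())
    print(down[..., 0].sum(), down[..., 1].sum())
```

Time-reversing the sweep should return approximately the same coefficients with the two spin channels swapped. This orientation sensitivity is what the spin variable adds on top of separate rate and scale filters. A real JTFS would keep the time and frequency axes after low-pass filtering rather than collapsing them to one number per path.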

