AS LG

Jun 17, 2022

NU-Wave 2: A General Neural Audio Upsampling Model for Various Sampling Rates

arXiv:2206.08545v2

17.669 citations

h-index: 8

Has Code

Originality: Incremental advance

AI Analysis

NU-Wave 2 addresses the need for flexible and efficient audio upsampling in applications such as audio processing and restoration. The advance is incremental, since it builds on prior work such as NU-Wave.

The paper tackles a limitation of existing audio super-resolution models: each is tied to a fixed pair of input and output sampling rates. It introduces NU-Wave 2, a diffusion model that generates 48 kHz audio from inputs of various sampling rates with a single model, and reports high-resolution output with fewer parameters than other models.

Conventionally, audio super-resolution models fixed the initial and the target sampling rates, which necessitates training a separate model for each pair of sampling rates. We introduce NU-Wave 2, a diffusion model for neural audio upsampling that enables the generation of 48 kHz audio signals from inputs of various sampling rates with a single model. Based on the architecture of NU-Wave, NU-Wave 2 uses short-time Fourier convolution (STFC) to generate harmonics, resolving the main failure modes of NU-Wave, and incorporates bandwidth spectral feature transform (BSFT) to condition on the bandwidths of inputs in the frequency domain. We experimentally demonstrate that NU-Wave 2 produces high-resolution audio regardless of the input sampling rate while requiring fewer parameters than other models. The official code and the audio samples are available at https://mindslab-ai.github.io/nuwave2.
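To make the idea of conditioning on input bandwidth in the frequency domain more concrete, the sketch below marks which STFT frequency bins of a 48 kHz signal can carry content from an input recorded at a lower sampling rate. This is an illustration only, not the paper's BSFT: the function name `bandwidth_indicator`, the FFT size of 1024, and the 0/1 encoding are all assumptions that do not come from the abstract.

```python
from typing import List

TARGET_SR = 48_000  # output sampling rate stated in the abstract


def bandwidth_indicator(input_sr: int, n_fft: int = 1024,
                        target_sr: int = TARGET_SR) -> List[int]:
    """Hypothetical helper: one flag per STFT bin at the target rate.

    A bin is flagged 1 if its centre frequency is at or below the
    Nyquist frequency of the input (input_sr / 2), else 0.
    """
    if not 0 < input_sr <= target_sr:
        raise ValueError("input_sr must be in (0, target_sr]")
    n_bins = n_fft // 2 + 1  # bins of a one-sided spectrum
    # Bin k sits at k * target_sr / n_fft Hz. Comparing against
    # input_sr / 2 in integer form avoids floating-point rounding.
    return [1 if 2 * k * target_sr <= input_sr * n_fft else 0
            for k in range(n_bins)]


if __name__ == "__main__":
    for sr in (8_000, 16_000, 24_000, 48_000):
        flags = bandwidth_indicator(sr)
        print(sr, sum(flags), "of", len(flags), "bins within input bandwidth")
```

A single model that accepts such a per-bin indicator alongside the audio can, in principle, be told how much of the spectrum is already present, which is one plausible reading of why one network can serve several input sampling rates.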
