SD LG AS | May 10, 2021

Sampling-Frequency-Independent Audio Source Separation Using Convolution Layer Based on Impulse Invariant Method

Koichi Saito, Tomohiko Nakamura, Kohei Yatabe, Yuma Koizumi, Hiroshi Saruwatari

arXiv:2105.04079v1 | 2.3 | Has Code

Originality: Incremental advance

AI Analysis

The work addresses a practical limitation in audio processing, where a model may need to run at sampling frequencies other than the one it was trained on. It is an incremental advance in that it builds on existing DNN-based separation methods.

The paper tackles the problem of audio source separation models failing on unseen sampling frequencies by proposing a convolution layer that enables a single DNN to handle arbitrary sampling frequencies, showing consistent performance in music source separation experiments.

Audio source separation is often used as preprocessing of various applications, and one of its ultimate goals is to construct a single versatile model capable of dealing with the varieties of audio signals. Since sampling frequency, one of the audio signal varieties, is usually application specific, the preceding audio source separation model should be able to deal with audio signals of all sampling frequencies specified in the target applications. However, conventional models based on deep neural networks (DNNs) are trained only at the sampling frequency specified by the training data, and there are no guarantees that they work with unseen sampling frequencies. In this paper, we propose a convolution layer capable of handling arbitrary sampling frequencies by a single DNN. Through music source separation experiments, we show that the introduction of the proposed layer enables a conventional audio source separation model to consistently work with even unseen sampling frequencies.
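The title refers to the impulse invariant method, a standard signal-processing technique in which a discrete-time filter is obtained by sampling a continuous-time impulse response, h[n] = T * h_c(nT) with T = 1/fs. The sketch below illustrates only that general idea and is not the authors' layer: the Gaussian-windowed cosine used as the analog filter, its parameters, and all function names are assumptions chosen for illustration. It generates kernels for three sampling frequencies from one analog definition and checks that their frequency responses peak at the same place in hertz.

```python
import numpy as np

# Hypothetical analog filter (an assumption, not taken from the paper):
# a Gaussian-windowed cosine centred at CENTER_HZ.
CENTER_HZ = 1000.0
SIGMA_S = 0.001  # envelope width in seconds


def analog_impulse_response(t):
    """Continuous-time impulse response h_c(t), evaluated at times t (seconds)."""
    return np.exp(-0.5 * (t / SIGMA_S) ** 2) * np.cos(2.0 * np.pi * CENTER_HZ * t)


def sample_kernel(fs):
    """Impulse invariance: h[n] = T * h_c(n * T), with T = 1 / fs."""
    half = int(round(4.0 * SIGMA_S * fs))   # cover +/- 4 sigma of the envelope
    t = np.arange(-half, half + 1) / fs     # sample instants in seconds
    return analog_impulse_response(t) / fs  # scale by T to keep the gain fixed


def peak_of_response(kernel, fs, n_fft=1 << 16):
    """Return (frequency in Hz, magnitude) at the peak of the kernel's response."""
    spectrum = np.abs(np.fft.rfft(kernel, n_fft))
    k = int(np.argmax(spectrum))
    return k * fs / n_fft, spectrum[k]


kernels = {fs: sample_kernel(fs) for fs in (8000, 16000, 44100)}
peaks = {}
gains = {}
for fs, kernel in kernels.items():
    peaks[fs], gains[fs] = peak_of_response(kernel, fs)
    print(f"fs={fs:>6} Hz  taps={len(kernel):>4}  "
          f"peak={peaks[fs]:.1f} Hz  gain={gains[fs]:.6f}")
```

The number of taps grows with the sampling frequency, while the passband location and gain stay essentially the same, because the filter is defined once in continuous time. In a trainable layer, the parameters of the analog filter would presumably be learned instead of fixed as they are here.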
