Equivariance-based self-supervised learning for audio signal recovery from clipped measurements
This work addresses the challenge of expensive or unavailable ground truth data in audio signal recovery. The contribution is incremental, however, as it extends existing self-supervised techniques to one specific non-linear inverse problem.
The paper tackles the recovery of audio signals from clipped measurements, a non-linear inverse problem, by proposing an equivariance-based self-supervised learning method that eliminates the need for ground truth data. The results show that this method performs comparably to fully supervised learning, as demonstrated on simulated clipped measurements with controlled, varied clipping levels and on standard real music signals.
In numerous inverse problems, state-of-the-art solving strategies involve training neural networks on datasets of ground truth signals and their associated measurements, which may be expensive or impossible to collect. Recently, self-supervised learning techniques have emerged, with the major advantage of no longer requiring ground truth data. Most theoretical and experimental results on self-supervised learning focus on linear inverse problems. The present work aims to study self-supervised learning for the non-linear inverse problem of recovering audio signals from clipped measurements. An equivariance-based self-supervised loss is proposed and studied. Performance is assessed on simulated clipped measurements with controlled and varied levels of clipping, and further reported on standard real music signals. We show that the performance of the proposed equivariance-based self-supervised declipping strategy compares favorably to fully supervised learning while requiring only clipped measurements for training.
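To make the setting concrete, the following is a minimal NumPy sketch of hard clipping and of the two kinds of terms that an equivariance-based self-supervised loss typically combines. It is an illustration under stated assumptions, not the loss defined in the paper. The function names (`clip`, `measurement_consistency`, `equivariance_loss`), the threshold symbol `mu`, the choice of amplitude scaling as the transformation, and the use of the identity map as a stand-in for a trained network are all hypothetical.

```python
import numpy as np

def clip(x, mu):
    """Hard clipping: saturate every sample to the range [-mu, mu]."""
    return np.clip(x, -mu, mu)

def measurement_consistency(x_hat, y, mu):
    """Mean squared error between the re-clipped estimate and the observation."""
    return np.mean((clip(x_hat, mu) - y) ** 2)

def equivariance_loss(f, y, mu, scale):
    """Equivariance term (assumed form): transform the estimate, clip it again,
    reconstruct it, and compare with the transformed estimate.
    Amplitude scaling is an assumed transformation, chosen for illustration."""
    x1 = f(y)              # estimate from the clipped measurement
    x2 = scale * x1        # transformed estimate, used as a virtual signal
    y2 = clip(x2, mu)      # virtual clipped measurement
    x3 = f(y2)             # reconstruction of the virtual signal
    return np.mean((x3 - x2) ** 2)

def identity(y):
    """Trivial 'reconstruction' that returns the measurement unchanged.
    Stands in for a neural network in this sketch."""
    return y

# Simulated example: a sine wave of amplitude 1 clipped at mu = 0.5.
t = np.linspace(0.0, 1.0, 1000)
x = np.sin(2.0 * np.pi * 5.0 * t)
mu = 0.5
y = clip(x, mu)

# The identity map is perfectly consistent with the measurement ...
mc = measurement_consistency(identity(y), y, mu)
# ... but it is penalized once the signal is scaled up and clipped again.
eq_up = equivariance_loss(identity, y, mu, scale=2.0)
# Scaling down keeps every sample below the threshold, so no penalty arises.
eq_down = equivariance_loss(identity, y, mu, scale=0.5)
```

In this toy setup the measurement-consistency term alone cannot tell the trivial identity solution apart from a true declipper, since both reproduce the observed samples. The equivariance term does so when the transformation pushes previously unclipped samples past the threshold, which is the intuition behind training from clipped measurements only.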