AS, LG, SD | Nov 21, 2021

Active Restoration of Lost Audio Signals Using Machine Learning and Latent Information

arXiv:2111.10891v4 | 1 citation
Originality: Incremental advance
AI Analysis

This work addresses the restoration of digital audio signals, but it appears incremental: it builds on existing techniques such as steganography and machine learning without introducing a fundamentally new paradigm.

The paper tackles the reconstruction of lost or corrupted audio segments. It proposes a framework, termed HCR, that combines steganography, halftoning (dithering), and machine learning regressors, and it reports better results than existing methods in terms of SNR, ODG, and Hansen's audio quality metric.

Digital audio signal reconstruction of a lost or corrupted segment using deep learning algorithms has been explored intensively in recent years. Nevertheless, traditional methods based on linear interpolation, phase coding, and tone insertion are still in use. We found no prior research on reconstructing audio signals by fusing dithering, steganography, and machine learning regressors. Therefore, this paper proposes the combination of steganography, halftoning (dithering), and state-of-the-art shallow and deep learning methods. The results (including comparisons with SPAIN, autoregressive, deep learning-based, graph-based, and other methods) are evaluated with three different metrics. The results show that the proposed solution is effective and can enhance the reconstruction of audio signals by using the side information (e.g., a latent representation) that steganography provides. Moreover, this paper proposes a novel framework for reconstruction from heavily compressed embedded audio data using halftoning (i.e., dithering) and machine learning, which we term HCR (halftone-based compression and reconstruction). This work may trigger interest in optimising this approach and/or transferring it to different domains (e.g., image reconstruction). Compared to existing methods, we show improved inpainting performance in terms of signal-to-noise ratio (SNR), objective difference grade (ODG), and Hansen's audio quality metric. In particular, our proposed framework outperformed the learning-based methods (D2WGAN and SG) and the traditional statistical algorithms (e.g., SPAIN, TDC, WCP).
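To make the SNR comparison concrete, here is a minimal sketch of how a restored segment can be scored against the original signal. It is not the paper's HCR pipeline: the gap is filled with plain linear interpolation (one of the traditional baselines the abstract mentions), and the function names `snr_db` and `interpolate_gap`, the test tone, and the gap position are all assumptions made for illustration.

```python
import math


def snr_db(reference, restored):
    """Signal-to-noise ratio in dB of `restored` measured against `reference`."""
    if len(reference) != len(restored):
        raise ValueError("signals must have the same length")
    signal_power = sum(r * r for r in reference)
    noise_power = sum((r - s) ** 2 for r, s in zip(reference, restored))
    if signal_power == 0:
        raise ValueError("reference signal is silent; SNR is undefined")
    if noise_power == 0:
        return math.inf  # perfect reconstruction
    return 10.0 * math.log10(signal_power / noise_power)


def interpolate_gap(signal, start, end):
    """Return a copy of `signal` with samples start..end-1 replaced by a
    straight line between the neighbours signal[start-1] and signal[end].
    Hypothetical baseline for illustration only."""
    if not (1 <= start < end < len(signal)):
        raise ValueError("gap must lie strictly inside the signal")
    filled = list(signal)
    left, right = signal[start - 1], signal[end]
    steps = end - start + 1
    for i in range(start, end):
        t = (i - start + 1) / steps
        filled[i] = left + t * (right - left)
    return filled


# Toy example: a 440 Hz tone sampled at 8 kHz with a 40-sample gap (assumed values).
rate = 8000
tone = [math.sin(2 * math.pi * 440 * n / rate) for n in range(800)]
restored = interpolate_gap(tone, 300, 340)
print(f"SNR of linear-interpolation baseline: {snr_db(tone, restored):.2f} dB")
```

A higher SNR means the restored samples are closer to the original. ODG and Hansen's metric are perceptual measures and are not approximated by this sketch.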
