LG, SD, AS | Oct 26, 2021

TUNet: A Block-online Bandwidth Extension Model based on Transformers and Self-supervised Pretraining

arXiv:2110.13492v5 | 31 citations
Originality: Incremental advance
AI Analysis

This work addresses audio quality enhancement for applications such as telecommunications. It is an incremental advance, since it builds on the existing TFiLM and UNet architectures.

The authors tackle bandwidth extension of audio signals with TUNet, a block-online model that combines transformers with self-supervised pretraining. On the VCTK dataset, it outperforms recent baselines on both intrusive and non-intrusive metrics.

We introduce a block-online variant of the temporal feature-wise linear modulation (TFiLM) model to achieve bandwidth extension. The proposed architecture simplifies the UNet backbone of TFiLM to reduce inference time and employs an efficient transformer at the bottleneck to alleviate performance degradation. We also utilize self-supervised pretraining and data augmentation to enhance the quality of bandwidth-extended signals and reduce the sensitivity with respect to downsampling methods. Experimental results on the VCTK dataset show that the proposed method outperforms several recent baselines in both intrusive and non-intrusive metrics. Pretraining and filter augmentation also help stabilize and enhance the overall performance.
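
As a rough illustration of what "block-online" processing means in general, the sketch below feeds a signal to a model one fixed-size block at a time and concatenates the outputs. It is not the TUNet implementation: the function names, the block size, the zero-padding of the final block, and the stand-in `identity_model` are all assumptions made for this example, and the abstract does not specify any of them. The sketch also assumes the model returns as many samples as it receives, as would be the case if the narrowband input were already resampled to the target rate.

```python
from typing import Callable, Iterator, List, Sequence

Block = List[float]


def iter_blocks(signal: Sequence[float], block_size: int) -> Iterator[Block]:
    """Yield consecutive fixed-size blocks, zero-padding the final one."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    for start in range(0, len(signal), block_size):
        block = list(signal[start:start + block_size])
        # Pad the last block so every block has the same length.
        block.extend([0.0] * (block_size - len(block)))
        yield block


def block_online_process(
    signal: Sequence[float],
    block_size: int,
    model: Callable[[Block], Block],
) -> List[float]:
    """Run `model` on one block at a time and concatenate the outputs.

    Each block is processed without access to future blocks, so output
    can be produced as soon as a block's worth of samples has arrived.
    """
    output: List[float] = []
    for block in iter_blocks(signal, block_size):
        output.extend(model(block))
    # Drop the samples that correspond to the zero padding.
    return output[:len(signal)]


def identity_model(block: Block) -> Block:
    """Hypothetical stand-in for a trained bandwidth extension model."""
    return list(block)


if __name__ == "__main__":
    samples = [0.1, 0.2, 0.3, 0.4, 0.5]
    print(block_online_process(samples, block_size=2, model=identity_model))
```

In a real system the stand-in would be replaced by the trained network, and the block size would trade latency against the amount of context available per block.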

Code Implementations: 1 repo
