AS, LG, SD | Jun 14, 2023

Feature Normalization for Fine-tuning Self-Supervised Models in Speech Enhancement

arXiv:2306.08406v1 | 1 citation | h-index: 24
Originality: Incremental advance
AI Analysis

This work addresses a gap in using pre-trained models for speech signal generation, offering a method to enhance speech quality, though it appears incremental as it adapts existing models rather than introducing a new paradigm.

The paper tackles the problem of applying pre-trained self-supervised speech models to speech enhancement by proposing a feature normalization technique to align input features, resulting in significant improvements in speech quality compared to baselines.

Large, pre-trained representation models trained using self-supervised learning have gained popularity in various fields of machine learning because they are able to extract high-quality salient features from input data. As such, they have been frequently used as base networks for various pattern classification tasks such as speech recognition. However, not much research has been conducted on applying these types of models to the field of speech signal generation. In this paper, we investigate the feasibility of using pre-trained speech representation models for a downstream speech enhancement task. To alleviate mismatches between the input features of the pre-trained model and the target enhancement model, we adopt a novel feature normalization technique to smoothly link these modules together. Our proposed method enables significant improvements in speech quality compared to baselines when combined with various types of pre-trained speech models.
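The abstract does not describe how the feature normalization works. The sketch below is a rough, hypothetical illustration of what "aligning input features" between two modules can mean in general. It rescales a frames-by-channels feature matrix so that each channel has zero mean and unit variance over one utterance. The helper `normalize_features`, the `eps` value, and the choice to normalize along the time axis are all assumptions. None of them comes from the paper, and this is not the authors' technique.

```python
import math
from typing import List


def normalize_features(feats: List[List[float]], eps: float = 1e-5) -> List[List[float]]:
    """Hypothetical helper (not the paper's method): per-utterance,
    per-channel mean/variance normalization of a (frames x channels) matrix.

    Each channel is shifted to zero mean and scaled to (approximately) unit
    variance across the time axis, so features from an upstream model arrive
    at a downstream network on a consistent scale.
    """
    if not feats:
        return []
    num_frames = len(feats)
    num_channels = len(feats[0])
    out = [[0.0] * num_channels for _ in range(num_frames)]
    for c in range(num_channels):
        # Gather one channel across all frames of the utterance.
        column = [frame[c] for frame in feats]
        mean = sum(column) / num_frames
        # Population variance; eps guards against division by zero
        # when a channel is constant over the utterance.
        var = sum((x - mean) ** 2 for x in column) / num_frames
        std = math.sqrt(var + eps)
        for t in range(num_frames):
            out[t][c] = (feats[t][c] - mean) / std
    return out


if __name__ == "__main__":
    # Two channels on very different scales end up on the same scale.
    feats = [[1.0, 10.0], [3.0, 30.0]]
    print(normalize_features(feats))
```

In the usage example, both channels map to values close to -1 and +1 despite their tenfold scale difference. Statistics are computed per utterance rather than over a training corpus only to keep the sketch self-contained. A learned or dataset-level normalization would be an equally plausible design.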

