SD | CL | AS | Oct 25, 2022

Improving Speech Representation Learning via Speech-level and Phoneme-level Masking Approach

arXiv:2210.13805v1 | 1 citation | h-index: 22
Originality: Incremental advance
AI Analysis

This work addresses a specific issue in speech processing, but it is incremental because it builds on existing masking techniques.

The paper addresses the limitations of random masking in speech representation learning by proposing speech-level and phoneme-level masking approaches, which improve performance on phoneme classification and speaker recognition.

Recovering masked speech frames is widely used in speech representation learning. However, most of these models use random masking during pre-training. In this work, we propose two masking approaches: (1) speech-level masking, which makes the model mask more speech segments than silence segments, and (2) phoneme-level masking, which forces the model to mask all frames of a phoneme rather than fragments of it. We pre-trained the model with these two approaches and evaluated it on two downstream tasks, phoneme classification and speaker recognition. The experiments demonstrate that the proposed masking approaches improve the performance of the learned speech representations.
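
The two masking ideas in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function name `segment_level_mask`, the `(start, end, label)` segment format (as might come from a forced alignment), the `"sil"` silence label, and the `mask_ratio` and `speech_weight` values are all assumptions made for illustration. The sketch masks whole segments only (the phoneme-level idea) and draws speech segments more often than silence segments (the speech-level idea).

```python
import random


def segment_level_mask(segments, num_frames, mask_ratio=0.15,
                       speech_weight=4.0, silence_label="sil", seed=0):
    """Return a per-frame boolean mask that covers whole segments only.

    segments: list of (start, end, label) with end exclusive (hypothetical
    format, e.g. derived from a forced alignment).
    """
    rng = random.Random(seed)
    mask = [False] * num_frames
    target = int(mask_ratio * num_frames)  # frames we aim to mask
    remaining = list(segments)
    masked = 0
    while remaining and masked < target:
        # Speech-level idea: speech segments get a larger sampling weight
        # than silence segments (the weight value is an assumption).
        weights = [1.0 if label == silence_label else speech_weight
                   for _, _, label in remaining]
        idx = rng.choices(range(len(remaining)), weights=weights, k=1)[0]
        start, end, _ = remaining.pop(idx)
        # Phoneme-level idea: mask every frame of the chosen segment,
        # never a fragment of it.
        for t in range(start, end):
            mask[t] = True
        masked += end - start
    return mask


# Toy alignment for a 30-frame utterance (labels are illustrative).
toy_segments = [(0, 5, "sil"), (5, 9, "hh"), (9, 14, "ah"),
                (14, 20, "l"), (20, 26, "ow"), (26, 30, "sil")]
toy_mask = segment_level_mask(toy_segments, num_frames=30, mask_ratio=0.3)
```

Because segments are masked whole, the number of masked frames can overshoot the target ratio slightly; a real pre-training pipeline would need to decide how to handle that, which the abstract does not specify.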
