CL · SD · AS · Jan 2, 2023

Analysing Discrete Self Supervised Speech Representation for Spoken Language Modeling

arXiv:2301.00591v3 · 60 citations · h-index: 5 · Has Code
Originality: Incremental advance
AI Analysis

This work addresses the problem of improving discrete speech representations for generative spoken language modeling, offering incremental advancements in robustness and efficiency.

The paper analyzes discrete self-supervised speech representations for spoken language modeling. It finds a high correlation between the units and phonemes and proposes methods to reduce unit redundancies, which yield significant improvements on zero-resource speech metrics such as ABX.
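The unit-phoneme correlation can be quantified in several ways. As a minimal sketch, assuming frame-level unit IDs that are time-aligned with phoneme labels (for example from a forced aligner), the snippet below computes phone purity: for each unit, the frames carrying that unit's most frequent phoneme, summed over units and divided by the total frame count. The function name `phone_purity` and the toy data are hypothetical, and the paper's own correlation measure may differ.

```python
from collections import Counter, defaultdict


def phone_purity(units, phones):
    """Frame-weighted purity of phoneme labels within each discrete unit.

    units  -- sequence of unit IDs, one per frame (hypothetical input format)
    phones -- phoneme labels aligned frame-by-frame with `units`
    """
    if len(units) != len(phones):
        raise ValueError("units and phones must be frame-aligned")
    if not units:
        raise ValueError("need at least one frame")
    # Count how often each phoneme co-occurs with each unit.
    counts = defaultdict(Counter)
    for u, p in zip(units, phones):
        counts[u][p] += 1
    # For every unit, keep only the frames of its dominant phoneme.
    dominant = sum(c.most_common(1)[0][1] for c in counts.values())
    return dominant / len(units)


if __name__ == "__main__":
    units = [3, 3, 3, 7, 7, 1, 1, 1]
    phones = ["a", "a", "b", "s", "s", "t", "t", "t"]
    print(phone_purity(units, phones))  # 0.875
```

A score near 1.0 means each unit maps almost entirely to one phoneme. Swapping in speaker or gender labels gives a rough comparison point, but purity rises as the number of label classes falls (two gender labels already guarantee at least 0.5), so scores should be read against a chance baseline.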

This work thoroughly analyzes discrete self-supervised speech representations (units) through the lens of Generative Spoken Language Modeling (GSLM). Following the findings of this analysis, we propose practical improvements to the discrete units used in GSLM. First, we seek to understand these units by analyzing them along three axes: interpretation, visualization, and resynthesis. Our analysis finds a high correlation between the speech units and phonemes and phoneme families, while their correlation with speaker or gender is weaker. Additionally, we find redundancies in the extracted units and suggest that one reason may be the units' context. Following this analysis, we propose a new unsupervised metric to measure unit redundancies. Finally, we use this metric to develop new methods that improve the robustness of unit clustering, and we show significant improvements on zero-resource speech metrics such as ABX. Code and analysis tools are available at: https://github.com/slp-rl/SLM-Discrete-Representations
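ABX, the zero-resource metric named above, asks whether a token X is closer to a token A of the same category than to a token B of a different category. The sketch below is a simplified version under stated assumptions: each token is already summarized as a fixed-length vector (for example, averaged unit embeddings), distances are cosine distances, and errors are pooled over all triplets. Full ABX evaluations, such as those in the ZeroSpeech benchmarks, instead compare frame sequences with dynamic time warping and condition on speaker and context. The names `abx_error` and `cosine_distance` and the toy vectors are hypothetical.

```python
import math
from itertools import permutations


def cosine_distance(u, v):
    """Return 1 - cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norm == 0:
        raise ValueError("zero vector has no direction")
    return 1.0 - dot / norm


def abx_error(tokens):
    """Simplified ABX error rate pooled over all triplets.

    tokens -- dict mapping a category label (e.g. a phoneme) to a list of
              fixed-length vectors, one per token (hypothetical format).
    An error is counted when X is closer to B (other category) than to
    A (same category as X, A != X); ties count as half an error.
    """
    errors, total = 0.0, 0
    for cat_x, cat_b in permutations(tokens, 2):
        same = tokens[cat_x]
        for i, a in enumerate(same):
            for j, x in enumerate(same):
                if i == j:
                    continue  # A and X must be distinct tokens
                d_ax = cosine_distance(a, x)
                for b in tokens[cat_b]:
                    d_bx = cosine_distance(b, x)
                    if d_ax > d_bx:
                        errors += 1.0
                    elif d_ax == d_bx:
                        errors += 0.5
                    total += 1
    if total == 0:
        raise ValueError("need two categories, one with at least two tokens")
    return errors / total


if __name__ == "__main__":
    toy = {
        "a": [[1.0, 0.0], [0.9, 0.1]],
        "i": [[0.0, 1.0], [0.1, 0.9]],
    }
    print(abx_error(toy))  # 0.0
```

Lower values mean the representation separates the categories more reliably. The nested loops are cubic in the number of tokens per category, so this version is only suitable for toy inputs.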

Code Implementations: 1 repo
