AS, CL | Sep 9, 2024

Estimating the Completeness of Discrete Speech Units

arXiv:2409.06109v2 | 10 citations | h-index: 9

AI Analysis

The paper provides a comprehensive assessment of the choice of discrete units for researchers and practitioners in speech processing, though its contribution is incremental: it tests existing claims rather than introducing new methods.

The paper tackles the problem of assessing information completeness and accessibility in discrete speech units. It shows that speaker information is sufficiently present in HuBERT discrete units and that phonetic information is sufficiently present in the residual, indicating that vector quantization does not achieve disentanglement.

Representing speech with discrete units has been widely used in speech codec and speech generation. However, there are several unverified claims about self-supervised discrete units, such as disentangling phonetic and speaker information with k-means, or assuming information loss after k-means. In this work, we take an information-theoretic perspective to answer how much information is present (information completeness) and how much information is accessible (information accessibility), before and after residual vector quantization. We show a lower bound for information completeness and estimate completeness on discretized HuBERT representations after residual vector quantization. We find that speaker information is sufficiently present in HuBERT discrete units, and that phonetic information is sufficiently present in the residual, showing that vector quantization does not achieve disentanglement. Our results offer a comprehensive assessment on the choice of discrete units, and suggest that a lot more information in the residual should be mined rather than discarded.
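The residual vector quantization mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: random vectors stand in for HuBERT representations, the helper names `kmeans` and `residual_vq` are hypothetical, and the codebook size and number of stages are arbitrary choices. The sketch only shows the mechanics: each stage runs k-means on whatever the previous stages left unexplained, and the final residual is the part that discrete units alone do not capture.

```python
import numpy as np

def kmeans(x, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns (centroids, assignments)."""
    rng = np.random.default_rng(seed)
    # Initialize centroids from k distinct data points.
    centroids = x[rng.choice(len(x), size=k, replace=False)].copy()
    for _ in range(iters):
        # Squared Euclidean distances, shape (n_points, k).
        dist = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        assign = dist.argmin(axis=1)
        for j in range(k):
            members = x[assign == j]
            if len(members) > 0:  # keep the old centroid if a cluster is empty
                centroids[j] = members.mean(axis=0)
    # Final assignment against the updated centroids.
    dist = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
    return centroids, dist.argmin(axis=1)

def residual_vq(x, k, num_stages):
    """Quantize x in stages; each stage quantizes the previous residual."""
    residual = x.copy()
    codes, codebooks = [], []
    for stage in range(num_stages):
        centroids, assign = kmeans(residual, k, seed=stage)
        codes.append(assign)
        codebooks.append(centroids)
        # Subtract the chosen centroid; what remains goes to the next stage.
        residual = residual - centroids[assign]
    return np.stack(codes, axis=1), codebooks, residual

# Assumption: random features stand in for frame-level HuBERT representations.
rng = np.random.default_rng(0)
features = rng.normal(size=(500, 8))
codes, codebooks, residual = residual_vq(features, k=16, num_stages=3)

# Summing the selected centroids across stages gives the quantized approximation.
reconstruction = sum(cb[codes[:, i]] for i, cb in enumerate(codebooks))
```

By construction, `reconstruction + residual` recovers `features` up to floating-point error, so any information absent from the codes must reside in the residual. That is the quantity the abstract argues should be mined rather than discarded.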
