CL, LG, SD, AS | Oct 26, 2023

Towards Matching Phones and Speech Representations

arXiv:2310.17558v1 | 1 citation | h-index: 6
Originality: Incremental advance
AI Analysis

This work addresses a long-standing challenge in speech processing for improving phone classification, but it appears incremental as it builds on existing self-supervised methods.

The paper tackles the problem of learning phone types from instances by matching cluster centroids to phone embeddings in self-supervised learning, resulting in improved downstream phone classification when combined with losses like APC and CPC.

Learning phone types from phone instances is a long-standing problem that remains open. In this work, we revisit the problem in the context of self-supervised learning and pose it as one of matching cluster centroids to phone embeddings. We study two key properties that enable matching: whether cluster centroids of self-supervised representations reduce the variability of phone instances, and whether they respect the relationships among phones. We then use the matching result to produce pseudo-labels and introduce a new loss function for improving self-supervised representations. Our experiments show that the matching result captures the relationships among phones. Training with the new loss function jointly with standard self-supervised losses, such as APC and CPC, significantly improves downstream phone classification.
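To make the idea of matching cluster centroids to phone embeddings concrete, here is a minimal sketch on toy data. It is an illustration, not the paper's procedure: the abstract does not say how the matching is computed, so the one-to-one assignment that minimizes total squared Euclidean distance is an assumption. The function names (`match_centroids_to_phones`, `pseudo_labels`), the three-phone inventory, and all vectors are hypothetical too.

```python
from itertools import permutations


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def match_centroids_to_phones(centroids, phone_embeddings):
    """Return (assignment, cost), where assignment[i] is the phone index
    matched to centroid i.

    Assumption: a one-to-one matching that minimizes total squared distance.
    Brute force over permutations, so only practical for tiny inventories.
    """
    n = len(centroids)
    assert n == len(phone_embeddings), "sketch assumes equal counts"
    best, best_cost = None, float("inf")
    for perm in permutations(range(n)):
        cost = sum(sq_dist(centroids[i], phone_embeddings[perm[i]])
                   for i in range(n))
        if cost < best_cost:
            best, best_cost = perm, cost
    return list(best), best_cost


def pseudo_labels(frames, centroids, assignment, phone_names):
    """Label each frame with the phone matched to its nearest centroid."""
    labels = []
    for frame in frames:
        nearest = min(range(len(centroids)),
                      key=lambda i: sq_dist(frame, centroids[i]))
        labels.append(phone_names[assignment[nearest]])
    return labels


# Hypothetical 2-D toy data; real representations are high-dimensional.
phone_names = ["aa", "iy", "s"]
phone_embeddings = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
centroids = [[0.9, 0.1], [0.1, 0.9], [0.1, 0.0]]
frames = [[1.0, 0.2], [0.0, 0.1], [0.2, 1.1]]

assignment, cost = match_centroids_to_phones(centroids, phone_embeddings)
labels = pseudo_labels(frames, centroids, assignment, phone_names)
print(assignment)  # [1, 2, 0]
print(labels)      # ['iy', 'aa', 's']
```

The brute-force search keeps the sketch dependency-free. At a realistic inventory size, a polynomial-time assignment solver such as `scipy.optimize.linear_sum_assignment` would be the usual substitute. The resulting pseudo-labels are the kind of target an additional loss could be trained on alongside APC or CPC.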

