AS · LG · SD · Nov 4, 2020

A Hierarchical Subspace Model for Language-Attuned Acoustic Unit Discovery

arXiv:2011.03115v2 · 12 citations
AI Analysis

This work addresses the challenge of unsupervised speech processing for low-resource languages, offering an incremental improvement over prior methods.

The paper tackles the problem of acoustic unit discovery by proposing a hierarchical subspace model that learns language-specific phonetic subspaces and unit embeddings in an unsupervised manner, achieving improved clustering quality and segmentation accuracy compared to existing techniques on TIMIT, Mboshi, and Yoruba datasets.

In this work, we propose a hierarchical subspace model for acoustic unit discovery. In this approach, we frame the task as one of learning embeddings on a low-dimensional phonetic subspace, and simultaneously specify the subspace itself as an embedding on a hyper-subspace. We train the hyper-subspace on a set of transcribed languages and transfer it to the target language. In the target language, we infer both the language and unit embeddings in an unsupervised manner, and in so doing, we simultaneously learn a subspace of units specific to that language and the units that dwell on it. We conduct our experiments on TIMIT and two low-resource languages: Mboshi and Yoruba. Results show that our model outperforms major acoustic unit discovery techniques, both in terms of clustering quality and segmentation accuracy.
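The two-level structure described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that both levels are affine: a language embedding selects a subspace (a matrix W and offset b) as a weighted combination of shared bases, and a unit embedding is then mapped through that subspace to a unit parameter vector. The abstract does not state this exact form, and the names (`hyper`, `W_bases`, `unit_parameters`, and so on) are hypothetical. The sketch shows only the deterministic mapping, not the unsupervised inference of the embeddings.

```python
import random


def affine_combination(base, bases, weights):
    """Return base + sum_k weights[k] * bases[k], elementwise over flat lists."""
    out = list(base)
    for w, basis in zip(weights, bases):
        for i, value in enumerate(basis):
            out[i] += w * value
    return out


def matvec(flat_matrix, n_rows, n_cols, vec):
    """Multiply a row-major flat matrix (n_rows x n_cols) by a vector."""
    return [
        sum(flat_matrix[r * n_cols + c] * vec[c] for c in range(n_cols))
        for r in range(n_rows)
    ]


def unit_parameters(hyper, lang_embedding, unit_embedding):
    """Map a unit embedding to unit parameters via a language-specific subspace.

    Assumed (hypothetical) form:
        W_lang = W0 + sum_k alpha_k * W_bases[k]
        b_lang = b0 + sum_k alpha_k * b_bases[k]
        params = W_lang @ h + b_lang
    where alpha is the language embedding and h is the unit embedding.
    """
    W = affine_combination(hyper["W0"], hyper["W_bases"], lang_embedding)
    b = affine_combination(hyper["b0"], hyper["b_bases"], lang_embedding)
    Wh = matvec(W, hyper["param_dim"], hyper["unit_dim"], unit_embedding)
    return [x + y for x, y in zip(Wh, b)]


def random_hyper_subspace(param_dim, unit_dim, lang_dim, seed=0):
    """Build random shared bases; in the paper these would be trained on transcribed languages."""
    rng = random.Random(seed)

    def rand(n):
        return [rng.gauss(0.0, 1.0) for _ in range(n)]

    return {
        "param_dim": param_dim,
        "unit_dim": unit_dim,
        "W0": rand(param_dim * unit_dim),
        "W_bases": [rand(param_dim * unit_dim) for _ in range(lang_dim)],
        "b0": rand(param_dim),
        "b_bases": [rand(param_dim) for _ in range(lang_dim)],
    }


if __name__ == "__main__":
    hyper = random_hyper_subspace(param_dim=6, unit_dim=3, lang_dim=2)
    target_language = [0.5, -1.0]   # language embedding (inferred in the real model)
    unit = [0.1, 0.2, -0.3]         # unit embedding (inferred in the real model)
    print(unit_parameters(hyper, target_language, unit))
```

In this sketch the shared bases stay fixed when moving to a new language, and only the language embedding and the unit embeddings would need to be estimated, which mirrors the transfer setup the abstract describes.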

