CL SD AS | Dec 6, 2023

Evaluating Self-supervised Speech Models on a Taiwanese Hokkien Corpus

Yi-Hui Chou, Kalvin Chang, Meng-Ju Wu, Winston Ou, Alice Wen-Hsin Bi, Carol Yang, Bryan Y. Chen, Rong-Wei Pai, Po-Yen Yeh, Jo-Peng Chiang, Iu-Tshian Phoann, Winnie Chang

arXiv:2312.06668v12.17 citations | h-index: 5 | Has Code | ASRU

Originality: Synthesis-oriented

AI Analysis

This work addresses the challenge of keeping state-of-the-art speech processing inclusive of declining low-resource languages such as Taiwanese Hokkien. Its contribution is incremental, since it focuses on evaluation within an existing benchmark.

The authors evaluate self-supervised speech models on low-resource Taiwanese Hokkien by contributing a 1.5-hour dataset to ML-SUPERB's hidden set. They find that model size does not consistently determine performance, with certain smaller models outperforming larger ones, and that linguistic alignment between pretraining data and the target language plays a crucial role.

Taiwanese Hokkien is declining in use and status due to a language shift towards Mandarin in Taiwan. This is partly why it is a low-resource language in NLP and speech research today. To ensure that the state of the art in speech processing does not leave Taiwanese Hokkien behind, we contribute a 1.5-hour dataset of Taiwanese Hokkien to ML-SUPERB's hidden set. Evaluating ML-SUPERB's suite of self-supervised learning (SSL) speech representations on our dataset, we find that model size does not consistently determine performance. In fact, certain smaller models outperform larger ones. Furthermore, linguistic alignment between pretraining data and the target language plays a crucial role.
