CL AS Oct 13, 2022

On the Utility of Self-supervised Models for Prosody-related Tasks

Guan-Ting Lin, Chi-Luen Feng, Wei-Ping Huang, Yuan Tseng, Tzu-Han Lin, Chen-An Li, Hung-yi Lee, Nigel G. Ward

arXiv:2210.07185

Originality: Synthesis-oriented

AI Analysis

This work examines how well SSL models suit prosody-related tasks. The contribution is incremental: it builds on existing SSL methods by applying them to a new domain.

The paper evaluates self-supervised learning (SSL) speech models on prosody-related tasks. It finds that 13 of the 15 SSL models outperformed the baseline on all the prosody-related tasks, and it reports good performance on two pseudo tasks: prosody reconstruction and future prosody prediction.

Self-Supervised Learning (SSL) from speech data has produced models that have achieved remarkable performance in many tasks, and that are known to implicitly represent many aspects of information latently present in speech signals. However, relatively little is known about the suitability of such models for prosody-related tasks or the extent to which they encode prosodic information. We present a new evaluation framework, SUPERB-prosody, consisting of three prosody-related downstream tasks and two pseudo tasks. We find that 13 of the 15 SSL models outperformed the baseline on all the prosody-related tasks. We also show good performance on two pseudo tasks: prosody reconstruction and future prosody prediction. We further analyze the layerwise contributions of the SSL models. Overall we conclude that SSL speech models are highly effective for prosody-related tasks.
