SD, CL, LG, AS | Dec 14, 2023

Acoustic models of Brazilian Portuguese Speech based on Neural Transformers

arXiv:2312.09265v1 | 2 citations | h-index: 6
Originality: Synthesis-oriented
AI Analysis

This work addresses the need for effective speech-based tools in Brazilian Portuguese, particularly for healthcare applications such as respiratory insufficiency detection. Its contribution is incremental: it applies existing Transformer methods to a new language domain.

The authors build an acoustic model for Brazilian Portuguese speech using a Transformer neural network pretrained on more than 800 hours of unlabeled data. They fine-tune it on three tasks: respiratory insufficiency detection, gender recognition, and age group classification. Compared with Transformers trained without pretraining, the pretrained model improves significantly on these tasks, achieving the best reported results so far for respiratory insufficiency detection and gender recognition performance comparable to state-of-the-art English models.

An acoustic model, trained on a significant amount of unlabeled data, consists of a speech representation learned through self-supervision that is useful for solving downstream tasks, possibly after fine-tuning the model on the respective task. In this work, we build an acoustic model of Brazilian Portuguese speech using a Transformer neural network. The model was pretrained on more than 800 hours of Brazilian Portuguese speech, using a combination of pretraining techniques. Using a labeled dataset collected for the detection of respiratory insufficiency in Brazilian Portuguese speakers, we fine-tune the pretrained Transformer on the following tasks: respiratory insufficiency detection, gender recognition, and age group classification. We compare the performance of pretrained Transformers on these tasks with that of Transformers without pretraining and observe a significant improvement. In particular, respiratory insufficiency detection achieves the best reported results so far, indicating that this kind of acoustic model is a promising tool for the speech-as-biomarker approach. Moreover, gender recognition performance is comparable to that of state-of-the-art models in English.

