Laura Docío-Fernández

h-index 15

3 papers

26 citations

Novelty 48%

AI Score 30

Ranked #139,176 of 194,257 authors (top 72%); #24,924 in CL (top 81%)

3 Papers

2.2 SD Jun 2, 2022

Detecting the Severity of Major Depressive Disorder from Speech: A Novel HARD-Training Methodology

Edward L. Campbell, Judith Dineley, Pauline Conde et al.

Major Depressive Disorder (MDD) is a common worldwide mental health issue with high associated socioeconomic costs. The prediction and automatic detection of MDD can, therefore, make a huge impact on society. Speech, as a non-invasive, easy-to-collect signal, is a promising marker to aid the diagnosis and assessment of MDD. In this regard, speech samples were collected as part of the Remote Assessment of Disease and Relapse in Major Depressive Disorder (RADAR-MDD) research programme. RADAR-MDD was an observational cohort study in which speech and other digital biomarkers were collected from a cohort of individuals with a history of MDD in Spain, the United Kingdom and the Netherlands. In this paper, the RADAR-MDD speech corpus was taken as an experimental framework to test the efficacy of a Sequence-to-Sequence model with a local attention mechanism in a two-class depression severity classification paradigm. Additionally, a novel training method, HARD-Training, is proposed. It is a methodology based on the selection of more ambiguous samples for the model training, and inspired by the curriculum learning paradigm. HARD-Training was found to consistently improve the performance of our classifiers, with an average increment of 8.6%, for both of the two speech elicitation tasks used and for each collection site of the RADAR-MDD speech corpus. With this novel methodology, our Sequence-to-Sequence model was able to effectively detect MDD severity regardless of language. Finally, recognising the need for greater awareness of potential algorithmic bias, we conduct an additional analysis of our results separately for each gender.
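The abstract describes HARD-Training only as selecting "more ambiguous samples" for training; it does not state the selection criterion. As a rough illustration of that general idea, and not the authors' method, the sketch below assumes ambiguity can be measured as how close a binary classifier's predicted probability lies to 0.5. The function name `select_ambiguous` and the sample scores are hypothetical.

```python
def select_ambiguous(probabilities, k):
    """Return the indices of the k samples whose scores lie closest to 0.5.

    ASSUMPTION: "ambiguous" is taken to mean "near the decision boundary
    of a binary classifier"; the abstract does not define it.
    """
    ranked = sorted(
        range(len(probabilities)),
        key=lambda i: abs(probabilities[i] - 0.5),
    )
    # Return the chosen indices in their original order.
    return sorted(ranked[:k])


# Hypothetical predicted probabilities for five training samples.
probs = [0.95, 0.52, 0.10, 0.45, 0.70]
hard_subset = select_ambiguous(probs, 2)
```

In a curriculum-style loop, a subset chosen this way could be re-selected after each training round, since the model's scores change as it learns.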

4.2 CL Sep 25, 2024 Code

Weighted Cross-entropy for Low-Resource Languages in Multilingual Speech Recognition

Andrés Piñeiro-Martín, Carmen García-Mateo, Laura Docío-Fernández et al.

This paper addresses the challenge of integrating low-resource languages into multilingual automatic speech recognition (ASR) systems. We introduce a novel application of weighted cross-entropy, typically used for unbalanced datasets, to facilitate the integration of low-resource languages into pre-trained multilingual ASR models within the context of continual multilingual learning. We fine-tune the Whisper multilingual ASR model on five high-resource languages and one low-resource language, employing language-weighted dynamic cross-entropy and data augmentation. The results show a remarkable 6.69% word error rate (WER) reduction for the low-resource language compared to the fine-tuned model without applying our approach, and a 48.86% WER reduction compared to the original Whisper model. In addition, our approach yields an average WER reduction of 3.29% across the six languages, showing no degradation for the high-resource languages.
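To make the idea of language-weighted cross-entropy concrete, here is a minimal sketch in plain Python. It assumes a fixed weight per language and a weighted mean over the batch. The "dynamic" weighting schedule mentioned in the abstract is not specified there and is not reproduced here. The function name, language codes, and weight values are all illustrative assumptions.

```python
import math


def weighted_cross_entropy(logits, targets, langs, lang_weights):
    """Weighted mean cross-entropy; each sample is scaled by its language weight.

    ASSUMPTION: static per-language weights, not the paper's dynamic scheme.
    """
    total, weight_sum = 0.0, 0.0
    for row, target, lang in zip(logits, targets, langs):
        # Numerically stable log-sum-exp for the softmax normaliser.
        m = max(row)
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        nll = log_z - row[target]  # negative log-likelihood of the target class
        w = lang_weights[lang]
        total += w * nll
        weight_sum += w
    return total / weight_sum


# Hypothetical batch: one well-predicted high-resource sample and one
# poorly predicted low-resource sample.
logits = [[2.0, 0.0], [0.0, 2.0]]
targets = [0, 0]
langs = ["high", "low"]

uniform = weighted_cross_entropy(logits, targets, langs, {"high": 1.0, "low": 1.0})
boosted = weighted_cross_entropy(logits, targets, langs, {"high": 1.0, "low": 3.0})
```

With the larger weight on the low-resource language, the poorly predicted low-resource sample contributes more to the loss, so `boosted` is larger than `uniform`.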

2.3 AS Aug 11, 2020

Alzheimer's Dementia Detection from Audio and Text Modalities

Edward L. Campbell, Laura Docío-Fernández, Javier Jiménez Raboso et al.

Automatic detection of Alzheimer's dementia by speech processing is enhanced when features of both the acoustic waveform and the content are extracted. Audio and text transcription have been widely used in health-related tasks, as spectral and prosodic speech features, as well as semantic and linguistic content, convey information about various diseases. Hence, this paper describes the joint submission of the GTM-UVIGO research group and the acceXible startup to the ADDReSS challenge at INTERSPEECH 2020. The submitted systems aim to detect patterns of Alzheimer's disease from both the patient's voice and message transcription. Six different systems have been built and compared: four of them are speech-based and the other two systems are text-based. The x-vector, i-vector, and statistical speech-based functionals features are evaluated. As a lower speaking fluency is a common pattern in patients with Alzheimer's disease, rhythmic features are also proposed. For transcription analysis, two systems are proposed: one uses GloVe word embedding features and the other uses several features extracted by language modelling. Several intra-modality and inter-modality score fusion strategies are investigated. The performance of single-modality and multimodal systems is presented. The achieved results are promising, outperforming the results achieved by the ADDReSS baseline systems.
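The abstract does not say which score fusion strategies were used. One common form of late fusion is a weighted average of per-system scores, sketched below as an illustration only. The function name `fuse_scores`, the example scores, and the 0.5 decision threshold are assumptions, not details from the paper.

```python
def fuse_scores(system_scores, weights=None):
    """Fuse aligned per-system scores with a weighted average.

    system_scores: one list of scores per system, aligned by sample.
    weights: optional per-system weights; equal weighting if omitted.
    """
    if weights is None:
        weights = [1.0] * len(system_scores)
    total = sum(weights)
    return [
        sum(w * s for w, s in zip(weights, sample)) / total
        for sample in zip(*system_scores)
    ]


# Hypothetical detection scores from one audio system and one text system.
audio = [0.8, 0.3, 0.6]
text = [0.6, 0.1, 0.9]

fused = fuse_scores([audio, text])           # inter-modality fusion
labels = [score >= 0.5 for score in fused]   # assumed decision threshold
```

The same function covers intra-modality fusion if all input lists come from systems of one modality, for example several speech-based systems.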