CL · SD · AS · Apr 25, 2022

Speech Detection For Child-Clinician Conversations In Danish For Low-Resource In-The-Wild Conditions: A Case Study

arXiv:2204.11550v1 · 4 citations · h-index: 23
Originality: Synthesis-oriented
AI Analysis

This work addresses a low-resource, domain-specific challenge in medical speech processing for child psychiatry, but it is incremental as it adapts an existing method to a new dataset.

The study tackles the drop in speech detection performance on in-the-wild clinical data containing atypical speech, specifically child-clinician conversations in Danish. It finds that the first three minutes of a conversation are sufficient to adapt the classification threshold to its optimum value.

Using speech models for automatic speech processing tasks can improve the efficiency of screening, analysis, diagnosis, and treatment in medicine and psychiatry. However, the performance of speech pre-processing tasks such as segmentation and diarization can drop considerably on in-the-wild clinical data, especially when the target dataset comprises atypical speech. In this paper we study how the performance of a pre-trained speech model on a dataset of child-clinician conversations in Danish depends on the classification threshold. Since we do not have access to sufficient labelled data, we propose few-instance threshold adaptation, wherein we use the first minutes of the conversation to obtain the optimum classification threshold. We find that the model with the default classification threshold performs worse on children from the patient group. Furthermore, the error rates of the model are directly correlated with the severity of diagnosis in the patients. Lastly, our study of few-instance adaptation shows that three minutes of clinician-child conversation are sufficient to obtain the optimum classification threshold.
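The few-instance threshold adaptation described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the pre-trained model emits one speech probability per frame, that reference speech/non-speech labels exist for the opening minutes, and that the threshold is chosen by a simple grid search minimising the frame-level error rate. The function names, the candidate grid, and the error metric are all hypothetical choices made for illustration.

```python
from typing import Optional, Sequence


def frame_error_rate(scores: Sequence[float], labels: Sequence[int],
                     threshold: float) -> float:
    """Fraction of frames whose thresholded score disagrees with the label."""
    errors = sum((s >= threshold) != bool(y) for s, y in zip(scores, labels))
    return errors / len(labels)


def adapt_threshold(scores: Sequence[float], labels: Sequence[int],
                    frames_per_second: float,
                    adapt_seconds: float = 180.0,
                    candidates: Optional[Sequence[float]] = None) -> float:
    """Pick a threshold using only the first `adapt_seconds` of a conversation.

    Assumption: a plain grid search over candidate thresholds, scored by
    frame error rate (misses plus false alarms). The paper may use a
    different metric or search procedure.
    """
    n = min(len(scores), int(adapt_seconds * frames_per_second))
    if n == 0:
        raise ValueError("no frames available for adaptation")
    if candidates is None:
        candidates = [i / 100 for i in range(1, 100)]  # 0.01 ... 0.99
    head_scores, head_labels = scores[:n], labels[:n]
    # On ties, min() keeps the first (lowest) candidate threshold.
    return min(candidates,
               key=lambda t: frame_error_rate(head_scores, head_labels, t))


# Toy example: quiet speech scores below the default threshold of 0.5.
toy_scores = [0.1, 0.35, 0.2, 0.4, 0.9, 0.9]
toy_labels = [0, 1, 0, 1, 0, 0]
adapted = adapt_threshold(toy_scores, toy_labels,
                          frames_per_second=1, adapt_seconds=4)
```

In this toy case the default threshold of 0.5 misses both speech frames in the adaptation window, while the adapted threshold separates them from the non-speech frames. The threshold found on the opening window would then be applied to the remainder of the same conversation.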
