AI · CL · Jun 6, 2024

Promoting the Responsible Development of Speech Datasets for Mental Health and Neurological Disorders Research

arXiv:2406.04116v2 · 2 citations
Originality: Synthesis-oriented
AI Analysis

The paper addresses data quality and fairness issues in a sensitive healthcare domain, offering incremental guidance for researchers and practitioners.

The paper surveys existing speech datasets for mental health and neurological disorders to identify pitfalls and opportunities, and proposes a checklist of ethical desiderata to promote responsible dataset development.

Current research in machine learning and artificial intelligence is largely centered on modeling and performance evaluation, and less so on data collection. However, recent research has demonstrated that limitations and biases in data may negatively impact trustworthiness and reliability. These aspects are particularly impactful in sensitive domains such as mental health and neurological disorders, where speech data are used to develop AI applications for patients and healthcare providers. In this paper, we chart the landscape of available speech datasets for this domain to highlight possible pitfalls and opportunities for improvement, and to promote fairness and diversity. We present a comprehensive list of desiderata for building speech datasets for mental health and neurological disorders and distill it into an actionable checklist focused on ethical concerns to foster more responsible research.

Code Implementations: 1 repo
