CV SD AS
Aug 11, 2025

Voice Pathology Detection Using Phonation

Sri Raksha Siva, Nived Suthahar, Prakash Boominathan, Uma Ranjan

arXiv:2508.07587v1

Originality: Synthesis-oriented

AI Analysis

The paper addresses the need for noninvasive, automated diagnostic tools for voice disorders, which affect communication and quality of life. The contribution is incremental, however, because it builds on existing methods such as RNNs and standard acoustic features.

This research tackles voice pathology detection by proposing a noninvasive machine learning framework that classifies phonation data as normal or pathological, with the aim of supporting automated early diagnosis.

Voice disorders significantly affect communication and quality of life, requiring an early and accurate diagnosis. Traditional methods like laryngoscopy are invasive, subjective, and often inaccessible. This research proposes a noninvasive, machine learning-based framework for detecting voice pathologies using phonation data. Phonation data from the Saarbrücken Voice Database are analyzed using acoustic features such as Mel Frequency Cepstral Coefficients (MFCCs), chroma features, and Mel spectrograms. Recurrent Neural Networks (RNNs), including LSTM and attention mechanisms, classify samples into normal and pathological categories. Data augmentation techniques, including pitch shifting and Gaussian noise addition, enhance model generalizability, while preprocessing ensures signal quality. Scale-based features, such as Hölder and Hurst exponents, further capture signal irregularities and long-term dependencies. The proposed framework offers a noninvasive, automated diagnostic tool for early detection of voice pathologies, supporting AI-driven healthcare, and improving patient outcomes.
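Two of the steps named in the abstract, Gaussian noise augmentation and the Hurst exponent as a measure of long-term dependence, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how noise levels are chosen or which Hurst estimator is used, so the target-SNR parameter `snr_db`, the classical rescaled-range (R/S) estimator, and the function names `add_gaussian_noise`, `rescaled_range`, and `hurst_rs` are all assumptions made here for illustration.

```python
import math
import random


def add_gaussian_noise(signal, snr_db, rng=None):
    """Return a copy of `signal` with white Gaussian noise at roughly `snr_db` dB SNR.

    Assumption: the noise level is set from a target SNR; the paper may do it differently.
    """
    rng = rng or random.Random()
    power = sum(x * x for x in signal) / len(signal)      # mean signal power
    noise_std = math.sqrt(power / (10 ** (snr_db / 10)))  # noise power = signal power / SNR
    return [x + rng.gauss(0.0, noise_std) for x in signal]


def rescaled_range(chunk):
    """R/S statistic of one chunk: range of cumulative mean-deviations over the std."""
    n = len(chunk)
    mean = sum(chunk) / n
    cum, lo, hi = 0.0, 0.0, 0.0
    for x in chunk:
        cum += x - mean
        lo, hi = min(lo, cum), max(hi, cum)
    std = math.sqrt(sum((x - mean) ** 2 for x in chunk) / n)
    return (hi - lo) / std if std > 0 else None


def hurst_rs(series, min_window=8):
    """Estimate the Hurst exponent as the slope of log(R/S) against log(window size).

    Assumption: classical R/S analysis with window sizes doubling from `min_window`.
    """
    log_n, log_rs = [], []
    window = min_window
    while window <= len(series) // 2:
        values = []
        # Split the series into non-overlapping windows and average their R/S values.
        for start in range(0, len(series) - window + 1, window):
            rs = rescaled_range(series[start:start + window])
            if rs is not None and rs > 0:
                values.append(rs)
        if values:
            log_n.append(math.log(window))
            log_rs.append(math.log(sum(values) / len(values)))
        window *= 2
    if len(log_n) < 2:
        raise ValueError("series too short for an R/S estimate")
    # Ordinary least-squares slope of log(R/S) versus log(window size).
    mx = sum(log_n) / len(log_n)
    my = sum(log_rs) / len(log_rs)
    num = sum((x - mx) * (y - my) for x, y in zip(log_n, log_rs))
    den = sum((x - mx) ** 2 for x in log_n)
    return num / den
```

With this estimator, uncorrelated noise tends to give a value near 0.5 (R/S is known to be biased somewhat upward on short series), while strongly persistent signals such as a random walk give values close to 1. A call such as `hurst_rs(add_gaussian_noise(samples, snr_db=20.0))` would chain the two steps on a list of audio samples, where `samples` is a placeholder for one phonation recording.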
