LG | May 29, 2025

Meta-Learning Approaches for Speaker-Dependent Voice Fatigue Models

arXiv:2505.23378v2 | h-index: 13 | INTERSPEECH
Originality: Incremental advance
AI Analysis

This work addresses the need for efficient speaker adaptation in health monitoring applications, though it is incremental as it applies existing meta-learning techniques to a specific domain.

The paper tackles the computationally expensive retraining required by speaker-dependent models in speech-based health monitoring by reformulating speaker adaptation as a meta-learning task. Its transformer-based methods outperform conventional models on a dataset of 1,185 shift workers with 10,286 recordings.

Speaker-dependent modelling can substantially improve performance in speech-based health monitoring applications. While mixed-effect models are commonly used for such speaker adaptation, they require computationally expensive retraining for each new observation, making them impractical in a production environment. We reformulate this task as a meta-learning problem and explore three approaches of increasing complexity: ensemble-based distance models, prototypical networks, and transformer-based sequence models. Using pre-trained speech embeddings, we evaluate these methods on a large longitudinal dataset of shift workers (N=1,185, 10,286 recordings), predicting time since sleep from speech as a function of fatigue, a symptom commonly associated with ill-health. Our results demonstrate that all meta-learning approaches tested outperformed both cross-sectional and conventional mixed-effects models, with a transformer-based method achieving the strongest performance.
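To illustrate why a distance-based formulation avoids retraining, here is a minimal sketch, not the authors' implementation. It assumes each recording is already a fixed-length embedding and predicts a target for a new recording as a distance-weighted average of the same speaker's earlier labelled recordings. The function name `predict_from_support`, the `temperature` parameter, the softmax weighting, and the synthetic data are all hypothetical choices made for illustration.

```python
import numpy as np


def predict_from_support(support_emb, support_y, query_emb, temperature=1.0):
    """Predict a target for one query recording from a speaker's support set.

    Hypothetical helper: weights each support label by a softmax over
    negative squared Euclidean distances to the query embedding.
    """
    support_emb = np.asarray(support_emb, dtype=float)
    support_y = np.asarray(support_y, dtype=float)
    query_emb = np.asarray(query_emb, dtype=float)

    # Squared distance from the query to every support embedding.
    d2 = np.sum((support_emb - query_emb) ** 2, axis=1)

    # Numerically stable softmax over negative distances.
    logits = -d2 / temperature
    logits -= logits.max()
    weights = np.exp(logits)
    weights /= weights.sum()

    return float(weights @ support_y)


# Synthetic stand-in data (assumption): 5 earlier recordings of one speaker,
# 8-dimensional embeddings, labels in hours since sleep.
rng = np.random.default_rng(0)
support_emb = rng.normal(size=(5, 8))
support_y = np.array([1.0, 4.0, 8.0, 12.0, 16.0])

# A query that lies very close to the third support recording.
query_emb = support_emb[2] + 0.01 * rng.normal(size=8)
pred = predict_from_support(support_emb, support_y, query_emb, temperature=0.1)

# Adapting to a new observation only appends to the support set;
# no parameters are refit.
new_emb = rng.normal(size=(1, 8))
support_emb2 = np.vstack([support_emb, new_emb])
support_y2 = np.append(support_y, 20.0)
pred_after_update = predict_from_support(support_emb2, support_y2, new_emb[0], temperature=0.1)
```

The design point is that a new labelled recording extends the speaker's support set at inference time, whereas a mixed-effects model would need to be refit to incorporate it.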

