SleepVLM: Explainable and Rule-Grounded Sleep Staging via a Vision-Language Model
This work addresses the trustworthiness and auditability of clinical sleep staging workflows by providing transparent explanations, though the contribution is incremental, combining existing methods for interpretability.
The paper tackles the lack of auditable reasoning in automated sleep staging by introducing SleepVLM, a rule-grounded vision-language model that stages sleep from polysomnography images and generates clinician-readable rationales based on AASM criteria. It achieves Cohen's kappa scores of 0.767 on a held-out test set and 0.743 on an external cohort, matching state-of-the-art performance.
While automated sleep staging has achieved expert-level accuracy, its clinical adoption is hindered by a lack of auditable reasoning. We introduce SleepVLM, a rule-grounded vision-language model (VLM) designed to stage sleep from multi-channel polysomnography (PSG) waveform images while generating clinician-readable rationales based on American Academy of Sleep Medicine (AASM) scoring criteria. Utilizing waveform-perceptual pre-training and rule-grounded supervised fine-tuning, SleepVLM achieved Cohen's kappa scores of 0.767 on a held-out test set (MASS-SS1) and 0.743 on an external cohort (ZUAMHCS), matching state-of-the-art performance. Expert evaluations further validated the quality of the model's reasoning, with mean scores exceeding 4.0/5.0 for factual accuracy, evidence comprehensiveness, and logical coherence. By coupling competitive performance with transparent, rule-based explanations, SleepVLM may improve the trustworthiness and auditability of automated sleep staging in clinical workflows. To facilitate further research in interpretable sleep medicine, we release MASS-EX, a novel expert-annotated dataset.
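For readers unfamiliar with the agreement metric reported above, the following is a minimal sketch of unweighted Cohen's kappa computed over per-epoch stage labels. It is not the authors' evaluation code. The function name `cohens_kappa` and the ten-epoch toy label sequences are illustrative assumptions and do not come from MASS-SS1 or ZUAMHCS.

```python
from collections import Counter


def cohens_kappa(reference, predicted):
    """Unweighted Cohen's kappa between two equal-length label sequences."""
    if len(reference) != len(predicted) or not reference:
        raise ValueError("sequences must be non-empty and of equal length")
    n = len(reference)

    # Observed agreement: fraction of epochs where both labels match.
    observed = sum(r == p for r, p in zip(reference, predicted)) / n

    # Chance agreement: product of marginal label frequencies, summed over stages.
    ref_counts = Counter(reference)
    pred_counts = Counter(predicted)
    expected = sum(ref_counts[s] * pred_counts[s] for s in ref_counts) / (n * n)

    if expected == 1.0:
        # Both sequences use a single identical label; agreement is trivially perfect.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical toy data: ten 30-second epochs scored by an expert and by a model.
reference = ["W", "N1", "N2", "N2", "N3", "N3", "REM", "REM", "N2", "W"]
predicted = ["W", "N2", "N2", "N2", "N3", "N2", "REM", "REM", "N2", "W"]

kappa = cohens_kappa(reference, predicted)
print(f"kappa = {kappa:.3f}")
```

On this toy data the observed agreement is 0.80 and the chance agreement is 0.25, so kappa is (0.80 - 0.25) / (1 - 0.25), roughly 0.733. The correction for chance is the reason kappa is commonly preferred over raw accuracy when stage frequencies are imbalanced, as they typically are in overnight recordings.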