4.0SPMay 19
Staging by the Book: Automatic Sleep Stage Classification Using Scoring RulesEmil Hardarson, Konstantin Popov, Sigridur Sigurdardottir et al.
Automated sleep staging is commonly approached as a supervised machine learning problem, with deep learning methods dominating recent research. While machine learning models achieve near-human level agreement with human-scored reference sleep stages, their decisions are typically opaque and not designed to follow clinical scoring rules. We propose a transparent alternative: a deterministic, rule-based sleep staging method that explicitly operationalizes the American Academy of Sleep Medicine's (AASM) scoring logic as executable code, coupled with epoch-level natural-language justifications derived from an explanation trace. We evaluate the approach on 50 polysomnography recordings with a 10-scorer majority-vote consensus as reference. Across all recordings, the method agreed with the majority-vote reference in 60.5% of epochs ($κ=0.42$), with substantially higher agreement on a dataset used during development (77.1%, $κ=0.61$). Agreement with the reference was highest for sleep stage N2 (recall 83.5%) and moderate for sleep stage R (recall 68.7%), while Wake and N1 recall were low. Despite lower agreement with the reference than contemporary deep learning models, the method provides deterministic decisions and natural language explanations aligned with AASM scoring rules, making it a complementary tool for auditing, debugging, and governing deep learning-based sleep staging.
3.4LGMar 16
Data-Local Autonomous LLM-Guided Neural Architecture Search for Multiclass Multimodal Time-Series ClassificationEmil Hardarson, Luka Biedebach, Ómar Bessi Ómarsson et al.
Applying machine learning to sensitive time-series data is often bottlenecked by the iteration loop: Performance depends strongly on preprocessing and architecture, yet training often has to run on-premise under strict data-local constraints. This is a common problem in healthcare and other privacy-constrained domains (e.g., a hospital developing deep learning models on patient EEG). This bottleneck is particularly challenging in multimodal fusion, where sensor modalities must be individually preprocessed and then combined. LLM-guided neural architecture search (NAS) can automate this exploration, but most existing workflows assume cloud execution or access to data-derived artifacts that cannot be exposed. We present a novel data-local, LLM-guided search framework that handles candidate pipelines remotely while executing all training and evaluation locally under a fixed protocol. The controller observes only trial-level summaries, such as pipeline descriptors, metrics, learning-curve statistics, and failure logs, without ever accessing raw samples or intermediate feature representations. Our framework targets multiclass, multimodal learning via one-vs-rest binary experts per class and modality, a lightweight fusion MLP, and joint search over expert architectures and modality-specific preprocessing. We evaluate our method on two regimes: UEA30 (public multivariate time-series classification dataset) and SleepEDFx sleep staging (heterogeneous clinical modalities such as EEG, EOG, and EMG). The results show that the modular baseline model is strong, and the LLM-guided NAS further improves it. Notably, our method finds models that perform within published ranges across most benchmark datasets. Across both settings, our method reduces manual intervention by enabling unattended architecture search while keeping sensitive data on-premise.