Rethinking Random Transformers as Adaptive Sequence Smoothers for Sleep Staging
For sleep staging researchers and practitioners, this work challenges the assumption that Transformers succeed by learning complex long-range dependencies, offering a path to simpler, edge-deployable models.
The paper reveals that random, untrained Transformers improve sleep staging by acting as adaptive sequence smoothers, outperforming heuristic smoothing and suggesting that performance gains come from architectural bias rather than learned dependencies.
Automatic sleep staging commonly adopts Transformers under the assumption that they learn complex long-range dependencies. We challenge this view by revealing a neglected property of sleep sequences: strong local temporal continuity. We show that a randomly initialized Transformer, without any training, substantially improves sleep staging performance and consistently outperforms heuristic smoothing. We formalize this effect via a Random Attention Prior Kernel (RAPK), showing that random self-attention acts as an adaptive smoother by balancing global averaging and content-based similarity while preserving stage transitions. Using two metrics, the Local Smoothness Influence Index (LSII) and the Weighted Transition Entropy (WTE), we provide evidence that most performance gains in Transformer-based sleep staging arise from architectural inductive bias rather than parameter learning. Our results suggest that sleep staging can be effectively addressed with structure-driven smoothing mechanisms rather than complex dependency modeling, enabling more efficient and edge-deployable healthcare systems for large-scale physiological monitoring.
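To make the smoothing view concrete, the sketch below applies a single untrained self-attention head to a sequence of per-epoch feature vectors. It is an illustration under stated assumptions, not the paper's RAPK formulation: the function names, the Gaussian initialization, the projection width `d_k`, and the use of an identity value projection are all hypothetical choices made here. Because each softmax row is non-negative and sums to one, every output epoch is a convex combination of the input epochs, weighted by a random-projection similarity.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def random_attention_weights(features, d_k=16, seed=0):
    """Attention matrix from untrained, randomly drawn query/key projections.

    features: array of shape (n_epochs, d), one row per sleep epoch.
    Returns an (n_epochs, n_epochs) row-stochastic matrix.
    Assumption: Gaussian init scaled by 1/sqrt(d); the paper may differ.
    """
    rng = np.random.default_rng(seed)
    d = features.shape[1]
    w_q = rng.normal(scale=1.0 / np.sqrt(d), size=(d, d_k))
    w_k = rng.normal(scale=1.0 / np.sqrt(d), size=(d, d_k))
    scores = (features @ w_q) @ (features @ w_k).T / np.sqrt(d_k)
    return softmax(scores, axis=-1)


def random_attention_smooth(features, d_k=16, seed=0):
    """Smooth a sequence by mixing epochs with random attention weights.

    Assumption: identity value projection, so the output stays in the
    same space as the input and each row is a weighted average of inputs.
    """
    attn = random_attention_weights(features, d_k=d_k, seed=seed)
    return attn @ features


if __name__ == "__main__":
    # Toy input: 30 epochs of 5-class scores (synthetic, for illustration only).
    toy = np.random.default_rng(42).random((30, 5))
    smoothed = random_attention_smooth(toy)
    print(smoothed.shape)
```

When the attention scores are small, the softmax rows approach uniform weights and the head behaves like a global average; larger score differences shift weight toward epochs with similar content. This sketch omits positional encoding, multiple heads, and residual connections, all of which a full Transformer block would include.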