AP LG

Apr 22, 2025

Cryptogenic stroke and migraine: using probabilistic independence and machine learning to uncover latent sources of disease from the electronic health record

Joshua W. Betts, John M. Still, Thomas A. Lasko

arXiv:2505.04631v2

h-index: 26

Originality: Synthesis-oriented

AI Analysis

This work addresses stroke risk prediction for migraine patients. Its data-driven approach is incremental: it applies existing machine learning methods to a specific clinical domain.

The study predicts 10-year cryptogenic stroke risk in migraine patients by extracting probabilistically independent sources from EHR data. A random forest model trained on these sources achieved an area under the ROC curve of 0.771, and the most predictive sources pointed to pharmacologic interventions as the most important factor in minimizing risk.

Migraine is a common but complex neurological disorder that doubles the lifetime risk of cryptogenic stroke (CS). However, this relationship remains poorly characterized, and few clinical guidelines exist to reduce this associated risk. We therefore propose a data-driven approach to extract probabilistically-independent sources from electronic health record (EHR) data and create a 10-year risk-predictive model for CS in migraine patients. These sources represent external latent variables acting on the causal graph constructed from the EHR data and approximate root causes of CS in our population. A random forest model trained on patient expressions of these sources demonstrated good accuracy (ROC 0.771) and identified the top 10 most predictive sources of CS in migraine patients. These sources revealed that pharmacologic interventions were the most important factor in minimizing CS risk in our population and identified a factor related to allergic rhinitis as a potential causative source of CS in migraine patients.
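To make the reported metric concrete, the sketch below computes the area under the ROC curve (AUC) from scratch. It assumes that "ROC 0.771" in the abstract refers to AUC, which is the usual reading but is not stated explicitly. The function name `roc_auc` and the toy labels and scores are hypothetical and are not taken from the paper's data or code.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case,
    with ties counted as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (invented for illustration): 1 = stroke within the horizon, 0 = no stroke.
labels = [1, 0, 1, 0, 0, 1, 0, 0]
# Hypothetical model risk scores for the same eight patients.
scores = [0.9, 0.2, 0.6, 0.7, 0.1, 0.4, 0.3, 0.5]

auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")  # 12 of 15 positive/negative pairs are ranked correctly
```

Under this reading, an AUC of 0.771 would mean the model ranks a randomly chosen patient who goes on to have a cryptogenic stroke above a randomly chosen patient who does not about 77% of the time; 0.5 corresponds to chance and 1.0 to perfect ranking.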
