LG | AI | Jan 17, 2025

Challenges and recommendations for Electronic Health Records data extraction and preparation for dynamic prediction modelling in hospitalized patients -- a practical guide

Elena Albu, Shan Gao, Pieter Stijnen, Frank E. Rademakers, Bas C T van Bussel, Taya Collyer, Tina Hernandez-Boussard, Laure Wynants, Ben Van Calster

arXiv:2501.10240v24.12 citations | h-index: 85

Originality: Synthesis-oriented

AI Analysis

This paper provides a practical guide for data extraction engineers and researchers in clinical settings, aimed at making predictive models more trustworthy. The contribution is incremental: it compiles existing challenges without introducing new methods.

The paper addresses the reliability of dynamic prediction models built from electronic health records. It identifies over forty challenges in data extraction and preparation, organizes them into four categories, and gives actionable recommendations to improve model quality and real-world applicability.

Dynamic predictive modelling using electronic health record (EHR) data has gained significant attention in recent years. The reliability and trustworthiness of such models depend heavily on the quality of the underlying data, which is determined in part by the stages preceding model development: data extraction from EHR systems and data preparation. In this article, we identify over forty challenges encountered during these stages and provide actionable recommendations for addressing them. These challenges are organized into four categories: cohort definition, outcome definition, feature engineering, and data cleaning. This comprehensive list serves as a practical guide for data extraction engineers and researchers, promoting best practices and improving the quality and real-world applicability of dynamic prediction models in clinical settings.
