LG · Apr 25, 2025

Machine Learning and Statistical Insights into Hospital Stay Durations: The Italian EHR Case

arXiv:2504.18393v1 · 1 citation · h-index: 4
Originality: Synthesis-oriented
AI Analysis

The paper addresses hospital resource management for healthcare providers, but its contribution is incremental: it applies standard ML methods to a new regional dataset.

This study predicts hospital length of stay in Italy from EHR data covering more than 60 facilities, achieving an R² score of 0.49 with CatBoost.

Length of hospital stay (LoS) is a critical metric for assessing healthcare quality and optimizing hospital resource management. This study aims to identify factors influencing LoS within the Italian healthcare context, using a dataset of hospitalization records from over 60 healthcare facilities in the Piedmont region, spanning 2020 to 2023. We explored a variety of features, including patient characteristics, comorbidities, admission details, and hospital-specific factors. Significant correlations were found between LoS and features such as age group, comorbidity score, admission type, and month of admission. Machine learning models, specifically CatBoost and Random Forest, were used to predict LoS. The highest R² score, 0.49, was achieved with CatBoost, indicating that the model explains roughly half of the variance in LoS.
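
To make the reported metric concrete, the following minimal sketch computes the coefficient of determination (R²) from observed and predicted lengths of stay. The function name `r2_score` and the sample values are illustrative assumptions, not taken from the study; the paper's own evaluation code is not shown in the source.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_y = sum(y_true) / len(y_true)
    # Residual sum of squares: error left over after the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean stay.
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observed values are equal")
    return 1.0 - ss_res / ss_tot


# Hypothetical lengths of stay in days (illustrative values, not study data).
observed = [2, 3, 5, 8, 12]
predicted = [3, 3, 6, 7, 9]
score = r2_score(observed, predicted)
```

An R² of 1.0 corresponds to perfect predictions, while 0.0 corresponds to a model no better than always predicting the mean LoS, so a score of 0.49 sits roughly midway between the two.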
