LG CY · Apr 5, 2023

Building predictive models of healthcare costs with open healthcare data

A. Ravishankar Rao, Subrata Garai, Soumyabrata Dey, Hang Peng

arXiv:2304.02191v1 · 3.88 citations · h-index: 16

Originality: Synthesis-oriented

AI Analysis

This work addresses the need for cost prediction models that improve price transparency and efficiency in healthcare. Its contribution is incremental: it applies existing methods to new data.

The researchers built machine-learning models to predict healthcare costs from de-identified New York State patient data, achieving an R-square value of 0.76, which is higher than the values reported in the literature for similar tasks.

Healthcare costs are rising rapidly worldwide, and there is significant interest in controlling them. An important aspect is price transparency, as preliminary efforts have demonstrated that patients will shop for lower costs, driving efficiency. This requires both that the data be made available and that models exist that can predict healthcare costs for a wide range of patient demographics and conditions. We present an approach to this problem by developing a predictive model using machine-learning techniques. We analyzed de-identified patient data from the New York State Statewide Planning and Research Cooperative System (SPARCS), consisting of 2.3 million records from 2016. We built models to predict costs from patient diagnoses and demographics. We investigated two model classes: sparse regression and decision trees. We obtained the best performance with a decision tree of depth 10, which yielded an R-square value of 0.76. This is better than the values reported in the literature for similar problems.
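As a rough illustration of the modeling setup the abstract describes, the sketch below fits a depth-limited regression tree and scores it with R-square on held-out records. It is not the authors' code and does not use SPARCS data. The records are synthetic, the feature names (`age_group`, `severity`, `emergency`) and the cost formula are hypothetical, and a small pure-Python tree stands in for whatever library implementation the authors used. Only the depth limit of 10 and the R-square metric come from the abstract.

```python
import random
from statistics import mean


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - y_bar) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot


def sse(ys):
    """Sum of squared errors around the mean of ys."""
    m = mean(ys)
    return sum((y - m) ** 2 for y in ys)


def fit_tree(X, y, max_depth):
    """Greedy regression tree; returns a nested dict, or a float for a leaf."""
    if max_depth == 0 or len(set(y)) == 1:
        return mean(y)
    best = None
    for j in range(len(X[0])):
        # Candidate thresholds: every observed value except the largest,
        # so both sides of the split are non-empty.
        for t in sorted({row[j] for row in X})[:-1]:
            left = [i for i, row in enumerate(X) if row[j] <= t]
            right = [i for i, row in enumerate(X) if row[j] > t]
            cost = sse([y[i] for i in left]) + sse([y[i] for i in right])
            if best is None or cost < best[0]:
                best = (cost, j, t, left, right)
    if best is None:  # all features constant: nothing left to split on
        return mean(y)
    _, j, t, left, right = best
    return {
        "feature": j,
        "threshold": t,
        "left": fit_tree([X[i] for i in left], [y[i] for i in left], max_depth - 1),
        "right": fit_tree([X[i] for i in right], [y[i] for i in right], max_depth - 1),
    }


def predict(tree, row):
    """Walk from the root to a leaf and return the leaf's mean cost."""
    while isinstance(tree, dict):
        branch = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree


# Synthetic records (assumption): cost depends on three made-up features plus noise.
rng = random.Random(0)


def make_record():
    age_group = rng.randint(0, 4)
    severity = rng.randint(1, 4)
    emergency = rng.randint(0, 1)
    cost = 2000 * severity + 500 * age_group + 3000 * emergency + rng.gauss(0, 500)
    return [age_group, severity, emergency], cost


records = [make_record() for _ in range(400)]
X = [features for features, _ in records]
y = [cost for _, cost in records]
X_train, y_train, X_test, y_test = X[:300], y[:300], X[300:], y[300:]

tree = fit_tree(X_train, y_train, max_depth=10)  # depth 10, as in the abstract
stump = fit_tree(X_train, y_train, max_depth=1)  # single split, for comparison

r2_test = r_squared(y_test, [predict(tree, row) for row in X_test])
r2_train_deep = r_squared(y_train, [predict(tree, row) for row in X_train])
r2_train_stump = r_squared(y_train, [predict(stump, row) for row in X_train])
print(f"held-out R-square at depth 10: {r2_test:.2f}")
```

An R-square of 0.76 means the model accounts for 76% of the variance in cost relative to always predicting the mean. The numbers this toy data produces say nothing about the paper's results. On real discharge records, categorical fields such as diagnosis codes would also need an encoding step that this sketch omits.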
