A Review of Challenges and Opportunities in Machine Learning for Health
The paper addresses data quality issues in healthcare that matter to ML researchers, but as a review article its contribution is incremental.
The paper reviews challenges in applying machine learning to electronic health records, such as poorly labeled diseases, conditions that span multiple underlying endotypes, and the underrepresentation of healthy individuals, and highlights opportunities for the ML community to contribute to healthcare.
Modern electronic health records (EHRs) provide data to answer clinically meaningful questions. The growing data in EHRs makes healthcare ripe for the use of machine learning. However, learning in a clinical setting presents unique challenges that complicate the use of common machine learning methodologies. For example, diseases in EHRs are poorly labeled, conditions can encompass multiple underlying endotypes, and healthy individuals are underrepresented. This article serves as a primer to illuminate these challenges and highlights opportunities for members of the machine learning community to contribute to healthcare.