Unsupervised Machine Learning for Explainable Health Care Fraud Detection
This paper addresses fraud detection for government health care programs and offers an explainable tool to reduce waste. Its contribution is incremental, however, because it applies existing unsupervised methods to a specific domain.
The paper uses unsupervised machine learning to detect health care fraud by Medicare providers. It identifies patterns consistent with overbilling in inpatient hospitalizations and validates the findings against Department of Justice data and case studies.
The US federal government spends more than a trillion dollars per year on health care, largely provided by private third parties and reimbursed by the government. A major concern in this system is overbilling, waste and fraud by providers, who face incentives to misreport on their claims in order to receive higher payments. In this paper, we develop novel machine learning tools to identify providers that overbill Medicare, the US federal health insurance program for elderly adults and the disabled. Using large-scale Medicare claims data, we identify patterns consistent with fraud or overbilling among inpatient hospitalizations. Our proposed approach for Medicare fraud detection is fully unsupervised, not relying on any labeled training data, and is explainable to end users, providing reasoning and interpretable insights into the potentially suspicious behavior of the flagged providers. Data from the Department of Justice on providers facing anti-fraud lawsuits and several case studies validate our approach and findings both quantitatively and qualitatively.
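The abstract does not say which algorithm is used, so the following is only an illustrative sketch of what "unsupervised and explainable" flagging could look like. It is not the authors' method. It assumes a hypothetical provider-level table with two made-up features (`avg_payment_per_stay`, `share_high_severity_codes`) and swaps in a simple robust z-score rule (median and median absolute deviation). The provider IDs, values, and threshold are all invented for illustration.

```python
from statistics import median


def robust_z(values):
    """Robust z-scores: (value - median) / (1.4826 * MAD)."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    # Fall back to a unit scale if MAD is zero, to avoid dividing by zero.
    scale = 1.4826 * mad if mad > 0 else 1.0
    return [(v - med) / scale for v in values]


def flag_providers(claims, threshold=3.5):
    """Flag providers whose features are unusually high relative to peers.

    claims: dict mapping provider_id -> dict of feature name -> value.
    Returns dict mapping provider_id -> list of (feature, z-score) pairs,
    which serves as a simple explanation of why a provider was flagged.
    No fraud labels are used anywhere (unsupervised).
    """
    providers = sorted(claims)
    features = sorted({f for p in providers for f in claims[p]})
    flagged = {}
    for feat in features:
        scores = robust_z([claims[p][feat] for p in providers])
        for provider, z in zip(providers, scores):
            if z > threshold:  # only unusually HIGH billing is of interest
                flagged.setdefault(provider, []).append((feat, round(z, 2)))
    return flagged


# Hypothetical toy data: six providers, one of which bills far above its peers.
toy_claims = {
    "A": {"avg_payment_per_stay": 10000, "share_high_severity_codes": 0.20},
    "B": {"avg_payment_per_stay": 10500, "share_high_severity_codes": 0.22},
    "C": {"avg_payment_per_stay": 9800, "share_high_severity_codes": 0.19},
    "D": {"avg_payment_per_stay": 10200, "share_high_severity_codes": 0.21},
    "E": {"avg_payment_per_stay": 9900, "share_high_severity_codes": 0.20},
    "F": {"avg_payment_per_stay": 18000, "share_high_severity_codes": 0.45},
}

if __name__ == "__main__":
    for provider, reasons in flag_providers(toy_claims).items():
        print(provider, reasons)
```

On this toy data only provider F is flagged, and the returned feature list shows which billing measures drove the flag. A flag of this kind indicates an anomaly that merits review, not proof of fraud; the paper itself validates its flags against Department of Justice data and case studies.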