LG, ML · Jul 20, 2020

Unsupervised anomaly detection for discrete sequence healthcare data

arXiv:2007.10098v2 · 9 citations
Originality: Incremental advance
AI Analysis

The paper addresses healthcare fraud detection for insurance companies. The advance is incremental: it builds on existing unsupervised methods and adds specific improvements.

The paper tackles unsupervised anomaly detection for healthcare fraud by proposing deep learning models (LSTM and seq2seq) with an Empirical Distribution Function for score normalization, achieving state-of-the-art results on real patient visit data from Allianz.

Fraud in healthcare is widespread: doctors could prescribe unnecessary treatments to inflate bills. Insurance companies want to detect these anomalous, fraudulent bills and reduce their losses. Traditional fraud detection methods rely on expert rules and manual data processing. More recently, machine learning techniques have automated this process, but hand-labeled data is extremely costly and usually out of date. We propose a machine learning model that automates fraud detection in an unsupervised way. We consider two deep learning approaches: an LSTM neural network that predicts the next patient visit, and a seq2seq model. To normalize the resulting anomaly scores, we propose an Empirical Distribution Function (EDF) approach, which lets the algorithm cope with highly imbalanced classes. We validate the models on real sequences of patient visits from Allianz. The models provide state-of-the-art results for unsupervised anomaly detection in healthcare fraud. Our EDF approach further improves the quality of the LSTM model.
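The abstract does not say how the EDF is applied to the anomaly scores (per visit, per sequence, or per predicted class), so the following is only a minimal sketch of the general idea. It assumes each sequence already has a scalar raw anomaly score and maps a new score to the fraction of reference scores that do not exceed it. The names `fit_edf`, `reference`, and `normalized` are hypothetical and do not come from the paper.

```python
import numpy as np


def fit_edf(reference_scores):
    """Build an empirical distribution function from raw anomaly scores.

    Assumption: scores are plain scalars, one per sequence. The paper may
    apply the EDF at a different granularity.
    """
    sorted_scores = np.sort(np.asarray(reference_scores, dtype=float))
    n = sorted_scores.size
    if n == 0:
        raise ValueError("reference_scores must not be empty")

    def edf(scores):
        scores = np.asarray(scores, dtype=float)
        # For each query score, count the reference scores <= it,
        # then divide by the reference sample size.
        counts = np.searchsorted(sorted_scores, scores, side="right")
        return counts / n

    return edf


# Toy reference scores (made up for illustration only).
reference = [0.2, 0.5, 0.5, 1.0, 3.0]
edf = fit_edf(reference)

# Normalized scores lie in [0, 1]; values near 1 are the most unusual
# relative to the reference sample.
normalized = edf([0.1, 0.5, 2.0, 10.0])
print(normalized)
```

Because the output is a rank-based value in [0, 1], scores that come from differently scaled sources become comparable, which is one plausible reason such a normalization could help when anomalies are rare.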

