Canonical Correlation Analysis for Analyzing Sequences of Medical Billing Codes
This work addresses healthcare cost reduction by predicting the need for surgery in patients with diverticulitis, but its contribution is incremental because it applies an existing method (CCA) to a new medical data domain.
The authors tackled the problem of predicting future elective surgery for diverticulitis by using canonical correlation analysis (CCA) to generate features from sequences of medical billing codes, demonstrating that these embeddings capture meaningful relationships and are useful for prediction.
We propose using canonical correlation analysis (CCA) to generate features from sequences of medical billing codes. Applying this novel use of CCA to a database of medical billing codes for patients with diverticulitis, we first demonstrate that the CCA embeddings capture meaningful relationships among the codes. We then generate features from these embeddings and establish their usefulness in predicting future elective surgery for diverticulitis, an important marker in efforts to reduce healthcare costs.
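As a rough illustration of how CCA can turn code sequences into features, the sketch below runs a ridge-regularized CCA between two one-hot views and then pools the resulting code embeddings per patient. Several pieces are assumptions made only for this example and are not taken from the abstract: the two views (each code paired with the code that follows it), the ridge term `reg`, the mean pooling used to build patient-level features, and the toy code names such as `CT_SCAN`, which are hypothetical placeholders rather than real billing codes.

```python
import numpy as np


def one_hot_pairs(sequences):
    """Build one-hot views: X = current code, Y = next code (assumed pairing)."""
    vocab = sorted({code for seq in sequences for code in seq})
    index = {code: i for i, code in enumerate(vocab)}
    pairs = [(index[a], index[b]) for seq in sequences for a, b in zip(seq, seq[1:])]
    x = np.zeros((len(pairs), len(vocab)))
    y = np.zeros((len(pairs), len(vocab)))
    for row, (i, j) in enumerate(pairs):
        x[row, i] = 1.0
        y[row, j] = 1.0
    return vocab, index, x, y


def cca(x, y, dim, reg=1e-3):
    """Ridge-regularized CCA via an SVD of the whitened cross-covariance."""
    x = x - x.mean(axis=0)
    y = y - y.mean(axis=0)
    n = x.shape[0]
    cxx = x.T @ x / n + reg * np.eye(x.shape[1])
    cyy = y.T @ y / n + reg * np.eye(y.shape[1])
    cxy = x.T @ y / n
    # Whiten each view with its Cholesky factor: M = Lx^{-1} Cxy Ly^{-T}.
    lx = np.linalg.cholesky(cxx)
    ly = np.linalg.cholesky(cyy)
    m = np.linalg.solve(lx, cxy) @ np.linalg.inv(ly).T
    u, s, vt = np.linalg.svd(m)
    # Map the leading singular vectors back to projection matrices.
    a = np.linalg.solve(lx.T, u[:, :dim])
    b = np.linalg.solve(ly.T, vt.T[:, :dim])
    return a, b, s[:dim]


# Hypothetical toy sequences; real inputs would be billing codes per patient.
sequences = [
    ["VISIT", "CT_SCAN", "ANTIBIOTIC", "VISIT"],
    ["VISIT", "CT_SCAN", "COLECTOMY"],
    ["ER", "CT_SCAN", "ANTIBIOTIC", "ER", "CT_SCAN", "COLECTOMY"],
    ["VISIT", "ANTIBIOTIC", "VISIT"],
]

vocab, index, x, y = one_hot_pairs(sequences)
emb, _, corrs = cca(x, y, dim=2)  # row i of emb is the embedding of vocab[i]

# Patient-level features: mean of the embeddings of the codes in each sequence
# (an assumed pooling choice, used here only for illustration).
features = np.array([emb[[index[c] for c in seq]].mean(axis=0) for seq in sequences])
print(features.shape)
```

The `features` matrix has one row per patient and could be passed to any standard classifier to predict a later outcome. In practice a larger context window, a different pooling scheme, or a library CCA implementation might be preferable to this minimal version.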