CL LG | Sep 28, 2022

Natural Language Processing Methods to Identify Oncology Patients at High Risk for Acute Care with Clinical Notes

Claudio Fanconi, Marieke van Buchem, Tina Hernandez-Boussard

arXiv:2209.13860v20.310 citations | h-index: 64 | Has Code

Originality: Synthesis-oriented

AI Analysis

This work addresses risk prediction for oncology patients with the aim of improving healthcare outcomes. Its contribution is incremental: it shows that NLP methods are competitive with, but not superior to, existing structured-data approaches.

This paper tackles the problem of predicting acute care use in oncology patients starting chemotherapy by comparing natural language processing models applied to clinical notes against standard structured health data models. The structured data model slightly outperformed the NLP models: C-statistics were 0.748 for structured data, 0.730 for manually engineered language features, and 0.702 for a transformer-based model.

Clinical notes are an essential component of a health record. This paper evaluates how natural language processing (NLP) can be used to identify the risk of acute care use (ACU) in oncology patients once chemotherapy starts. Risk prediction using structured health data (SHD) is now standard, but predictions using free-text formats are complex. This paper explores the use of free-text notes, instead of SHD, for the prediction of ACU. Deep learning models were compared to manually engineered language features. Results show that SHD models minimally outperform NLP models; an l1-penalised logistic regression with SHD achieved a C-statistic of 0.748 (95%-CI: 0.735, 0.762), while the same model with language features achieved 0.730 (95%-CI: 0.717, 0.745) and a transformer-based model achieved 0.702 (95%-CI: 0.688, 0.717). This paper shows how language models can be used in clinical applications and underlines how risk bias is different for diverse patient groups, even using only free-text data.
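
The C-statistics and 95% confidence intervals quoted above can be illustrated with a small sketch. It computes the C-statistic as the fraction of (positive, negative) patient pairs that a model ranks correctly, then estimates an interval with a percentile bootstrap. The abstract does not say how the authors derived their intervals, so the bootstrap is an assumption here, and the function names `c_statistic` and `bootstrap_ci` and the toy data are hypothetical.

```python
import random


def c_statistic(labels, scores):
    """Share of (positive, negative) pairs ranked correctly; ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    concordant = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                concordant += 1.0
            elif p == n:
                concordant += 0.5
    return concordant / (len(pos) * len(neg))


def bootstrap_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval (an assumed method, not taken from the paper)."""
    if len(set(labels)) < 2:
        raise ValueError("need both classes to bootstrap a C-statistic")
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # resample lacks one class, so the statistic is undefined
        stats.append(c_statistic(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


# Toy data: 1 = acute care use occurred, scores = predicted risk (made up).
toy_labels = [0, 0, 1, 0, 1, 1, 0, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8, 0.65, 0.9, 0.2, 0.7]
point = c_statistic(toy_labels, toy_scores)
low, high = bootstrap_ci(toy_labels, toy_scores)
print(f"C-statistic {point:.3f} (95%-CI: {low:.3f}, {high:.3f})")
```

The pairwise loop is quadratic in the number of patients, which is fine for illustration; on a real cohort a rank-based implementation would be the more practical choice.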
