CL, LG | Nov 30, 2021

What Do You See in this Patient? Behavioral Testing of Clinical NLP Models

arXiv:2111.15512v1628 citations
Originality: Incremental advance
AI Analysis

This work addresses the issue of unintended biases in clinical decision support systems, which is critical for improving patient care and fairness in healthcare, though it is incremental as it builds on existing testing frameworks.

The authors tackled the problem of opaque and potentially biased patterns in clinical NLP models for patient outcome prediction by introducing an extendable testing framework to evaluate model behavior regarding patient characteristics like gender, age, and ethnicity. Their evaluation of three models revealed that behavior varies drastically even with the same fine-tuning data, and top-performing models do not always learn medically plausible patterns.

Decision support systems based on clinical notes have the potential to improve patient care by pointing doctors towards overlooked risks. Predicting a patient's outcome is an essential part of such systems, for which the use of deep neural networks has shown promising results. However, the patterns learned by these networks are mostly opaque, and previous work revealed flaws regarding the reproduction of unintended biases. We thus introduce an extendable testing framework that evaluates the behavior of clinical outcome models regarding changes of the input. The framework helps to understand learned patterns and their influence on model decisions. In this work, we apply it to analyse the change in behavior with regard to the patient characteristics gender, age and ethnicity. Our evaluation of three current clinical NLP models demonstrates the concrete effects of these characteristics on the models' decisions. The results show that model behavior varies drastically even when fine-tuned on the same data and that allegedly best-performing models have not always learned the most medically plausible patterns.
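The idea of testing a model's behavior under controlled input changes can be illustrated with a minimal sketch. It is not the authors' framework: the word lists, the `swap_gender` and `mean_prediction_shift` helpers, and the `toy_model` scorer are hypothetical stand-ins. The sketch rewrites the gender mention in a note and measures how much a model's predicted risk moves.

```python
import re
from typing import Callable, Dict, List

# Hypothetical substitution tables; a real test suite would need a much
# more careful vocabulary (pronouns, abbreviations, titles, and so on).
TO_FEMALE: Dict[str, str] = {"male": "female", "man": "woman"}
TO_MALE: Dict[str, str] = {"female": "male", "woman": "man"}


def swap_gender(note: str, mapping: Dict[str, str]) -> str:
    """Replace whole-word gender mentions in a note according to `mapping`."""
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(word) for word in mapping) + r")\b",
        re.IGNORECASE,
    )

    def replace(match: "re.Match[str]") -> str:
        original = match.group(0)
        substitute = mapping[original.lower()]
        # Preserve a leading capital letter if the original had one.
        return substitute.capitalize() if original[0].isupper() else substitute

    return pattern.sub(replace, note)


def mean_prediction_shift(
    model: Callable[[str], float],
    notes: List[str],
    mapping: Dict[str, str],
) -> float:
    """Average change in the model's score after the gender mention is swapped."""
    shifts = [model(swap_gender(note, mapping)) - model(note) for note in notes]
    return sum(shifts) / len(shifts)


def toy_model(note: str) -> float:
    """Hypothetical stand-in for a clinical outcome model.

    It deliberately keys on gender words so the test has something to detect.
    """
    tokens = re.findall(r"[a-z]+", note.lower())
    return 0.3 + (0.2 if "female" in tokens or "woman" in tokens else 0.0)
```

Calling `mean_prediction_shift(toy_model, notes, TO_FEMALE)` on a list of notes that mention a male patient reports how far the score moves when only the gender mention changes. A shift near zero suggests the model ignores that characteristic, while a large shift flags a pattern worth checking for medical plausibility. The same structure could be reused for age or ethnicity by supplying a different perturbation function.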

Code Implementations: 1 repo
