Challenges in Explaining Pretrained Clinical Text Classifiers
For researchers and practitioners in clinical NLP, this paper highlights the inadequacy of current explanation methods for long, complex medical texts.
The paper identifies limitations of token-level and perturbation-based explanation methods (LIME, SHAP) on clinical text classifiers. On a hospital length-of-stay prediction task, it shows overemphasis on non-informative tokens, unstable attributions, and high-confidence predictions for incoherent inputs.
Explaining the predictions of neural models in clinical NLP remains a significant challenge, especially for complex tasks involving long, unstructured medical texts. While post-hoc methods like LIME and SHAP are widely used, they often fall short when applied to clinical narratives. In this paper, we identify core limitations of token-level and perturbation-based explanation techniques through targeted demonstrations on a hospital length-of-stay prediction task. Our findings reveal issues such as overemphasis on non-informative tokens, instability in attributions, and high-confidence predictions for incoherent input variants. These results underscore the need for explanation strategies that are clinically meaningful, semantically grounded, and robust to linguistic noise.
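To make the notion of attribution instability concrete, the following is a minimal sketch of how one might measure it. It is not LIME, SHAP, or the experimental setup of this paper: it uses a simple sampling-based occlusion estimate, a toy keyword classifier (`toy_predict`, `TOY_WEIGHTS`), and an invented example note, all of which are hypothetical stand-ins chosen so the example runs with the standard library alone. Instability is summarized as the Jaccard overlap between the top-k tokens of two runs that differ only in random seed.

```python
import math
import random

# Hypothetical keyword weights standing in for a trained classifier.
TOY_WEIGHTS = {"sepsis": 2.0, "ventilator": 1.5, "stable": -1.0}


def toy_predict(tokens):
    """Toy probability of a long stay: logistic of summed keyword weights."""
    score = sum(TOY_WEIGHTS.get(t, 0.0) for t in tokens)
    return 1.0 / (1.0 + math.exp(-score))


def perturbation_attributions(tokens, predict, n_samples, seed):
    """Estimate each token's effect as the mean prediction with the token
    kept minus the mean prediction with it dropped, over random masks."""
    rng = random.Random(seed)
    n = len(tokens)
    kept_sum, kept_n = [0.0] * n, [0] * n
    dropped_sum, dropped_n = [0.0] * n, [0] * n
    for _ in range(n_samples):
        # Keep or drop every token independently with probability 0.5.
        mask = [rng.random() < 0.5 for _ in tokens]
        p = predict([t for t, keep in zip(tokens, mask) if keep])
        for i, keep in enumerate(mask):
            if keep:
                kept_sum[i] += p
                kept_n[i] += 1
            else:
                dropped_sum[i] += p
                dropped_n[i] += 1
    attributions = []
    for i in range(n):
        mean_kept = kept_sum[i] / kept_n[i] if kept_n[i] else 0.0
        mean_dropped = dropped_sum[i] / dropped_n[i] if dropped_n[i] else 0.0
        attributions.append(mean_kept - mean_dropped)
    return attributions


def top_k_indices(attributions, k):
    """Positions of the k tokens with the largest absolute attribution."""
    order = sorted(range(len(attributions)),
                   key=lambda i: abs(attributions[i]), reverse=True)
    return set(order[:k])


def top_k_jaccard(a, b, k):
    """Overlap of the top-k token sets of two attribution vectors (k >= 1)."""
    sa, sb = top_k_indices(a, k), top_k_indices(b, k)
    return len(sa & sb) / len(sa | sb)


# Invented example note, not taken from any dataset.
tokens = ("the patient with sepsis was placed on a "
          "ventilator and is now stable").split()

# Two runs that differ only in the random seed, with a small sample budget.
run_a = perturbation_attributions(tokens, toy_predict, n_samples=20, seed=0)
run_b = perturbation_attributions(tokens, toy_predict, n_samples=20, seed=1)
print("top-3 overlap across seeds:", top_k_jaccard(run_a, run_b, k=3))
```

An overlap well below 1.0 at a given sample budget indicates that the explanation depends on the sampling seed rather than on the model alone. Raising `n_samples` trades compute for stability, which becomes costly on long clinical notes where the number of tokens is large.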