LG | AI | Oct 13, 2025

Medical Interpretability and Knowledge Maps of Large Language Models

arXiv:2510.11390v1 | 1 citation | h-index: 9
Originality: Synthesis-oriented
AI Analysis

This research helps researchers and practitioners understand how LLMs handle medical knowledge, offering insights to guide fine-tuning and debiasing. It is incremental, however, in that it applies existing interpretability methods to the medical domain.

The study systematically investigates medical-domain interpretability in Large Language Models (LLMs) using four interpretability techniques, revealing that most medical knowledge in Llama3.3-70B is processed in the first half of the model's layers and identifying phenomena such as non-linear encoding of age and clustering of drugs by medical specialty.

We present a systematic study of medical-domain interpretability in Large Language Models (LLMs). We study how LLMs both represent and process medical knowledge through four different interpretability techniques: (1) UMAP projections of intermediate activations, (2) gradient-based saliency with respect to the model weights, (3) layer lesioning/removal and (4) activation patching. We present knowledge maps of five LLMs which show, at a coarse resolution, where knowledge about patients' ages, medical symptoms, diseases and drugs is stored in the models. For Llama3.3-70B in particular, we find that most medical knowledge is processed in the first half of the model's layers. In addition, we find several interesting phenomena: (i) age is often encoded in a non-linear and sometimes discontinuous manner at intermediate layers in the models, (ii) the disease progression representation is non-monotonic and circular at certain layers of the model, (iii) drugs cluster better by medical specialty than by mechanism of action, especially for Llama3.3-70B and (iv) Gemma3-27B and MedGemma-27B have activations that collapse at intermediate layers but recover by the final layers. These results can guide future research on fine-tuning, un-learning or de-biasing LLMs for medical tasks by suggesting at which layers in the model these techniques should be applied.
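
To make techniques (3) and (4) concrete, here is a minimal sketch of layer lesioning and activation patching on a toy residual stack. It is not the paper's code: the model, the names `make_layer` and `run`, the dimensions, and the inputs are all hypothetical, and the layers are random functions rather than transformer blocks. A real study would instead hook the layers of an actual LLM and compare answer logits on medical prompts.

```python
import math
import random

def make_layer(seed, dim):
    """Toy residual layer h -> h + tanh(W h) with random weights (hypothetical)."""
    rng = random.Random(seed)
    W = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(dim)]
    def layer(h):
        return [h[i] + math.tanh(sum(W[i][j] * h[j] for j in range(dim)))
                for i in range(dim)]
    return layer

def run(layers, h, skip=None, patch=None):
    """Run the stack and return (final state, per-layer activations).

    skip:  index of a layer to lesion; the residual stream bypasses it.
    patch: dict with keys "layer", "dims", "values"; after that layer runs,
           the listed coordinates are overwritten with the stored values.
    """
    h = list(h)
    acts = []
    for i, layer in enumerate(layers):
        if i != skip:
            h = layer(h)
        if patch is not None and patch["layer"] == i:
            for d in patch["dims"]:
                h[d] = patch["values"][d]
        acts.append(list(h))
    return h, acts

def distance(a, b):
    """Euclidean distance between two activation vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

DIM, N_LAYERS = 4, 6
layers = [make_layer(seed, DIM) for seed in range(N_LAYERS)]
clean_input = [1.0, 0.0, -1.0, 0.5]     # stands in for the original prompt
corrupt_input = [0.0, 1.0, 0.5, -1.0]   # stands in for a perturbed prompt

clean_out, clean_acts = run(layers, clean_input)

# Layer lesioning: remove one layer at a time and measure how far the
# output moves. Larger values suggest the layer matters more for this input.
lesion_effect = [
    distance(clean_out, run(layers, clean_input, skip=i)[0])
    for i in range(N_LAYERS)
]

# Activation patching: run the corrupted input, restore one coordinate of
# the clean activation at one layer, and measure the remaining gap.
def patched_gap(layer_idx, dims):
    patch = {"layer": layer_idx, "dims": dims, "values": clean_acts[layer_idx]}
    return distance(clean_out, run(layers, corrupt_input, patch=patch)[0])

single_dim_gap = [patched_gap(i, [0]) for i in range(N_LAYERS)]
full_state_gap = [patched_gap(i, range(DIM)) for i in range(N_LAYERS)]

if __name__ == "__main__":
    for i in range(N_LAYERS):
        print(f"layer {i}: lesion={lesion_effect[i]:.3f} "
              f"patch_dim0_gap={single_dim_gap[i]:.3f}")
```

In this toy, patching the full hidden state at any layer trivially recovers the clean output, because the state is the entire residual stream. Patching in a real model is informative only when restricted to specific token positions or components, which the single-coordinate variant above imitates.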
