LG · CL · May 28

AMNESIA: A Large Scale Medical Unlearning Benchmark Suite with Disease-Informed Analysis

Saeedeh Davoudi, Reihaneh Iranmanesh, Ophir Frieder, Nazli Goharian

arXiv:2605.30599 · 51.0 · h-index: 6 · Has Code

Predicted impact: top 49% in LG · last 90 days · Originality: Highly original

AI Analysis

This benchmark addresses the need for robust evaluation of machine unlearning in medical LLMs, which is essential for keeping clinical AI systems up to date and compliant.

This paper introduces AMNESIA, a large-scale medical unlearning benchmark suite comprising 70,560 question-answer pairs from 8,820 patient notes across 11 disease categories. The authors used AMNESIA to evaluate four unlearning methods and found that unlearning individual patients negatively impacts knowledge of other patients with the same condition.

Medical knowledge is continuously evolving. This creates a need to update or selectively forget information encoded in already-trained medical LLMs. Machine unlearning aims to remove the influence of specific training data from a model without full retraining. Yet existing unlearning benchmarks rely on synthetic or small-scale general data, leaving clinical unlearning understudied. We introduce AMNESIA, the first large-scale, open-source benchmark for medical unlearning, with 70,560 question-answer pairs from 8,820 patient notes across 11 disease categories. AMNESIA includes both factual questions testing direct recall and reasoning questions testing clinical inference. We use it to evaluate four widely used unlearning methods at both the random-patient and disease levels, and introduce a new metric for detecting leakage of medical terminology. We show that unlearning individual patients erodes knowledge of others with the same condition, calling for methods that can better separate patients from shared clinical knowledge.
