LG AI · Jul 27, 2025

Embeddings to Diagnosis: Latent Fragility under Agentic Perturbations in Clinical LLMs

arXiv:2507.21188v1 · 2 citations

Originality: Incremental advance

AI Analysis

The paper addresses a safety-critical issue in clinical AI by revealing a gap between surface robustness and semantic stability. The advance is incremental because it builds on existing evaluation methods.

The paper tackles the problem of clinical LLMs that fail under small input shifts, such as symptom masking or negation, despite high benchmark performance. It proposes a geometry-aware evaluation framework (LAPD) and introduces the Latent Diagnosis Flip Rate (LDFR) to measure representational instability. The authors find that latent fragility emerges even with minimal changes, and they validate this on 90 real clinical notes.

LLMs for clinical decision support often fail under small but clinically meaningful input shifts such as masking a symptom or negating a finding, despite high performance on static benchmarks. These reasoning failures frequently go undetected by standard NLP metrics, which are insensitive to latent representation shifts that drive diagnosis instability. We propose a geometry-aware evaluation framework, LAPD (Latent Agentic Perturbation Diagnostics), which systematically probes the latent robustness of clinical LLMs under structured adversarial edits. Within this framework, we introduce Latent Diagnosis Flip Rate (LDFR), a model-agnostic diagnostic signal that captures representational instability when embeddings cross decision boundaries in PCA-reduced latent space. Clinical notes are generated using a structured prompting pipeline grounded in diagnostic reasoning, then perturbed along four axes (masking, negation, synonym replacement, and numeric variation) to simulate common ambiguities and omissions. We compute LDFR across both foundation and clinical LLMs, finding that latent fragility emerges even under minimal surface-level changes. Finally, we validate our findings on 90 real clinical notes from the DiReCT benchmark (MIMIC-IV), confirming the generalizability of LDFR beyond synthetic settings. Our results reveal a persistent gap between surface robustness and semantic stability, underscoring the importance of geometry-aware auditing in safety-critical clinical AI.
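
The abstract describes LDFR only at a high level: embeddings are reduced with PCA, and a flip is counted when a perturbed note's embedding crosses a decision boundary. The sketch below is one possible reading of that idea, not the paper's implementation. The function names, the choice of two principal components, and the nearest-centroid decision boundary are all assumptions made for illustration; the paper's actual classifier and embedding model are not specified in this summary.

```python
import numpy as np

def pca_fit(X, n_components=2):
    """Fit PCA via SVD; return the feature mean and the top principal axes."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]

def pca_transform(X, mean, components):
    """Project rows of X onto the fitted principal axes."""
    return (X - mean) @ components.T

def nearest_centroid_labels(Z, centroids):
    """Assign each row of Z the index of its closest centroid."""
    dists = np.linalg.norm(Z[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)

def latent_flip_rate(orig_emb, pert_emb, labels, n_components=2):
    """Fraction of notes whose predicted diagnosis region changes after perturbation.

    Assumption: the decision boundary is a nearest-centroid rule fitted on the
    original embeddings in PCA space. This stands in for whatever boundary the
    paper uses.
    """
    labels = np.asarray(labels)
    # PCA is fitted on the original embeddings only, then reused for the perturbed ones.
    mean, comps = pca_fit(orig_emb, n_components)
    z_orig = pca_transform(orig_emb, mean, comps)
    z_pert = pca_transform(pert_emb, mean, comps)

    # One centroid per diagnosis class, computed from the unperturbed notes.
    classes = np.unique(labels)
    centroids = np.stack([z_orig[labels == c].mean(axis=0) for c in classes])

    pred_orig = nearest_centroid_labels(z_orig, centroids)
    pred_pert = nearest_centroid_labels(z_pert, centroids)
    return float(np.mean(pred_orig != pred_pert))
```

In use, `orig_emb` and `pert_emb` would be row-aligned arrays of shape `(n_notes, dim)` holding the embeddings of each note before and after a perturbation such as masking or negation, and `labels` would hold the diagnosis for each note. Fitting PCA and the centroids on the original embeddings only keeps the boundary fixed, so any change in the assigned region is attributable to the perturbation.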
