CL · May 20

Reliable Automated Triage in Spanish Clinical Notes: A Hybrid Framework for Risk-Aware HIV Suspicion Identification

Rodrigo Morales-Sánchez, Soto Montalvo, Raquel Martínez

arXiv:2605.21256

AI Analysis

For clinical NLP practitioners, this work provides a method to improve safety in medical triage by explicitly managing uncertainty, though it is an incremental hybrid approach applied to a specific domain.

The paper addresses inflated metrics in clinical NLP, which arise when classifiers are forced to make deterministic predictions on ambiguous instances. It proposes a risk-aware hybrid selective classification framework for HIV suspicion identification in Spanish clinical notes. The framework handles aleatoric and epistemic uncertainty separately and, under strict reliability constraints, isolates a highly trustworthy operational domain, whereas standard methods suffer coverage collapse.

Standard clinical Natural Language Processing (NLP) benchmarks often yield inflated metrics by forcing deterministic classification on ambiguous instances, thereby obscuring the clinical risks of overconfident predictions. To bridge this gap, we propose a risk-aware hybrid selective classification framework, evaluated on early Human Immunodeficiency Virus suspicion identification in Spanish clinical notes. Our dual-verification approach explicitly decouples aleatoric uncertainty through Mondrian conformal prediction and epistemic uncertainty using a Multi-Centroid Mahalanobis Distance veto. Empirical evaluations reveal that standard uncertainty metrics and baseline classifiers are structurally insufficient for safe medical triage, suffering severe coverage collapse when forced to operate under strict reliability constraints. In contrast, by demanding that clinical narratives pass both probabilistic and geometric safeguards, the proposed framework successfully isolates a highly trustworthy operational domain.
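The dual-verification idea in the abstract can be illustrated with a minimal sketch. The abstract does not give implementation details, so everything below is an assumption made for illustration: the nonconformity score (1 minus the probability of the true class), the precomputed centroids, the shared inverse covariance `cov_inv`, the distance threshold `tau`, and all function names are hypothetical and not taken from the paper. A note is accepted only if its class-conditional (Mondrian) conformal prediction set is a singleton and its embedding lies close to at least one centroid; otherwise the system abstains.

```python
import numpy as np


def mondrian_thresholds(cal_probs, cal_labels, alpha):
    """Per-class conformal thresholds on the score 1 - p(true class).

    cal_probs: (n, K) predicted probabilities on a calibration set.
    cal_labels: (n,) integer labels in 0..K-1.
    alpha: target per-class error rate.
    """
    thresholds = {}
    for c in np.unique(cal_labels):
        scores = 1.0 - cal_probs[cal_labels == c, c]
        n = len(scores)
        # Finite-sample corrected quantile level, capped at 1.0.
        level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
        thresholds[int(c)] = float(np.quantile(scores, level, method="higher"))
    return thresholds


def min_mahalanobis(x, centroids, cov_inv):
    """Smallest Mahalanobis distance from x to any of the given centroids."""
    diffs = centroids - x  # shape (m, d)
    d2 = np.einsum("ij,jk,ik->i", diffs, cov_inv, diffs)
    return float(np.sqrt(d2.min()))


def triage(probs, x, thresholds, centroids, cov_inv, tau):
    """Return a class label, or None to abstain.

    Probabilistic check: the conformal prediction set must be a singleton.
    Geometric check: the embedding x must lie within tau of some centroid.
    """
    pred_set = [c for c, t in thresholds.items() if 1.0 - probs[c] <= t]
    if len(pred_set) != 1:
        return None  # ambiguous or empty set: abstain
    if min_mahalanobis(x, centroids, cov_inv) > tau:
        return None  # far from the known data: veto
    return pred_set[0]
```

In this sketch, coverage is the fraction of notes for which `triage` returns a label rather than `None`. Tightening `alpha` or `tau` trades coverage for reliability on the accepted subset.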
