AI | Feb 26

Modeling Expert AI Diagnostic Alignment via Immutable Inference Snapshots

Dimitrios P. Panagoulias, Evangelia-Aikaterini Tsichrintzi, Georgios Savvidis, Evridiki Tsoureli-Nikita

arXiv:2602.22973v1 | 2.4 | h-index: 14

Originality: Incremental advance

AI Analysis

This work provides a structured method for quantifying the alignment and correction dynamics between AI diagnostic inferences and expert physician validations, which is crucial for improving the traceability and human alignment of clinical decision support systems in safety-critical domains.

This paper introduces a diagnostic alignment framework that compares AI-generated image-based reports with physician-validated outcomes, preserving the AI inference as an immutable state. Evaluating 21 dermatological cases, the framework found an exact primary match rate of 71.4% and a comprehensive concordance rate of 100% when considering structured cross-category and differential overlap, indicating that binary lexical evaluation underestimates clinical alignment.

Human-in-the-loop validation is essential in safety-critical clinical AI, yet the transition between initial model inference and expert correction is rarely analyzed as a structured signal. We introduce a diagnostic alignment framework in which the AI-generated image-based report is preserved as an immutable inference state and systematically compared with the physician-validated outcome. The inference pipeline integrates a vision-enabled large language model, BERT-based medical entity extraction, and a Sequential Language Model Inference (SLMI) step to enforce domain-consistent refinement prior to expert review. Evaluation on 21 dermatological cases (21 complete AI-physician pairs) employed a four-level concordance framework comprising exact primary match rate (PMR), semantic similarity-adjusted rate (AMR), cross-category alignment, and Comprehensive Concordance Rate (CCR). Exact agreement reached 71.4% and remained unchanged under semantic similarity (t = 0.60), while structured cross-category and differential overlap analysis yielded 100% comprehensive concordance (95% CI: [83.9%, 100%]). No cases demonstrated complete diagnostic divergence. These findings show that binary lexical evaluation substantially underestimates clinically meaningful alignment. Modeling expert validation as a structured transformation enables signal-aware quantification of correction dynamics and supports traceable, human-aligned evaluation of image-based clinical decision support systems.
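The reported rates and interval can be checked with a short calculation. The sketch below is an illustration, not the authors' code: the abstract does not say how the confidence interval was computed, so an exact (Clopper-Pearson) binomial interval is assumed here because it reproduces the stated lower bound. The count of 15 exact matches is inferred from 71.4% of 21 cases, and the function names are hypothetical.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def solve_decreasing(g, target, iters=100):
    """Bisection on [0, 1] for g(p) = target, where g is decreasing in p."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if g(mid) > target:
            lo = mid  # g is still too large, so the root lies at a higher p
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(x, n, alpha=0.05):
    """Exact two-sided binomial interval (assumed method, see lead-in)."""
    # Lower bound: P(X >= x | p) = alpha/2, i.e. cdf(x - 1) = 1 - alpha/2.
    lower = 0.0 if x == 0 else solve_decreasing(
        lambda p: binom_cdf(x - 1, n, p), 1 - alpha / 2
    )
    # Upper bound: P(X <= x | p) = alpha/2.
    upper = 1.0 if x == n else solve_decreasing(
        lambda p: binom_cdf(x, n, p), alpha / 2
    )
    return lower, upper


n_cases = 21
exact_matches = 15       # inferred: 15/21 rounds to the reported 71.4%
concordant_cases = 21    # 100% comprehensive concordance

pmr = exact_matches / n_cases
ccr = concordant_cases / n_cases
ccr_low, ccr_high = clopper_pearson(concordant_cases, n_cases)

print(f"PMR = {pmr:.1%}")
print(f"CCR = {ccr:.1%}, 95% CI = [{ccr_low:.1%}, {ccr_high:.1%}]")
```

With all 21 cases concordant, the lower bound reduces to 0.025^(1/21), which is about 0.839. The width of the interval reflects the small sample: even perfect observed concordance on 21 cases is compatible with a true rate as low as roughly 84%.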

