New Epochs in AI Supervision: Design and Implementation of an Autonomous Radiology AI Monitoring System
This work addresses the challenge of ensuring AI accuracy and safety in healthcare, though the contribution is incremental: it builds on existing monitoring concepts and adds new metrics.
The paper tackles the problem of monitoring radiology AI classification models in clinical practice. It introduces two metrics, predictive divergence and temporal stability, for preemptive alerts, and validates the approach retrospectively on chest X-ray data to show that it helps maintain model reliability.
With the increasingly widespread adoption of AI in healthcare, maintaining the accuracy and reliability of AI models in clinical practice has become crucial. In this context, we introduce novel methods for monitoring the performance of radiology AI classification models in practice, addressing the challenge of obtaining real-time ground truth for performance monitoring. We propose two metrics, predictive divergence and temporal stability, to be used for preemptive alerts of AI performance changes. Predictive divergence, measured using Kullback-Leibler and Jensen-Shannon divergences, evaluates model accuracy by comparing the model's predictions with those of two supplementary models. Temporal stability is assessed by comparing current predictions against historical moving averages, identifying potential model decay or data drift. This approach was retrospectively validated using chest X-ray data from a single-center imaging clinic, demonstrating its effectiveness in maintaining AI model reliability. By providing continuous, real-time insights into model performance, our system ensures the safe and effective use of AI in clinical decision-making, paving the way for more robust AI integration in healthcare.
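The two metrics described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not say how the divergences from the two supplementary models are combined, so averaging them is an assumption here, as are the names `predictive_divergence` and `TemporalStabilityMonitor`, the smoothing constant `EPS`, the window length, and the alert threshold.

```python
import math
from collections import deque

EPS = 1e-12  # assumed smoothing constant, guards against division by zero


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in nats for discrete distributions."""
    return sum(pi * math.log((pi + EPS) / (qi + EPS))
               for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetric, bounded above by ln 2."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def predictive_divergence(primary, supplementary):
    """Mean JS divergence between the monitored model's predicted class
    distribution and each supplementary model's (averaging is an assumption)."""
    return sum(js_divergence(primary, s) for s in supplementary) / len(supplementary)


class TemporalStabilityMonitor:
    """Compares each new value against a moving average of recent values."""

    def __init__(self, window=30, threshold=0.1):
        self.history = deque(maxlen=window)  # hypothetical window length
        self.threshold = threshold           # hypothetical alert threshold

    def update(self, value):
        """Return (deviation, alert) for the new value, then record it."""
        if self.history:
            baseline = sum(self.history) / len(self.history)
            deviation = abs(value - baseline)
        else:
            deviation = 0.0  # no history yet, so nothing to compare against
        self.history.append(value)
        return deviation, deviation > self.threshold


# Usage: one study, with the monitored model and two supplementary models
primary = [0.80, 0.20]             # P(normal), P(abnormal)
others = [[0.75, 0.25], [0.30, 0.70]]
score = predictive_divergence(primary, others)

monitor = TemporalStabilityMonitor(window=3, threshold=0.1)
for daily_positive_rate in [0.20, 0.20, 0.20, 0.50]:
    deviation, alert = monitor.update(daily_positive_rate)
```

Neither metric needs ground-truth labels, which matches the abstract's motivation: disagreement with the supplementary models and departure from the historical moving average both act as proxies that can trigger a preemptive alert.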