LGSep 14, 2022
Metrics to guide development of machine learning algorithms for malaria diagnosisCharles B. Delahunt, Noni Gachuhi, Matthew P. Horning
Automated malaria diagnosis is a difficult but high-value target for machine learning (ML), and effective algorithms could save many thousands of children's lives. However, current ML efforts largely neglect crucial use case constraints and are thus not clinically useful. Two factors in particular are crucial to developing algorithms translatable to clinical field settings: (i) Clear understanding of the clinical needs that ML solutions must accommodate; and (ii) task-relevant metrics for guiding and evaluating ML models. Neglect of these factors has seriously hampered past ML work on malaria, because the resulting algorithms do not align with clinical needs. In this paper we address these two issues in the context of automated malaria diagnosis via microscopy on Giemsa-stained blood films. First, we describe why domain expertise is crucial to effectively apply ML to malaria, and list technical documents and other resources that provide this domain knowledge. Second, we detail performance metrics tailored to the clinical requirements of malaria diagnosis, to guide development of ML models and evaluate model performance through the lens of clinical needs (versus a generic ML lens). We highlight the importance of a patient-level perspective, interpatient variability, false positive rates, limit of detection, and different types of error. We also discuss reasons why ROC curves, AUC, and F1, as commonly used in ML work, are poorly suited to this context. These findings also apply to other diseases involving parasite loads, including neglected tropical diseases (NTDs) such as schistosomiasis.
LGJan 22
Beyond validation loss: Clinically-tailored optimization metrics improve a model's clinical performanceCharles B. Delahunt, Courosh Mehanian, Daniel E. Shea et al.
A key task in ML is to optimize models at various stages, e.g. by choosing hyperparameters or picking a stopping point. A traditional ML approach is to use validation loss, i.e. to apply the training loss function on a validation set to guide these optimizations. However, ML for healthcare has a distinct goal from traditional ML: Models must perform well relative to specific clinical requirements, vs. relative to the loss function used for training. These clinical requirements can be captured more precisely by tailored metrics. Since many optimization tasks do not require the driving metric to be differentiable, they allow a wider range of options, including the use of metrics tailored to be clinically-relevant. In this paper we describe two controlled experiments which show how the use of clinically-tailored metrics provide superior model optimization compared to validation loss, in the sense of better performance on the clinical task. The use of clinically-relevant metrics for optimization entails some extra effort, to define the metrics and to code them into the pipeline. But it can yield models that better meet the central goal of ML for healthcare: strong performance in the clinic.
LGMay 9, 2024
Driving down Poisson error can offset classification error in clinical tasksCharles B. Delahunt, Courosh Mehanian, Matthew P. Horning
Medical machine learning algorithms are typically evaluated based on accuracy vs. a clinician-defined ground truth, a reasonable initial choice since trained clinicians are usually better classifiers than ML models. However, this metric does not fully capture the actual clinical task: it neglects the fact that humans, even with perfect accuracy, are subject to non-trivial error from the Poisson statistics of rare events, because clinical protocols often specify a relatively small sample size. For example, to quantitate malaria on a thin blood film a clinician examines only 2000 red blood cells (0.0004 uL), which can yield large Poisson variation in the actual number of parasites present, so that a perfect human's count can differ substantially from the true average load. In contrast, an ML system may be less accurate on an object level, but it may also have the option to examine more blood (e.g. 0.1 uL, or 250x). Then while its parasite identification error is higher, the Poisson variability of its estimate is lower due to larger sample size. To qualify for clinical deployment, an ML system's performance must match current standard of care, typically a very demanding target. To achieve this, it may be possible to offset the ML system's lower accuracy by increasing its sample size to reduce Poisson error, and thus attain the same net clinical performance as a perfectly accurate human limited by smaller sample size. In this paper, we analyse the mathematics of the relationship between Poisson error, classification error, and total error. This mathematical toolkit enables teams optimizing ML systems to leverage a relative strength (larger sample sizes) to offset a relative weakness (classification accuracy). We illustrate the methods with two concrete examples: diagnosis and quantitation of malaria on blood films.
LGAug 5, 2019
Fully-automated patient-level malaria assessment on field-prepared thin blood film microscopy images, including Supplementary InformationCharles B. Delahunt, Mayoore S. Jaiswal, Matthew P. Horning et al.
Malaria is a life-threatening disease affecting millions. Microscopy-based assessment of thin blood films is a standard method to (i) determine malaria species and (ii) quantitate high-parasitemia infections. Full automation of malaria microscopy by machine learning (ML) is a challenging task because field-prepared slides vary widely in quality and presentation, and artifacts often heavily outnumber relatively rare parasites. In this work, we describe a complete, fully-automated framework for thin film malaria analysis that applies ML methods, including convolutional neural nets (CNNs), trained on a large and diverse dataset of field-prepared thin blood films. Quantitation and species identification results are close to sufficiently accurate for the concrete needs of drug resistance monitoring and clinical use-cases on field-prepared samples. We focus our methods and our performance metrics on the field use-case requirements. We discuss key issues and important metrics for the application of ML methods to malaria microscopy.