Courosh Mehanian

LG
h-index: 14
5 papers
25 citations
Novelty: 44%
AI Score: 37

5 Papers

IV · Oct 5, 2023 · Code
How Good Are Synthetic Medical Images? An Empirical Study with Lung Ultrasound

Menghan Yu, Sourabh Kulhare, Courosh Mehanian et al.

Acquiring large quantities of data and annotations is known to be effective for developing high-performing deep learning models, but is difficult and expensive to do in the healthcare context. Adding synthetic training data using generative models offers a low-cost method to deal effectively with the data scarcity challenge, and can also address data imbalance and patient privacy issues. In this study, we propose a comprehensive framework that fits seamlessly into model development workflows for medical image analysis. We demonstrate, with datasets of varying size, (i) the benefits of generative models as a data augmentation method; (ii) how adversarial methods can protect patient privacy via data substitution; (iii) novel performance metrics for these use cases by testing models on real holdout data. We show that training with both synthetic and real data outperforms training with real data alone, and that models trained solely with synthetic data approach their real-only counterparts. Code is available at https://github.com/Global-Health-Labs/US-DCGAN.

LG · Jan 22
Beyond validation loss: Clinically-tailored optimization metrics improve a model's clinical performance

Charles B. Delahunt, Courosh Mehanian, Daniel E. Shea et al.

A key task in ML is to optimize models at various stages, e.g. by choosing hyperparameters or picking a stopping point. A traditional ML approach is to use validation loss, i.e. to apply the training loss function on a validation set to guide these optimizations. However, ML for healthcare has a distinct goal from traditional ML: models must perform well relative to specific clinical requirements, rather than relative to the loss function used for training. These clinical requirements can be captured more precisely by tailored metrics. Since many optimization tasks do not require the driving metric to be differentiable, they allow a wider range of options, including the use of metrics tailored to be clinically relevant. In this paper we describe two controlled experiments which show how the use of clinically-tailored metrics provides superior model optimization compared to validation loss, in the sense of better performance on the clinical task. The use of clinically-relevant metrics for optimization entails some extra effort, to define the metrics and to code them into the pipeline. But it can yield models that better meet the central goal of ML for healthcare: strong performance in the clinic.
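As a rough illustration of the idea in this abstract, the sketch below picks a stopping point among two candidate checkpoints in two ways: by validation loss and by a non-differentiable tailored metric. Everything in it is an assumption made for illustration and is not taken from the paper: the metric (sensitivity at a required specificity), the checkpoint names, and the toy validation scores are all hypothetical.

```python
import math

def binary_cross_entropy(scores, labels):
    """Mean log loss, standing in for the training loss applied to a validation set."""
    losses = [-math.log(s if y == 1 else 1.0 - s) for s, y in zip(scores, labels)]
    return sum(losses) / len(losses)

def sensitivity_at_specificity(scores, labels, min_specificity):
    """Best sensitivity over all thresholds that meet the required specificity.

    Hypothetical clinically-tailored metric; it is not differentiable, which is
    fine for checkpoint selection.
    """
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    positives = [s for s, y in zip(scores, labels) if y == 1]
    best = 0.0
    for threshold in sorted(set(scores)):
        # A sample is called positive when its score is >= threshold.
        specificity = sum(s < threshold for s in negatives) / len(negatives)
        if specificity >= min_specificity:
            sensitivity = sum(s >= threshold for s in positives) / len(positives)
            best = max(best, sensitivity)
    return best

# Toy validation set: four negatives followed by four positives (made-up values).
labels = [0, 0, 0, 0, 1, 1, 1, 1]
checkpoints = {
    # Confident on most samples, but one negative outranks one positive.
    "epoch_10": [0.05, 0.05, 0.05, 0.60, 0.55, 0.95, 0.95, 0.95],
    # Less confident overall, but perfectly separated.
    "epoch_20": [0.30, 0.30, 0.30, 0.40, 0.60, 0.70, 0.70, 0.70],
}

by_loss = min(checkpoints, key=lambda k: binary_cross_entropy(checkpoints[k], labels))
by_metric = max(
    checkpoints,
    key=lambda k: sensitivity_at_specificity(checkpoints[k], labels, min_specificity=1.0),
)

print("selected by validation loss:", by_loss)
print("selected by tailored metric:", by_metric)
```

On this toy data the two criteria disagree, which is the situation the abstract is concerned with: the checkpoint with the lower loss is not the one that best satisfies the stated requirement.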

LG · May 9, 2024
Driving down Poisson error can offset classification error in clinical tasks

Charles B. Delahunt, Courosh Mehanian, Matthew P. Horning

Medical machine learning algorithms are typically evaluated based on accuracy vs. a clinician-defined ground truth, a reasonable initial choice since trained clinicians are usually better classifiers than ML models. However, this metric does not fully capture the actual clinical task: it neglects the fact that humans, even with perfect accuracy, are subject to non-trivial error from the Poisson statistics of rare events, because clinical protocols often specify a relatively small sample size. For example, to quantitate malaria on a thin blood film a clinician examines only 2000 red blood cells (0.0004 µL), which can yield large Poisson variation in the actual number of parasites present, so that a perfect human's count can differ substantially from the true average load. In contrast, an ML system may be less accurate on an object level, but it may also have the option to examine more blood (e.g. 0.1 µL, or 250x). Then while its parasite identification error is higher, the Poisson variability of its estimate is lower due to the larger sample size. To qualify for clinical deployment, an ML system's performance must match the current standard of care, typically a very demanding target. To achieve this, it may be possible to offset the ML system's lower accuracy by increasing its sample size to reduce Poisson error, and thus attain the same net clinical performance as a perfectly accurate human limited by a smaller sample size. In this paper, we analyse the mathematics of the relationship between Poisson error, classification error, and total error. This mathematical toolkit enables teams optimizing ML systems to leverage a relative strength (larger sample sizes) to offset a relative weakness (classification accuracy). We illustrate the methods with two concrete examples: diagnosis and quantitation of malaria on blood films.
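The trade-off described in this abstract can be made concrete with a small calculation. The sketch below uses the two sample volumes quoted above and the standard fact that a Poisson count with mean λ has relative standard deviation 1/√λ. The parasite density, the machine's 25% classification error, and the rule that independent errors add in quadrature are illustrative assumptions, not the paper's model.

```python
import math

def poisson_relative_error(density_per_ul, volume_ul):
    """Relative standard deviation (std / mean) of a Poisson count.

    The expected count is density * volume, and a Poisson variable with
    mean lam has std sqrt(lam), so std / mean = 1 / sqrt(lam).
    """
    expected_count = density_per_ul * volume_ul
    return 1.0 / math.sqrt(expected_count)

def total_relative_error(poisson_rel, classification_rel):
    # Assumption for illustration only: the two error sources are
    # independent and combine in quadrature.
    return math.sqrt(poisson_rel ** 2 + classification_rel ** 2)

density = 1000.0          # hypothetical parasites per microlitre
human_volume = 0.0004     # microlitres, from the abstract (2000 red blood cells)
machine_volume = 0.1      # microlitres, from the abstract (250x more blood)

human_poisson = poisson_relative_error(density, human_volume)
machine_poisson = poisson_relative_error(density, machine_volume)

# Perfect human classifier vs. a machine with a hypothetical 25% relative
# classification error.
human_total = total_relative_error(human_poisson, 0.0)
machine_total = total_relative_error(machine_poisson, 0.25)

print(f"human:   Poisson {human_poisson:.2f}, total {human_total:.2f}")
print(f"machine: Poisson {machine_poisson:.2f}, total {machine_total:.2f}")
```

Under these assumptions the 250x larger sample reduces the Poisson term by a factor of √250 (about 16), which leaves room for a substantial classification error before the machine's total error reaches that of the perfectly accurate human.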

LG · Aug 5, 2019
Fully-automated patient-level malaria assessment on field-prepared thin blood film microscopy images, including Supplementary Information

Charles B. Delahunt, Mayoore S. Jaiswal, Matthew P. Horning et al.

Malaria is a life-threatening disease affecting millions. Microscopy-based assessment of thin blood films is a standard method to (i) determine malaria species and (ii) quantitate high-parasitemia infections. Full automation of malaria microscopy by machine learning (ML) is a challenging task because field-prepared slides vary widely in quality and presentation, and artifacts often heavily outnumber relatively rare parasites. In this work, we describe a complete, fully-automated framework for thin film malaria analysis that applies ML methods, including convolutional neural nets (CNNs), trained on a large and diverse dataset of field-prepared thin blood films. Quantitation and species identification results are close to sufficiently accurate for the concrete needs of drug resistance monitoring and clinical use-cases on field-prepared samples. We focus our methods and our performance metrics on the field use-case requirements. We discuss key issues and important metrics for the application of ML methods to malaria microscopy.

LG · Jan 26, 2019
Money on the Table: Statistical information ignored by Softmax can improve classifier accuracy

Charles B. Delahunt, Courosh Mehanian, J. Nathan Kutz

Softmax is a standard final layer used in Neural Nets (NNs) to summarize information encoded in the trained NN and return a prediction. However, Softmax leverages only a subset of the class-specific structure encoded in the trained model and ignores potentially valuable information: During training, models encode an array $D$ of class response distributions, where $D_{ij}$ is the distribution of the $j^{th}$ pre-Softmax readout neuron's responses to the $i^{th}$ class. Given a test sample, Softmax implicitly uses only the row of this array $D$ that corresponds to the readout neurons' responses to the sample's true class. Leveraging more of this array $D$ can improve classifier accuracy, because the likelihoods of two competing classes can be encoded in other rows of $D$. To explore this potential resource, we develop a hybrid classifier (Softmax-Pooling Hybrid, $SPH$) that uses Softmax on high-scoring samples, but on low-scoring samples uses a log-likelihood method that pools the information from the full array $D$. We apply $SPH$ to models trained on a vectorized MNIST dataset to varying levels of accuracy. $SPH$ replaces only the final Softmax layer in the trained NN, at test time only. All training is the same as for Softmax. Because the pooling classifier performs better than Softmax on low-scoring samples, $SPH$ reduces test set error by 6% to 23%, using the exact same trained model, whatever the baseline Softmax accuracy. This reduction in error reflects hidden capacity of the trained NN that is left unused by Softmax.
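A minimal sketch of the hybrid idea in this abstract follows. It is not the authors' $SPH$ implementation: it assumes each $D_{ij}$ can be summarized by a Gaussian (mean and standard deviation), treats readout neurons as independent when pooling log-likelihoods, and uses a hypothetical Softmax confidence threshold of 0.9 to decide which samples count as low-scoring. The function names and toy readouts are likewise invented for illustration.

```python
import math
from statistics import mean, pstdev

def softmax(logits):
    """Numerically stable Softmax over a list of pre-Softmax readouts."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def fit_response_distributions(readouts, labels, n_classes):
    """Return D where D[i][j] = (mean, std) of readout neuron j on class i.

    Assumption: a Gaussian summary of each class response distribution.
    """
    D = []
    for i in range(n_classes):
        rows = [z for z, y in zip(readouts, labels) if y == i]
        # zip(*rows) iterates over neurons; the floor on std avoids division by zero.
        D.append([(mean(col), max(pstdev(col), 1e-3)) for col in zip(*rows)])
    return D

def log_likelihood(logits, class_row):
    """Pooled Gaussian log-likelihood (up to a constant) of the readouts under one class."""
    return sum(
        -0.5 * ((z - mu) / sd) ** 2 - math.log(sd)
        for z, (mu, sd) in zip(logits, class_row)
    )

def hybrid_predict(logits, D, threshold=0.9):
    """Use Softmax when it is confident, otherwise the pooled log-likelihood."""
    probs = softmax(logits)
    if max(probs) >= threshold:
        return probs.index(max(probs))
    scores = [log_likelihood(logits, row) for row in D]
    return scores.index(max(scores))

# Toy readouts from a hypothetical two-class model with two readout neurons.
train_readouts = [
    [2.0, 0.0], [2.2, 0.2], [1.8, -0.2],   # class 0
    [1.0, 1.2], [1.2, 1.0], [0.8, 1.4],    # class 1
]
train_labels = [0, 0, 0, 1, 1, 1]
D = fit_response_distributions(train_readouts, train_labels, n_classes=2)

ambiguous = [1.1, 1.0]   # Softmax barely favours class 0
confident = [5.0, 0.0]   # Softmax strongly favours class 0

print("Softmax alone:", softmax(ambiguous).index(max(softmax(ambiguous))))
print("hybrid, ambiguous sample:", hybrid_predict(ambiguous, D))
print("hybrid, confident sample:", hybrid_predict(confident, D))
```

In this toy case the ambiguous readout sits much closer to the class 1 response distributions than to those of class 0, so the pooled log-likelihood overrides the marginal Softmax preference, while the confident sample is still decided by Softmax.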