Daniel E. Shea

h-index4

4papers

11citations

Novelty56%

AI Score42

Ranked #57,921 of 194,257 authors (top 30%)#13,172 in LG (top 33%)

4 Papers

7.3IVOct 5, 2023Code

How Good Are Synthetic Medical Images? An Empirical Study with Lung Ultrasound

Menghan Yu, Sourabh Kulhare, Courosh Mehanian et al.

Acquiring large quantities of data and annotations is known to be effective for developing high-performing deep learning models, but is difficult and expensive to do in the healthcare context. Adding synthetic training data using generative models offers a low-cost method to deal effectively with the data scarcity challenge, and can also address data imbalance and patient privacy issues. In this study, we propose a comprehensive framework that fits seamlessly into model development workflows for medical image analysis. We demonstrate, with datasets of varying size, (i) the benefits of generative models as a data augmentation method; (ii) how adversarial methods can protect patient privacy via data substitution; (iii) novel performance metrics for these use cases by testing models on real holdout data. We show that training with both synthetic and real data outperforms training with real data alone, and that models trained solely with synthetic data approach their real-only counterparts. Code is available at https://github.com/Global-Health-Labs/US-DCGAN.

1.4LGJan 22

Beyond validation loss: Clinically-tailored optimization metrics improve a model's clinical performance

Charles B. Delahunt, Courosh Mehanian, Daniel E. Shea et al.

A key task in ML is to optimize models at various stages, e.g. by choosing hyperparameters or picking a stopping point. A traditional ML approach is to use validation loss, i.e. to apply the training loss function on a validation set to guide these optimizations. However, ML for healthcare has a distinct goal from traditional ML: Models must perform well relative to specific clinical requirements, vs. relative to the loss function used for training. These clinical requirements can be captured more precisely by tailored metrics. Since many optimization tasks do not require the driving metric to be differentiable, they allow a wider range of options, including the use of metrics tailored to be clinically-relevant. In this paper we describe two controlled experiments which show how the use of clinically-tailored metrics provide superior model optimization compared to validation loss, in the sense of better performance on the clinical task. The use of clinically-relevant metrics for optimization entails some extra effort, to define the metrics and to code them into the pipeline. But it can yield models that better meet the central goal of ML for healthcare: strong performance in the clinic.

3.1LGMay 26, 2021

Operator Autoencoders: Learning Physical Operations on Encoded Molecular Graphs

Willis Hoke, Daniel Shea, Stephen Casey

Molecular dynamics simulations produce data with complex nonlinear dynamics. If the timestep behavior of such a dynamic system can be represented by a linear operator, future states can be inferred directly without expensive simulations. The use of an autoencoder in combination with a physical timestep operator allows both the relevant structural characteristics of the molecular graphs and the underlying physics of the system to be isolated during the training process. In this work, we develop a pipeline for establishing graph-structured representations of time-series volumetric data from molecular dynamics simulations. We then train an autoencoder to find nonlinear mappings to a latent space where future timesteps can be predicted through application of a linear operator trained in tandem with the autoencoder. Increasing the dimensionality of the autoencoder output is shown to improve the accuracy of the physical timestep operator.

4.3SPApr 3, 2021Code

Extraction of instantaneous frequencies and amplitudes in nonstationary time-series data

Daniel E. Shea, Rajiv Giridharagopal, David S. Ginger et al.

Time-series analysis is critical for a diversity of applications in science and engineering. By leveraging the strengths of modern gradient descent algorithms, the Fourier transform, multi-resolution analysis, and Bayesian spectral analysis, we propose a data-driven approach to time-frequency analysis that circumvents many of the shortcomings of classic approaches, including the extraction of nonstationary signals with discontinuities in their behavior. The method introduced is equivalent to a {\em nonstationary Fourier mode decomposition} (NFMD) for nonstationary and nonlinear temporal signals, allowing for the accurate identification of instantaneous frequencies and their amplitudes. The method is demonstrated on a diversity of time-series data, including on data from cantilever-based electrostatic force microscopy to quantify the time-dependent evolution of charging dynamics at the nanoscale.