Shawn N. Murphy

AI
Semantic Scholar Profile
h-index21
6papers
62citations
Novelty48%
AI Score43

6 Papers

AIFeb 17Code
Optimization Instability in Autonomous Agentic Workflows for Clinical Symptom Detection

Cameron Cagan, Pedram Fard, Jiazi Tian et al.

Autonomous agentic workflows that iteratively refine their own behavior hold considerable promise, yet their failure modes remain poorly characterized. We investigate optimization instability, a phenomenon in which continued autonomous improvement paradoxically degrades classifier performance, using Pythia, an open-source framework for automated prompt optimization. Evaluating three clinical symptoms with varying prevalence (shortness of breath at 23%, chest pain at 12%, and Long COVID brain fog at 3%), we observed that validation sensitivity oscillated between 1.0 and 0.0 across iterations, with severity inversely proportional to class prevalence. At 3% prevalence, the system achieved 95% accuracy while detecting zero positive cases, a failure mode obscured by standard evaluation metrics. We evaluated two interventions: a guiding agent that actively redirected optimization, amplifying overfitting rather than correcting it, and a selector agent that retrospectively identified the best-performing iteration successfully prevented catastrophic failure. With selector agent oversight, the system outperformed expert-curated lexicons on brain fog detection by 331% (F1) and chest pain by 7%, despite requiring only a single natural language term as input. These findings characterize a critical failure mode of autonomous AI systems and demonstrate that retrospective selection outperforms active intervention for stabilization in low-prevalence classification tasks.

LGSep 8, 2023
tSPM+; a high-performance algorithm for mining transitive sequential patterns from clinical data

Jonas Hügel, Ulrich Sax, Shawn N. Murphy et al.

The increasing availability of large clinical datasets collected from patients can enable new avenues for computational characterization of complex diseases using different analytic algorithms. One of the promising new methods for extracting knowledge from large clinical datasets involves temporal pattern mining integrated with machine learning workflows. However, mining these temporal patterns is a computational intensive task and has memory repercussions. Current algorithms, such as the temporal sequence pattern mining (tSPM) algorithm, are already providing promising outcomes, but still leave room for optimization. In this paper, we present the tSPM+ algorithm, a high-performance implementation of the tSPM algorithm, which adds a new dimension by adding the duration to the temporal patterns. We show that the tSPM+ algorithm provides a speed up to factor 980 and a up to 48 fold improvement in memory consumption. Moreover, we present a docker container with an R-package, We also provide vignettes for an easy integration into already existing machine learning workflows and use the mined temporal sequences to identify Post COVID-19 patients and their symptoms according to the WHO definition.

AIFeb 3, 2025
An Agentic AI Workflow for Detecting Cognitive Concerns in Real-world Data

Jiazi Tian, Liqin Wang, Pedram Fard et al.

Early identification of cognitive concerns is critical but often hindered by subtle symptom presentation. This study developed and validated a fully automated, multi-agent AI workflow using LLaMA 3 8B to identify cognitive concerns in 3,338 clinical notes from Mass General Brigham. The agentic workflow, leveraging task-specific agents that dynamically collaborate to extract meaningful insights from clinical notes, was compared to an expert-driven benchmark. Both workflows achieved high classification performance, with F1-scores of 0.90 and 0.91, respectively. The agentic workflow demonstrated improved specificity (1.00) and achieved prompt refinement in fewer iterations. Although both workflows showed reduced performance on validation data, the agentic workflow maintained perfect specificity. These findings highlight the potential of fully automated multi-agent AI workflows to achieve expert-level accuracy with greater efficiency, offering a scalable and cost-effective solution for detecting cognitive concerns in clinical settings.

AISep 30, 2025
The Average Patient Fallacy

Alaleh Azhir, Shawn N. Murphy, Hossein Estiri

Machine learning in medicine is typically optimized for population averages. This frequency weighted training privileges common presentations and marginalizes rare yet clinically critical cases, a bias we call the average patient fallacy. In mixture models, gradients from rare cases are suppressed by prevalence, creating a direct conflict with precision medicine. Clinical vignettes in oncology, cardiology, and ophthalmology show how this yields missed rare responders, delayed recognition of atypical emergencies, and underperformance on vision-threatening variants. We propose operational fixes: Rare Case Performance Gap, Rare Case Calibration Error, a prevalence utility definition of rarity, and clinically weighted objectives that surface ethical priorities. Weight selection should follow structured deliberation. AI in medicine must detect exceptional cases because of their significance.

MLAug 10, 2020
Individualized Prediction of COVID-19 Adverse outcomes with MLHO

Hossein Estiri, Zachary H. Strasser, Shawn N. Murphy

We developed MLHO (pronounced as melo), an end-to-end Machine Learning framework that leverages iterative feature and algorithm selection to predict Health Outcomes. MLHO implements iterative sequential representation mining, and feature and model selection, for predicting the patient-level risk of hospitalization, ICU admission, need for mechanical ventilation, and death. It bases this prediction on data from patients' past medical records (before their COVID-19 infection). MLHO's architecture enables a parallel and outcome-oriented model calibration, in which different statistical learning algorithms and vectors of features are simultaneously tested to improve the prediction of health outcomes. Using clinical and demographic data from a large cohort of over 13,000 COVID-19-positive patients, we modeled the four adverse outcomes utilizing about 600 features representing patients' pre-COVID health records and demographics. The mean AUC ROC for mortality prediction was 0.91, while the prediction performance ranged between 0.80 and 0.81 for the ICU, hospitalization, and ventilation. We broadly describe the clusters of features that were utilized in modeling and their relative influence for predicting each outcome. Our results demonstrated that while demographic variables (namely age) are important predictors of adverse outcomes after a COVID-19 infection, the incorporation of the past clinical records are vital for a reliable prediction model. As the COVID-19 pandemic unfolds around the world, adaptable and interpretable machine learning frameworks (like MLHO) are crucial to improve our readiness for confronting the potential future waves of COVID-19, as well as other novel infectious diseases that may emerge.

CVFeb 20, 2020
Brain Age Estimation Using LSTM on Children's Brain MRI

Sheng He, Randy L. Gollub, Shawn N. Murphy et al.

Brain age prediction based on children's brain MRI is an important biomarker for brain health and brain development analysis. In this paper, we consider the 3D brain MRI volume as a sequence of 2D images and propose a new framework using the recurrent neural network for brain age estimation. The proposed method is named as 2D-ResNet18+Long short-term memory (LSTM), which consists of four parts: 2D ResNet18 for feature extraction on 2D images, a pooling layer for feature reduction over the sequences, an LSTM layer, and a final regression layer. We apply the proposed method on a public multisite NIH-PD dataset and evaluate generalization on a second multisite dataset, which shows that the proposed 2D-ResNet18+LSTM method provides better results than traditional 3D based neural network for brain age estimation.