Dennis Trujillo

LG
h-index: 1
4 papers
31 citations
Novelty: 34%
AI Score: 24

4 Papers

LG | Apr 20, 2022
fairDMS: Rapid Model Training by Data and Model Reuse

Ahsan Ali, Hemant Sharma, Rajkumar Kettimuthu et al.

Extracting actionable information rapidly from data produced by instruments such as the Linac Coherent Light Source (LCLS-II) and Advanced Photon Source Upgrade (APS-U) is becoming ever more challenging due to high (up to TB/s) data rates. Conventional physics-based information retrieval methods are hard-pressed to detect interesting events fast enough to enable timely focusing on a rare event or correction of an error. Machine learning (ML) methods that learn cheap surrogate classifiers present a promising alternative, but can fail catastrophically when changes in instrument or sample result in degradation in ML performance. To overcome such difficulties, we present a new data storage and ML model training architecture designed to organize large volumes of data and models so that when model degradation is detected, prior models and/or data can be queried rapidly and a more suitable model retrieved and fine-tuned for new conditions. We show that our approach can achieve up to 100x data labelling speedup compared to the current state-of-the-art, 200x improvement in training speed, and 92x speedup in terms of end-to-end model updating time.
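
The retrieval step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the names `ModelRecord`, `select_prior_model`, and `needs_update`, the error threshold, and the use of Euclidean distance between data summaries are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class ModelRecord:
    """Hypothetical catalog entry: a model id plus a summary of its training data."""
    model_id: str
    data_summary: List[float]  # assumed: e.g. a mean feature vector of the training set


def distance(a, b):
    """Euclidean distance between two equal-length summary vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def select_prior_model(catalog, new_summary):
    """Return the catalog entry whose training data most resembles the new data."""
    return min(catalog, key=lambda rec: distance(rec.data_summary, new_summary))


def needs_update(error, threshold=0.1):
    """Assumed degradation check: flag the model when its error exceeds a threshold."""
    return error > threshold


# Hypothetical catalog and new-data summary, for illustration only.
catalog = [
    ModelRecord("model_a", [0.0, 1.0]),
    ModelRecord("model_b", [5.0, 5.0]),
]
new_summary = [4.5, 5.5]

if needs_update(error=0.3):
    best = select_prior_model(catalog, new_summary)
    print(best.model_id)  # the retrieved model would then be fine-tuned on the new data
```

Starting from the closest prior model rather than from scratch is what would let fine-tuning be short; the distance measure and data summary used in practice may differ substantially from this sketch.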

AP | Jan 4, 2025
Guiding Treatment Strategies: The Role of Adjuvant Anti-Her2 Neu Therapy and Skin/Nipple Involvement in Local Recurrence-Free Survival in Breast Cancer Patients

Joe Omatoi, Abdul M Mohammed, Dennis Trujillo

This study explores how causal inference models, specifically the Linear Non-Gaussian Acyclic Model (LiNGAM), can extract causal relationships between demographic factors, treatments, conditions, and outcomes from observational patient data, enabling insights beyond correlation. Unlike traditional randomized controlled trials (RCTs), which establish causal relationships within narrowly defined populations, our method leverages broader observational data, improving generalizability. Using over 40 features in the Duke MRI Breast Cancer dataset, we found that Adjuvant Anti-Her2 Neu Therapy increased local recurrence-free survival by 169 days, while Skin/Nipple involvement reduced it by 351 days. These findings highlight the therapy's importance for Her2-positive patients and the need for targeted interventions for high-risk cases, informing personalized treatment strategies.

AI | Jun 10, 2024
A Large Language Model Pipeline for Breast Cancer Oncology

Tristen Pool, Dennis Trujillo

Large language models (LLMs) have demonstrated potential for innovation in many disciplines. However, how they can best be developed for oncology remains underexplored. State-of-the-art OpenAI models were fine-tuned on a clinical dataset and clinical guidelines text corpus for two important cancer treatment factors, adjuvant radiation therapy and chemotherapy, using a novel Langchain prompt engineering pipeline. A high accuracy (0.85+) was achieved in the classification of adjuvant radiation therapy and chemotherapy for breast cancer patients. Furthermore, a confidence interval was formed from observational data on the quality of treatment from human oncologists to estimate the proportion of scenarios in which the model must outperform the original oncologist in its treatment prediction to be a better solution overall; this proportion was estimated as 8.2% to 13.3%. Due to indeterminacy in the outcomes of cancer treatment decisions, future investigation, potentially a clinical trial, would be required to determine if this threshold was met by the models. Nevertheless, with 85% of U.S. cancer patients receiving treatment at local community facilities, these kinds of models could play an important part in expanding access to quality care with outcomes that lie, at minimum, close to those of a human oncologist.

LG | May 28, 2021
Bridging Data Center AI Systems with Edge Computing for Actionable Information Retrieval

Zhengchun Liu, Ahsan Ali, Peter Kenesei et al.

Extremely high data rates at modern synchrotron and X-ray free-electron laser light source beamlines motivate the use of machine learning methods for data reduction, feature detection, and other purposes. Regardless of the application, the basic concept is the same: data collected in early stages of an experiment, data from past similar experiments, and/or data simulated for the upcoming experiment are used to train machine learning models that, in effect, learn specific characteristics of those data; these models are then used to process subsequent data more efficiently than would general-purpose models that lack knowledge of the specific dataset or data class. Thus, a key challenge is to be able to train models with sufficient rapidity that they can be deployed and used within useful timescales. We describe here how specialized data center AI (DCAI) systems can be used for this purpose through a geographically distributed workflow. Experiments show that although there are data movement costs and service overhead in using remote DCAI systems for DNN training, the turnaround time is still less than 1/30 of that of a locally deployable GPU.
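
The trade-off in the final sentence of this abstract comes down to simple arithmetic: remote training pays for data movement and service overhead, and is worthwhile when the total is still shorter than local training. The sketch below shows that comparison. The function names and all timings are hypothetical and are not taken from the paper.

```python
def remote_turnaround(transfer_s, overhead_s, remote_train_s):
    """Total time to get a trained model back from a remote system, in seconds."""
    return transfer_s + overhead_s + remote_train_s


def speedup(local_train_s, transfer_s, overhead_s, remote_train_s):
    """Ratio of local training time to remote end-to-end turnaround."""
    return local_train_s / remote_turnaround(transfer_s, overhead_s, remote_train_s)


# Hypothetical timings in seconds, chosen only to illustrate the arithmetic.
local_train_s = 1200.0
ratio = speedup(local_train_s, transfer_s=30.0, overhead_s=20.0, remote_train_s=50.0)
print(ratio)  # remote training wins whenever this ratio is greater than 1
```

With these made-up numbers the remote turnaround is 100 s against 1200 s locally, so the remote path is 12 times faster despite the transfer and overhead costs.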