Daniel Capurro

6 papers

395 citations

Novelty 38%

AI Score 40

Ranked #99,585 of 205,806 authors (top 48%); #21,964 in LG (top 52%)

6 Papers

LG Sep 25, 2023

Explainable Machine Learning for ICU Readmission Prediction

Alex G. C. de Sá, Daniel Gould, Anna Fedyukova et al.

The intensive care unit (ICU) comprises a complex hospital environment, where decisions made by clinicians have a high level of risk for the patients' lives. A comprehensive care pathway must then be followed to reduce patient complications. Uncertain, competing and unplanned aspects within this environment increase the difficulty in uniformly implementing the care pathway. Readmission contributes to this pathway's difficulty, occurring when patients are admitted again to the ICU in a short timeframe, resulting in high mortality rates and high resource utilisation. Several works have tried to predict readmission through patients' medical information. Although they have some success in predicting readmission, those works do not properly assess, characterise and understand readmission prediction. This work proposes a standardised and explainable machine learning pipeline to model patient readmission on a multicentric database (i.e., the eICU cohort with 166,355 patients, 200,859 admissions and 6,021 readmissions) while validating it on monocentric (i.e., the MIMIC IV cohort with 382,278 patients, 523,740 admissions and 5,984 readmissions) and multicentric settings. Our machine learning pipeline achieved predictive performance in terms of the area under the receiver operating characteristic curve (AUC) up to 0.7 with a Random Forest classification model, yielding an overall good calibration and consistency on validation sets. From explanations provided by the constructed models, we could also derive a set of insightful conclusions, primarily on variables related to vital signs and blood tests (e.g., albumin, blood urea nitrogen and hemoglobin levels), demographics (e.g., age, and admission height and weight), and ICU-associated variables (e.g., unit type). These insights provide an invaluable source of information during clinicians' decision-making while discharging ICU patients.

CL Jan 26, 2023

Improving Text-based Early Prediction by Distillation from Privileged Time-Series Text

Jinghui Liu, Daniel Capurro, Anthony Nguyen et al.

Modeling text-based time-series to make predictions about a future event or outcome is an important task with a wide range of applications. The standard approach is to train and test the model using the same input window, but this approach neglects the data collected in longer input windows between the prediction time and the final outcome, which are often available during training. In this study, we propose to treat this neglected text as privileged information available during training to enhance early prediction modeling through knowledge distillation, presented as Learning using Privileged tIme-sEries Text (LuPIET). We evaluate the method on clinical and social media text, with four clinical prediction tasks based on clinical notes and two mental health prediction tasks based on social media posts. Our results show LuPIET is effective in enhancing text-based early predictions, though one may need to consider choosing the appropriate text representation and windows for privileged text to achieve optimal performance. Compared to two other methods using transfer learning and mixed training, LuPIET offers more stable improvements over the baseline, standard training. To the best of our knowledge, this is the first study to examine learning using privileged information for time-series in the NLP context.
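
The abstract does not state the training objective, so the following is only a minimal sketch of a generic knowledge-distillation loss, assuming a teacher that saw the longer (privileged) text window and a student that sees only the early window. The function names, the weighting `alpha`, and the `temperature` value are illustrative assumptions, not details taken from the paper.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label,
                      alpha=0.5, temperature=2.0):
    """Generic distillation objective (assumed form, not the paper's).

    Combines cross-entropy on the true label with the KL divergence
    from the teacher's softened distribution to the student's.
    """
    # Hard-label term: cross-entropy of the student on the true class.
    ce = -math.log(softmax(student_logits)[label])

    # Soft-label term: KL(teacher || student) at the given temperature.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(t * math.log(t / s)
             for t, s in zip(p_teacher, p_student) if t > 0)

    # temperature**2 keeps the soft term's gradient scale comparable.
    return alpha * ce + (1 - alpha) * temperature ** 2 * kl


# Hypothetical logits for a binary early-prediction task.
early_window_student = [0.2, 0.9]
long_window_teacher = [-1.0, 2.5]
loss = distillation_loss(early_window_student, long_window_teacher, label=1)
```

When the student already matches the teacher, the soft term vanishes and only the weighted cross-entropy remains, which is a quick sanity check for this kind of objective.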

26.5 LG May 11

Generating synthetic electronic health record data using agent-based models to evaluate machine learning robustness under mass casualty incidents

Roben Delos Reyes, Daniel Capurro, Nicholas Geard

ML models in healthcare are typically evaluated using curated real-world EHR data. A key limitation of such evaluations is that they may fail to assess the robustness of ML models to changes in the data at deployment, which is a common issue because EHR data used for ML model development cannot capture all such changes. Mass casualty incidents (MCIs) caused by disasters are critical instances where this will be an issue, as they induce rare, uncertain, and novel changes to routine system conditions. Because real-world EHR data from MCIs are often limited or unavailable, assessing ML robustness under such conditions before deployment remains challenging. Here, we propose an agent-based modelling approach for generating synthetic EHR data to evaluate the robustness of ML models under MCI scenarios. We use real-world EHR data to develop and calibrate an agent-based model (ABM) of an emergency department (ED) that explicitly models patient arrivals, resource capacity, and clinical workflow. By changing these system conditions to reflect plausible MCI scenarios, the ED model generates synthetic versions of the real-world EHR data that exhibit shifts in system behaviour. Using these synthetic data, we test ML models for predicting length of stay. We observed consistent declines in recall under MCI conditions relative to baseline system conditions, resulting in an increase in the number of patients with prolonged length of stay that were missed by the ML models. These results highlight the impact of changes in system conditions on patient outcomes, EHR data, and ML model performance. Our work establishes ABM-based synthetic EHR data generation as a proactive and systematic approach for evaluating the robustness of ML models under MCI or other system conditions not captured in real-world EHR data, supporting the safer and more effective deployment of ML models in healthcare systems.
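
The reported effect (lower recall and more missed prolonged-stay patients under MCI conditions) can be illustrated with a small sketch. The labels below are made-up toy values, not results from the paper, and `recall_and_missed` is a hypothetical helper name.

```python
def recall_and_missed(y_true, y_pred):
    """Return (recall, number of missed positives) for binary labels.

    A positive label (1) stands for a prolonged length of stay.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    recall = tp / (tp + fn) if (tp + fn) else float("nan")
    return recall, fn


# Toy data only: baseline conditions versus a synthetic MCI scenario
# in which more patients have a prolonged stay.
baseline_true = [1, 1, 1, 1, 0, 0, 0, 0]
baseline_pred = [1, 1, 1, 0, 0, 0, 1, 0]
mci_true = [1, 1, 1, 1, 1, 1, 0, 0]
mci_pred = [1, 1, 0, 0, 0, 1, 0, 0]

baseline_recall, baseline_missed = recall_and_missed(baseline_true, baseline_pred)
mci_recall, mci_missed = recall_and_missed(mci_true, mci_pred)
```

Comparing the two pairs shows how a shift in system conditions can raise the count of missed patients even when the model itself is unchanged.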

LG Jul 3, 2021

Quantifying machine learning-induced overdiagnosis in sepsis

Anna Fedyukova, Douglas Pires, Daniel Capurro

The proliferation of early diagnostic technologies, including self-monitoring systems and wearables, coupled with the application of these technologies on large segments of healthy populations may significantly aggravate the problem of overdiagnosis. This can lead to unwanted consequences such as overloading health care systems and overtreatment, with potential harms to healthy individuals. The advent of machine-learning tools to assist diagnosis -- while promising rapid and more personalised patient management and screening -- might contribute to this issue. The identification of overdiagnosis is usually post hoc and demonstrated after long periods (from years to decades) and costly randomised controlled trials. In this paper, we present an innovative approach that allows us to preemptively detect potential cases of overdiagnosis during predictive model development. This approach is based on the combination of labels obtained from a prediction model and clustered medical trajectories, using sepsis in adults as a test case. This is one of the first attempts to quantify machine-learning-induced overdiagnosis, and we believe it will serve as a platform for further development, leading to guidelines for safe deployment of computational diagnostic tools.

HC May 19, 2021

Dark Patterns, Electronic Medical Records, and the Opioid Epidemic

Daniel Capurro, Eduardo Velloso

Dark patterns have emerged as a set of methods that exploit cognitive biases to trick users into making decisions that are more aligned with a third party's interests than with their own. These patterns can have consequences that might range from inconvenience to global disasters. We present a case of a drug company and an electronic medical record vendor who colluded to modify the medical record's interface to induce clinicians to increase the prescription of extended-release opioids, a class of drugs that has a high potential for addiction and has caused almost half a million additional deaths in the past two decades. Through this case, we present the use and effects of dark patterns in healthcare, discuss the current challenges, and offer some recommendations on how to address this pressing issue.

CV Oct 20, 2020

A Survey on Deep Learning and Explainability for Automatic Report Generation from Medical Images

Pablo Messina, Pablo Pino, Denis Parra et al.

Every year physicians face an increasing demand for image-based diagnosis from patients, a problem that can be addressed with recent artificial intelligence methods. In this context, we survey works in the area of automatic report generation from medical images, with emphasis on methods using deep neural networks, with respect to: (1) Datasets, (2) Architecture Design, (3) Explainability and (4) Evaluation Metrics. Our survey identifies interesting developments, but also remaining challenges. Among them, the current evaluation of generated reports is especially weak, since it mostly relies on traditional Natural Language Processing (NLP) metrics, which do not accurately capture medical correctness.