Thomas A. Lasko

IV
h-index: 26
21 papers
284 citations
Novelty: 41%
AI Score: 43

21 Papers

IV · Sep 4, 2022 · Code
Time-distance vision transformers in lung cancer diagnosis from longitudinal computed tomography

Thomas Z. Li, Kaiwen Xu, Riqiang Gao et al.

Features learned from single radiologic images are unable to provide information about whether and how much a lesion may be changing over time. Time-dependent features computed from repeated images can capture those changes and help identify malignant lesions by their temporal behavior. However, longitudinal medical imaging presents the unique challenge of sparse, irregular time intervals in data acquisition. While self-attention has been shown to be a versatile and efficient learning mechanism for time series and natural images, its potential for interpreting temporal distance between sparse, irregularly sampled spatial features has not been explored. In this work, we propose two interpretations of a time-distance vision transformer (ViT) by using (1) vector embeddings of continuous time and (2) a temporal emphasis model to scale self-attention weights. The two algorithms are evaluated based on benign versus malignant lung cancer discrimination of synthetic pulmonary nodules and lung screening computed tomography studies from the National Lung Screening Trial (NLST). Experiments evaluating the time-distance ViTs on synthetic nodules show a fundamental improvement in classifying irregularly sampled longitudinal images when compared to standard ViTs. In cross-validation on screening chest CTs from the NLST, our methods (0.785 and 0.786 AUC respectively) significantly outperform a cross-sectional approach (0.734 AUC) and match the discriminative performance of the leading longitudinal medical imaging algorithm (0.779 AUC) on benign versus malignant classification. This work represents the first self-attention-based framework for classifying longitudinal medical images. Our code is available at https://github.com/tom1193/time-distance-transformer.
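The second variant described above, scaling self-attention weights by a temporal emphasis model, can be illustrated with a minimal pure-Python sketch. The sigmoid-shaped `temporal_emphasis` function, its `slope` and `offset` parameters, and the renormalization step are assumptions made for illustration; they are not taken from the authors' implementation, which is available in the linked repository.

```python
import math


def softmax(row):
    """Numerically stable softmax over one row of attention scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def temporal_emphasis(dt, slope=1.0, offset=0.0):
    """Hypothetical emphasis in (0, 1) that decreases as the time gap dt grows."""
    return 1.0 / (1.0 + math.exp(slope * dt - offset))


def time_scaled_attention(scores, times, slope=1.0, offset=0.0):
    """Scale each softmax attention weight by the emphasis of the time gap
    between the query scan i and the key scan j, then renormalize each row."""
    n = len(times)
    result = []
    for i in range(n):
        attn = softmax(scores[i])
        scaled = [
            attn[j] * temporal_emphasis(abs(times[i] - times[j]), slope, offset)
            for j in range(n)
        ]
        total = sum(scaled)
        result.append([s / total for s in scaled])
    return result


# Three scans acquired at irregular intervals (in years), with uniform raw scores.
times = [0.0, 0.5, 3.0]
scores = [[0.0, 0.0, 0.0] for _ in times]
weights = time_scaled_attention(scores, times)
```

With uniform raw scores, each scan attends most strongly to the scans closest to it in time, which is the behavior a time-distance scaling is meant to encourage.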

IV · Apr 6, 2023 · Code
Longitudinal Multimodal Transformer Integrating Imaging and Latent Clinical Signatures From Routine EHRs for Pulmonary Nodule Classification

Thomas Z. Li, John M. Still, Kaiwen Xu et al.

The accuracy of predictive models for solitary pulmonary nodule (SPN) diagnosis can be greatly increased by incorporating repeat imaging and medical context, such as electronic health records (EHRs). However, clinically routine modalities such as imaging and diagnostic codes can be asynchronous and irregularly sampled over different time scales, which are obstacles to longitudinal multimodal learning. In this work, we propose a transformer-based multimodal strategy to integrate repeat imaging with longitudinal clinical signatures from routinely collected EHRs for SPN classification. We perform unsupervised disentanglement of latent clinical signatures and leverage time-distance scaled self-attention to jointly learn from clinical signature expressions and chest computed tomography (CT) scans. Our classifier is pretrained on 2,668 scans from a public dataset and 1,149 subjects with longitudinal chest CTs, billing codes, medications, and laboratory tests from EHRs of our home institution. Evaluation on 227 subjects with challenging SPNs revealed a significant AUC improvement over a longitudinal multimodal baseline (0.824 vs 0.752 AUC), as well as improvements over a single cross-section multimodal scenario (0.809 AUC) and a longitudinal imaging-only scenario (0.741 AUC). This work demonstrates significant advantages with a novel approach for co-learning longitudinal imaging and non-imaging phenotypes with transformers. Code available at https://github.com/MASILab/lmsignatures.

ML · May 23, 2022 · Code
Identifying Patient-Specific Root Causes of Disease

Eric V. Strobl, Thomas A. Lasko

Complex diseases are caused by a multitude of factors that may differ between patients. As a result, hypothesis tests comparing all patients to all healthy controls can detect many significant variables with inconsequential effect sizes. A few highly predictive root causes may nevertheless generate disease within each patient. In this paper, we define patient-specific root causes as variables subject to exogenous "shocks" which go on to perturb an otherwise healthy system and induce disease. In other words, the variables are associated with the exogenous errors of a structural equation model (SEM), and these errors predict a downstream diagnostic label. We quantify predictivity using sample-specific Shapley values. This derivation allows us to develop a fast algorithm called Root Causal Inference for identifying patient-specific root causes by extracting the error terms of a linear SEM and then computing the Shapley value associated with each error. Experiments highlight considerable improvements in accuracy because the method uncovers root causes that may have large effect sizes at the individual level but clinically insignificant effect sizes at the group level. An R implementation is available at github.com/ericstrobl/RCI.
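The two steps named above, extracting the error terms of a linear SEM and attributing a prediction to each error, can be sketched on a toy two-variable chain. Everything in this sketch is an illustrative assumption: the known causal order X1 -> X2, the least-squares residual step, the hypothetical predictor `weights`, and the use of the closed-form Shapley value for a linear predictor with independent, centered inputs (weight times error). It is not the RCI implementation, which is the R package linked above.

```python
def mean(values):
    return sum(values) / len(values)


def ols_slope(x, y):
    """Least-squares slope of y regressed on x."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sxx


def extract_errors(x1, x2):
    """Assumed chain X1 = E1, X2 = beta * X1 + E2.
    Returns the centered error terms (E1, E2) for every sample."""
    beta = ols_slope(x1, x2)
    m1, m2 = mean(x1), mean(x2)
    e1 = [a - m1 for a in x1]
    e2 = [(b - m2) - beta * (a - m1) for a, b in zip(x1, x2)]
    return e1, e2


def linear_shapley(weights, sample_errors):
    """Shapley values of a linear predictor sum(w_i * e_i) with independent,
    zero-mean inputs: each error contributes w_i * e_i."""
    return [w * e for w, e in zip(weights, sample_errors)]


# Toy data generated as x2 = 2 * x1 + e2 with e2 = [1, -2, 1].
x1 = [-1.0, 0.0, 1.0]
x2 = [-1.0, -2.0, 3.0]
e1, e2 = extract_errors(x1, x2)

# Hypothetical weights of a linear diagnostic score on (E1, E2).
weights = [0.5, 2.0]
patient = 2
contributions = linear_shapley(weights, [e1[patient], e2[patient]])
```

The per-patient `contributions` list ranks the error terms by how much each one moves that patient's score, which is the sense in which a root cause is patient-specific here.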

IV · Mar 4, 2022
Characterizing Renal Structures with 3D Block Aggregate Transformers

Xin Yu, Yucheng Tang, Yinchi Zhou et al.

Efficiently quantifying renal structures can provide distinct spatial context and facilitate biomarker discovery for kidney morphology. However, developing and evaluating a transformer model to segment the renal cortex, medulla, and collecting system remains challenging due to data inefficiency. Inspired by the hierarchical structures in vision transformers, we propose a novel method using a 3D block aggregation transformer for segmenting kidney components on contrast-enhanced CT scans. We construct the first renal substructure segmentation dataset, a cohort of 116 subjects collected under institutional review board (IRB) approval. With its data-efficient design, our method achieves state-of-the-art performance (Dice of 0.8467) compared with the baseline approach (0.8308). The Pearson R between the proposed method and manual standards reaches 0.9891, indicating strong correlation and reproducibility for volumetric analysis. We also extend the proposed method to the public KiTS dataset, where it improves accuracy compared to transformer-based approaches. We show that the 3D block aggregation transformer can achieve local communication between sequence representations without modifying self-attention, and it can serve as an accurate and efficient quantification tool for characterizing renal structures.
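For readers unfamiliar with the Dice score reported above, a minimal sketch of the standard definition, 2|A ∩ B| / (|A| + |B|), on flattened binary masks follows. The function name and the convention of returning 1.0 for two empty masks are choices made for this example only.

```python
def dice_coefficient(pred, truth):
    """Dice overlap between two flattened binary masks of equal length."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    size_sum = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    if size_sum == 0:
        # Both masks are empty; treated as perfect agreement by convention.
        return 1.0
    return 2.0 * intersection / size_sum


# Predicted mask marks two voxels, manual mask marks one of them.
score = dice_coefficient([1, 1, 0, 0], [1, 0, 0, 0])
```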

ML · May 25, 2022
Identifying Patient-Specific Root Causes with the Heteroscedastic Noise Model

Eric V. Strobl, Thomas A. Lasko

Complex diseases are caused by a multitude of factors that may differ between patients even within the same diagnostic category. A few underlying root causes may nevertheless initiate the development of disease within each patient. We therefore focus on identifying patient-specific root causes of disease, which we equate to the sample-specific predictivity of the exogenous error terms in a structural equation model. We generalize from the linear setting to the heteroscedastic noise model where $Y = m(X) + \varepsilon\sigma(X)$ with non-linear functions $m(X)$ and $\sigma(X)$ representing the conditional mean and mean absolute deviation, respectively. This model preserves identifiability but introduces non-trivial challenges that require a customized algorithm called Generalized Root Causal Inference (GRCI) to extract the error terms correctly. GRCI recovers patient-specific root causes more accurately than existing alternatives.

LG · Nov 8, 2023
Why Do Probabilistic Clinical Models Fail To Transport Between Sites?

Thomas A. Lasko, Eric V. Strobl, William W. Stead

The rising popularity of artificial intelligence in healthcare is highlighting the problem that a computational model achieving super-human clinical performance at its training sites may perform substantially worse at new sites. In this perspective, we present common sources for this failure to transport, which we divide into sources under the control of the experimenter and sources inherent to the clinical data-generating process. Of the inherent sources we look a little deeper into site-specific clinical practices that can affect the data distribution, and propose a potential solution intended to isolate the imprint of those practices on the data from the patterns of disease cause and effect that are the usual target of probabilistic clinical models.

ML · Oct 27, 2022
Sample-Specific Root Causal Inference with Latent Variables

Eric V. Strobl, Thomas A. Lasko

Root causal analysis seeks to identify the set of initial perturbations that induce an unwanted outcome. In prior work, we defined sample-specific root causes of disease using exogenous error terms that predict a diagnosis in a structural equation model. We rigorously quantified predictivity using Shapley values. However, the associated algorithms for inferring root causes assume no latent confounding. We relax this assumption by permitting confounding among the predictors. We then introduce a corresponding procedure called Extract Errors with Latents (EEL) for recovering the error terms up to contamination by vertices on certain paths under the linear non-Gaussian acyclic model. EEL also identifies the smallest sets of dependent errors for fast computation of the Shapley values. The algorithm bypasses the hard problem of estimating the underlying causal graph in both cases. Experiments highlight the superior accuracy and robustness of EEL relative to its predecessors.

CV · Jun 17, 2022
A Comparative Study of Confidence Calibration in Deep Learning: From Computer Vision to Medical Imaging

Riqiang Gao, Thomas Li, Yucheng Tang et al.

Although deep learning prediction models have been successful at discriminating between classes, they often suffer from poor calibration in challenging domains, including healthcare. Moreover, long-tailed distributions pose great challenges for deep learning classification problems, including clinical disease prediction. Approaches have recently been proposed to calibrate deep predictions in computer vision, but no studies have shown how representative models behave across different challenging contexts. In this paper, we bridge confidence calibration from computer vision to medical imaging with a comparative study of four high-impact calibration models. Our studies are conducted in different contexts (natural image classification and lung cancer risk estimation), including balanced vs. imbalanced training sets and computer vision vs. medical imaging. Our results support key findings: (1) We reach new conclusions that have not been studied under different learning contexts, e.g., combining two calibration models that both mitigate overconfident prediction can lead to under-confident prediction, and simpler calibration models from the computer vision domain tend to generalize better to medical imaging. (2) We highlight the gap between general computer vision tasks and medical imaging prediction, e.g., calibration methods ideal for general computer vision tasks may in fact damage the calibration of medical imaging prediction. (3) We also reinforce previous conclusions in natural image classification settings. We believe this study can guide readers in choosing calibration models and in understanding the gaps between the general computer vision and medical imaging domains.
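Over-confidence and under-confidence of the kind discussed above are commonly quantified with the expected calibration error (ECE). The sketch below shows that general metric under an assumed equal-width binning; the abstract does not state which metrics or bin counts the study used, so this is background illustration only.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted average gap between mean confidence and accuracy per bin.

    confidences: predicted probability of the chosen class, each in [0, 1].
    correct:     1 if the prediction was right, 0 otherwise.
    """
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        # A confidence of exactly 1.0 falls into the last bin.
        index = min(int(conf * n_bins), n_bins - 1)
        bins[index].append((conf, hit))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(h for _, h in members) / len(members)
        ece += (len(members) / n) * abs(avg_conf - accuracy)
    return ece


# An over-confident model: 90% confidence but only 75% accuracy.
ece = expected_calibration_error([0.9, 0.9, 0.9, 0.9], [1, 1, 1, 0])
```

A gap where confidence exceeds accuracy indicates over-confidence; the reverse gap indicates under-confidence, which the abstract notes can arise when two calibration methods are stacked.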

IV · Sep 28, 2022
UNesT: Local Spatial Representation Learning with Hierarchical Transformer for Efficient Medical Segmentation

Xin Yu, Qi Yang, Yinchi Zhou et al.

Transformer-based models, capable of learning better global dependencies, have recently demonstrated exceptional representation learning capabilities in computer vision and medical image analysis. A transformer reformats the image into separate patches and realizes global communication via the self-attention mechanism. However, positional information between patches is hard to preserve in such 1D sequences, and losing it can lead to sub-optimal performance when dealing with large numbers of heterogeneous tissues of various sizes in 3D medical image segmentation. Additionally, current methods are not robust and efficient for heavy-duty medical segmentation tasks such as predicting a large number of tissue classes or modeling globally inter-connected tissue structures. To address these challenges, and inspired by the nested hierarchical structures in vision transformers, we propose a novel 3D medical image segmentation method (UNesT) that employs a simplified and faster-converging transformer encoder design, achieving local communication among spatially adjacent patch sequences by aggregating them hierarchically. We extensively validate our method on multiple challenging datasets covering multiple modalities, anatomies, and a wide range of tissue classes, including 133 structures in the brain, 14 organs in the abdomen, 4 hierarchical components in the kidneys, and inter-connected kidney tumors and brain tumors. We show that UNesT consistently achieves state-of-the-art performance and evaluate its generalizability and data efficiency. In particular, the model performs whole-brain segmentation of all 133 tissue classes in a single network, outperforming the prior state-of-the-art method SLANT27, an ensemble of 27 networks.

IV · Jun 23, 2019 · Code
Fully Automatic Liver Attenuation Estimation Combing CNN Segmentation and Morphological Operations

Yuankai Huo, James G. Terry, Jiachen Wang et al.

Manually tracing regions of interest (ROIs) within the liver is the de facto standard method for measuring liver attenuation on computed tomography (CT) in diagnosing nonalcoholic fatty liver disease (NAFLD). However, manual tracing is resource intensive. To address this limitation and to expand the availability of a quantitative CT measure of hepatic steatosis, we propose the automatic liver attenuation ROI-based measurement (ALARM) method for automated liver attenuation estimation. The ALARM method consists of two major stages: (1) deep convolutional neural network (DCNN)-based liver segmentation and (2) automated ROI extraction. First, liver segmentation was achieved using our previously developed SS-Net. Then, a single central ROI (center-ROI) and three circular ROIs (periphery-ROI) were computed based on liver segmentation and morphological operations. The ALARM method is available as an open source Docker container (https://github.com/MASILab/ALARM). 246 subjects with 738 abdomen CT scans from the African American-Diabetes Heart Study (AA-DHS) were used for external validation (testing), independent from the training and validation cohort (100 clinically acquired CT abdominal scans).

LG · Feb 8, 2024
Unsupervised Discovery of Clinical Disease Signatures Using Probabilistic Independence

Thomas A. Lasko, John M. Still, Thomas Z. Li et al.

Insufficiently precise diagnosis of clinical disease is likely responsible for many treatment failures, even for common conditions and treatments. With a large enough dataset, it may be possible to use unsupervised machine learning to define clinical disease patterns more precisely. We present an approach to learning these patterns by using probabilistic independence to disentangle the imprint on the medical record of causal latent sources of disease. We inferred a broad set of 2000 clinical signatures of latent sources from 9195 variables in 269,099 Electronic Health Records. The learned signatures produced better discrimination than the original variables in a lung cancer prediction task unknown to the inference algorithm, predicting 3-year malignancy in patients with no history of cancer before a solitary lung nodule was discovered. More importantly, the signatures' greater explanatory power identified pre-nodule signatures of apparently undiagnosed cancer in many of those patients.

LG · Jan 13, 2025
A data-driven approach to discover and quantify systemic lupus erythematosus etiological heterogeneity from electronic health records

Marco Barbero Mota, John M. Still, Jorge L. Gamboa et al.

Systemic lupus erythematosus (SLE) is a complex heterogeneous disease with many manifestational facets. We propose a data-driven approach to discover probabilistically independent sources from multimodal imperfect EHR data. These sources represent exogenous variables in the causal graph of the data generation process and estimate latent root causes of the presence of SLE in the health record. We objectively evaluated the sources against the original variables from which they were discovered by training supervised models to discriminate SLE from negative health records using a reduced set of labelled instances. We found 19 predictive sources with high clinical validity whose EHR signatures define independent factors of SLE heterogeneity. Using the sources as the input patient data representation enables models to provide rich explanations that better capture the clinical reasons why a particular record is (not) an SLE case. Providers may be willing to trade patient-level interpretability for discrimination especially in challenging cases.

LG · Oct 15, 2025
A tutorial on discovering and quantifying the effect of latent causal sources of multimodal EHR data

Marco Barbero-Mota, Eric V. Strobl, John M. Still et al.

We provide an accessible description of a peer-reviewed generalizable causal machine learning pipeline to (i) discover latent causal sources of large-scale electronic health record observations, and (ii) quantify the source causal effects on clinical outcomes. We illustrate how imperfect multimodal clinical data can be processed, decomposed into probabilistically independent latent sources, and used to train task-specific causal models from which individual causal effects can be estimated. We summarize the findings of the two real-world applications of the approach to date as a demonstration of its versatility and utility for medical discovery at scale.

CV · Sep 18, 2025
Self-supervised learning of imaging and clinical signatures using a multimodal joint-embedding predictive architecture

Thomas Z. Li, Aravind R. Krishnan, Lianrui Zuo et al.

The development of multimodal models for pulmonary nodule diagnosis is limited by the scarcity of labeled data and the tendency for these models to overfit on the training distribution. In this work, we leverage self-supervised learning from longitudinal and multimodal archives to address these challenges. We curate an unlabeled set of patients with CT scans and linked electronic health records from our home institution to power joint embedding predictive architecture (JEPA) pretraining. After supervised finetuning, we show that our approach outperforms an unregularized multimodal model and an imaging-only model in an internal cohort (ours: 0.91, multimodal: 0.88, imaging-only: 0.73 AUC), but underperforms in an external cohort (ours: 0.72, imaging-only: 0.75 AUC). We develop a synthetic environment that characterizes the context in which JEPA may underperform. This work introduces an approach that leverages unlabeled multimodal medical archives to improve predictive models and demonstrates its advantages and limitations in pulmonary nodule diagnosis.

CV · Aug 20, 2025
Lifespan Pancreas Morphology for Control vs Type 2 Diabetes using AI on Largescale Clinical Imaging

Lucas W. Remedios, Chloe Cho, Trent M. Schwartz et al.

Purpose: Understanding how the pancreas changes is critical for detecting deviations in type 2 diabetes and other pancreatic disease. We measure pancreas size and shape using morphological measurements from ages 0 to 90. Our goals are to 1) identify reliable clinical imaging modalities for AI-based pancreas measurement, 2) establish normative morphological aging trends, and 3) detect potential deviations in type 2 diabetes. Approach: We analyzed a clinically acquired dataset of 2533 patients imaged with abdominal CT or MRI. We resampled the scans to 3 mm isotropic resolution, segmented the pancreas using automated methods, and extracted 13 morphological pancreas features across the lifespan. First, we assessed CT and MRI measurements to determine which modalities provide consistent lifespan trends. Second, we characterized distributions of normative morphological patterns stratified by age group and sex. Third, we used GAMLSS regression to model pancreas morphology trends in 1350 patients matched for age, sex, and type 2 diabetes status to identify any deviations from normative aging associated with type 2 diabetes. Results: When adjusting for confounders, the aging trends for 10 of 13 morphological features were significantly different between patients with type 2 diabetes and non-diabetic controls (p < 0.05 after multiple-comparison correction). Additionally, MRI appeared to yield different pancreas measurements than CT using our AI-based method. Conclusions: Using 675 control patients and 675 diabetes patients, we provide lifespan trends demonstrating that the size and shape of the pancreas are altered in type 2 diabetes. Moreover, our findings reinforce that the pancreas is smaller in type 2 diabetes. Additionally, we contribute a reference of lifespan pancreas morphology from a large cohort of non-diabetic control patients in a clinical setting.

AP · Apr 22, 2025
Cryptogenic stroke and migraine: using probabilistic independence and machine learning to uncover latent sources of disease from the electronic health record

Joshua W. Betts, John M. Still, Thomas A. Lasko

Migraine is a common but complex neurological disorder that doubles the lifetime risk of cryptogenic stroke (CS). However, this relationship remains poorly characterized, and few clinical guidelines exist to reduce this associated risk. We therefore propose a data-driven approach to extract probabilistically independent sources from electronic health record (EHR) data and create a 10-year risk-predictive model for CS in migraine patients. These sources represent external latent variables acting on the causal graph constructed from the EHR data and approximate root causes of CS in our population. A random forest model trained on patient expressions of these sources demonstrated good accuracy (ROC AUC 0.771) and identified the top 10 most predictive sources of CS in migraine patients. These sources revealed that pharmacologic interventions were the most important factor in minimizing CS risk in our population and identified a factor related to allergic rhinitis as a potential causative source of CS in migraine patients.

ML · Nov 25, 2021
Generalizing Clinical Trials with Convex Hulls

Eric V. Strobl, Thomas A. Lasko

Randomized clinical trials eliminate confounding but impose strict exclusion criteria that limit recruitment to a subset of the population. Observational datasets are more inclusive but suffer from confounding, often providing overly optimistic estimates of treatment response over time due to partially optimized physician prescribing patterns. We therefore assume that the unconfounded treatment response lies somewhere in between the observational estimate before and the observational estimate after treatment assignment. This assumption allows us to extrapolate results from exclusive trials to the broader population by analyzing observational and trial data simultaneously using an algorithm called Optimum in Convex Hulls (OCH). OCH represents the treatment effect either in terms of convex hulls of conditional expectations or convex hulls (also known as mixtures) of conditional densities. The algorithm first learns the component expectations or densities using the observational data and then learns the linear mixing coefficients using trial data in order to approximate the true treatment effect; theory importantly explains why this linear combination should hold. OCH estimates the treatment effect in terms of both expectations and densities with state-of-the-art accuracy.

IV · Jul 25, 2021
Lung Cancer Risk Estimation with Incomplete Data: A Joint Missing Imputation Perspective

Riqiang Gao, Yucheng Tang, Kaiwen Xu et al.

Multi-modal data provide complementary information in clinical prediction, but missing data in clinical cohorts limit the number of subjects available in a multi-modal learning context. Multi-modal missing imputation is challenging with existing methods when 1) the missing data span heterogeneous modalities (e.g., image vs. non-image); or 2) one modality is largely missing. In this paper, we address imputation of missing data by modeling the joint distribution of multi-modal data. Motivated by the partial bidirectional generative adversarial net (PBiGAN), we propose a new Conditional PBiGAN (C-PBiGAN) method that imputes one modality by combining the conditional knowledge from another modality. Specifically, C-PBiGAN introduces a conditional latent space in a missing imputation framework that jointly encodes the available multi-modal data, along with a class regularization loss on imputed data to recover discriminative information. To our knowledge, it is the first generative adversarial model that addresses multi-modal missing imputation by modeling the joint distribution of image and non-image data. We validate our model with both the National Lung Screening Trial (NLST) dataset and an external clinical validation cohort. The proposed C-PBiGAN achieves significant improvements in lung cancer risk estimation compared with representative imputation methods (e.g., AUC values increase in both the NLST (+2.9%) and in-house (+4.3%) datasets compared with PBiGAN, p < 0.05).

ML · May 2, 2021
Synthesized Difference in Differences

Eric V. Strobl, Thomas A. Lasko

We consider estimating the conditional average treatment effect for everyone by eliminating confounding and selection bias. Unfortunately, randomized clinical trials (RCTs) eliminate confounding but impose strict exclusion criteria that prevent sampling of the entire clinical population. Observational datasets are more inclusive but suffer from confounding. We therefore analyze RCT and observational data simultaneously in order to extract the strengths of each. Our solution builds upon Difference in Differences (DD), an algorithm that eliminates confounding from observational data by comparing outcomes before and after treatment administration. DD requires a parallel slopes assumption that may not apply in practice when confounding shifts across time. We instead propose Synthesized Difference in Differences (SDD), which infers the correct (possibly non-parallel) slopes by linearly adjusting a conditional version of DD using additional RCT data. The algorithm achieves state-of-the-art performance across multiple synthetic and real datasets even when the RCT excludes the majority of patients.
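As background for the DD baseline that SDD builds on, here is a minimal sketch of the classical two-group, two-period Difference in Differences estimate. It shows only the textbook estimator under the parallel slopes assumption; the RCT-based linear adjustment that defines SDD is not reproduced here, and the function and variable names are illustrative.

```python
def mean(values):
    return sum(values) / len(values)


def difference_in_differences(treated_before, treated_after,
                              control_before, control_after):
    """Classical DD: change in the treated group minus change in the controls.

    Valid as a treatment effect only if both groups would have followed
    parallel trends without treatment."""
    treated_change = mean(treated_after) - mean(treated_before)
    control_change = mean(control_after) - mean(control_before)
    return treated_change - control_change


# Toy outcomes: treated patients improve by 5, controls drift upward by 2.
effect = difference_in_differences(
    treated_before=[10.0, 12.0], treated_after=[15.0, 17.0],
    control_before=[9.0, 11.0], control_after=[11.0, 13.0],
)
```

When confounding shifts over time, the control drift no longer matches the drift the treated group would have had, which is the failure mode the abstract says SDD is designed to correct.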

LG · Mar 17, 2020
Semi-supervised Contrastive Learning Using Partial Label Information

Colin B. Hansen, Vishwesh Nath, Diego A. Mesa et al.

In semi-supervised learning, information from unlabeled examples is used to improve the model learned from labeled examples. In some learning problems, partial label information can be inferred from otherwise unlabeled examples and used to further improve the model. In particular, partial label information exists when subsets of training examples are known to have the same label, even though the label itself is missing. By encouraging the model to give the same label to all such examples through contrastive learning objectives, we can potentially improve its performance. We call this encouragement Nullspace Tuning because the difference vector between any pair of examples with the same label should lie in the nullspace of a linear model. In this paper, we investigate the benefit of using partial label information using a careful comparison framework over well-characterized public datasets. We show that the additional information provided by partial labels reduces test error over good semi-supervised methods usually by a factor of 2, up to a factor of 5.5 in the best case. We also show that adding Nullspace Tuning to the newer and state-of-the-art MixMatch method decreases its test error by up to a factor of 1.8.
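The nullspace argument above can be made concrete for a linear model f(x) = Wx: if two examples share a label, then W(x_a - x_b) should be near zero, so its squared norm can serve as a penalty. The sketch below is a minimal illustration of that idea; the squared-norm form and the function names are assumptions, and the paper's actual contrastive objectives may differ.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def nullspace_penalty(weights, x_a, x_b):
    """Squared norm of W (x_a - x_b) for a pair known to share a label.

    The penalty is zero exactly when the difference vector lies in the
    nullspace of the linear model W."""
    difference = [a - b for a, b in zip(x_a, x_b)]
    output = matvec(weights, difference)
    return sum(o * o for o in output)


# A linear model that ignores the second feature.
W = [[1.0, 0.0],
     [0.0, 0.0]]

# These two examples differ only in the ignored feature: no penalty.
same_output = nullspace_penalty(W, [1.0, 2.0], [1.0, 5.0])

# These two differ in the feature the model uses: positive penalty.
different_output = nullspace_penalty(W, [1.0, 2.0], [3.0, 2.0])
```

Adding such a term to a training loss pushes the model to produce the same output for examples in the same partial-label group, even though the label itself is unknown.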

ML · Feb 19, 2014
Efficient Inference of Gaussian Process Modulated Renewal Processes with Application to Medical Event Data

Thomas A. Lasko

The episodic, irregular and asynchronous nature of medical data renders them difficult substrates for standard machine learning algorithms. We would like to abstract away this difficulty for the class of time-stamped categorical variables (or events) by modeling them as a renewal process and inferring a probability density over continuous, longitudinal, nonparametric intensity functions modulating that process. Several methods exist for inferring such a density over intensity functions, but either their constraints and assumptions prevent their use with our potentially bursty event streams, or their time complexity renders their use intractable on our long-duration observations of high-resolution events, or both. In this paper we present a new and efficient method for inferring a distribution over intensity functions that uses direct numeric integration and smooth interpolation over Gaussian processes. We demonstrate that our direct method is up to twice as accurate and two orders of magnitude more efficient than the best existing method (thinning). Importantly, the direct method can infer intensity functions over the full range of bursty to memoryless to regular events, which thinning and many other methods cannot. Finally, we apply the method to clinical event data and demonstrate the face-validity of the abstraction, which is now amenable to standard learning algorithms.
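One way to picture how an intensity function modulates a renewal process is as a warping of time: event times are mapped through the cumulative intensity, and the renewal distribution then applies to intervals in warped time. The sketch below computes that warping by trapezoidal integration on a grid with linear interpolation between grid points. It is a simplified illustration under those assumptions, with hypothetical function names; it performs no Gaussian process inference and is not the paper's algorithm.

```python
from bisect import bisect_left


def cumulative_intensity(grid, intensity):
    """Trapezoidal integral of the intensity from grid[0] to each grid point."""
    cumulative = [0.0]
    for k in range(1, len(grid)):
        width = grid[k] - grid[k - 1]
        area = 0.5 * (intensity[k - 1] + intensity[k]) * width
        cumulative.append(cumulative[-1] + area)
    return cumulative


def warp_event_times(events, grid, cumulative):
    """Map event times to warped time by linear interpolation of the
    cumulative intensity (approximate between grid points)."""
    warped = []
    for t in events:
        if not grid[0] <= t <= grid[-1]:
            raise ValueError("event time lies outside the grid")
        k = max(1, bisect_left(grid, t))
        fraction = (t - grid[k - 1]) / (grid[k] - grid[k - 1])
        step = cumulative[k] - cumulative[k - 1]
        warped.append(cumulative[k - 1] + fraction * step)
    return warped


# A constant intensity of 2 events per unit time stretches time by a factor of 2.
grid = [0.0, 1.0, 2.0]
cumulative = cumulative_intensity(grid, [2.0, 2.0, 2.0])
warped = warp_event_times([0.5, 1.5], grid, cumulative)
```

Periods of high intensity stretch warped time, so bursts of events in clock time become evenly spaced in warped time when the intensity explains them well.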