Matthew McDermott

LG
h-index55
18papers
1,082citations
Novelty43%
AI Score53

18 Papers

ROJun 14, 2023Code
ICET Online Accuracy Characterization for Geometry-Based Laser Scan Matching

Matthew McDermott, Jason Rife

Distribution-to-Distribution (D2D) point cloud registration algorithms are fast, interpretable, and perform well in unstructured environments. Unfortunately, existing strategies for predicting solution error for these methods are overly optimistic, particularly in regions containing large or extended physical objects. In this paper we introduce the Iterative Closest Ellipsoidal Transform (ICET), a novel 3D LIDAR scan-matching algorithm that re-envisions NDT in order to provide robust accuracy prediction from first principles. Like NDT, ICET subdivides a LIDAR scan into voxels in order to analyze complex scenes by considering many smaller local point distributions, however, ICET assesses the voxel distribution to distinguish random noise from deterministic structure. ICET then uses a weighted least-squares formulation to incorporate this noise/structure distinction into computing a localization solution and predicting the solution-error covariance. In order to demonstrate the reasonableness of our accuracy predictions, we verify 3D ICET in three LIDAR tests involving real-world automotive data, high-fidelity simulated trajectories, and simulated corner-case scenes. For each test, ICET consistently performs scan matching with sub-centimeter accuracy. This level of accuracy, combined with the fact that the algorithm is fully interpretable, make it well suited for safety-critical transportation applications. Code is available at https://github.com/mcdermatt/ICET

LGFeb 1, 2023
Using Machine Learning to Develop Smart Reflex Testing Protocols

Matthew McDermott, Anand Dighe, Peter Szolovits et al. · harvard

Objective: Reflex testing protocols allow clinical laboratories to perform second line diagnostic tests on existing specimens based on the results of initially ordered tests. Reflex testing can support optimal clinical laboratory test ordering and diagnosis. In current clinical practice, reflex testing typically relies on simple "if-then" rules; however, this limits their scope since most test ordering decisions involve more complexity than a simple rule will allow. Here, using the analyte ferritin as an example, we propose an alternative machine learning-based approach to "smart" reflex testing with a wider scope and greater impact than traditional rule-based approaches. Methods: Using patient data, we developed a machine learning model to predict whether a patient getting CBC testing will also have ferritin testing ordered, consider applications of this model to "smart" reflex testing, and evaluate the model by comparing its performance to possible rule-based approaches. Results: Our underlying machine learning models performed moderately well in predicting ferritin test ordering and demonstrated greater suitability to reflex testing than rule-based approaches. Using chart review, we demonstrate that our model may improve ferritin test ordering. Finally, as a secondary goal, we demonstrate that ferritin test results are missing not at random (MNAR), a finding with implications for unbiased imputation of missing test results. Conclusions: Machine learning may provide a foundation for new types of reflex testing with enhanced benefits for clinical diagnosis and laboratory utilization management.

LGMay 14Code
Croissant Baker: Metadata Generation for Discoverable, Governable, and Reusable ML Datasets

Rafi Al Attrach, Rajna Fani, Sebastian Lobentanzer et al.

Croissant has emerged as the metadata standard for machine learning datasets, providing a structured, JSON-LD-based format that makes dataset discovery, automated ingestion, and reproducible analysis machine-checkable across ML platforms. Adoption has accelerated, and NeurIPS now requires Croissant metadata in every submission to its dataset tracks. Yet in practice Croissant generation usually starts with uploading data to a public platform, a path infeasible for governed and large local repositories that hold much of the high-value data ML increasingly relies on. We release Croissant Baker, a local-first, open-source command-line tool that generates validated Croissant metadata directly from a dataset directory through a modular handler registry. We evaluate Croissant Baker on over 140 datasets, scaling to MIMIC-IV at 886 million rows and 374 Parquet files. On held-out comparisons against producer-authored or standards-derived ground truth, Croissant Baker reaches 97-100% agreement across multiple domains.

ROAug 1, 2022
Mitigating Shadows in Lidar Scan Matching using Spherical Voxels

Matthew McDermott, Jason Rife

In this paper we propose an approach to mitigate shadowing errors in Lidar scan matching, by introducing a preprocessing step based on spherical gridding. Because the grid aligns with the Lidar beam, it is relatively easy to eliminate shadow edges which cause systematic errors in Lidar scan matching. As we show through simulation, our proposed algorithm provides better results than ground-plane removal, the most common existing strategy for shadow mitigation. Unlike ground plane removal, our method applies to arbitrary terrains (e.g. shadows on urban walls, shadows in hilly terrain) while retaining key Lidar points on the ground that are critical for estimating changes in height, pitch, and roll. Our preprocessing algorithm can be used with a range of scan-matching methods; however, for voxel-based scan matching methods, it provides additional benefits by reducing computation costs and more evenly distributing Lidar points among voxels.

ROJul 29, 2022
Enhanced Laser-Scan Matching with Online Error Estimation for Highway and Tunnel Driving

Matthew McDermott, Jason Rife

Lidar data can be used to generate point clouds for the navigation of autonomous vehicles or mobile robotics platforms. Scan matching, the process of estimating the rigid transformation that best aligns two point clouds, is the basis for lidar odometry, a form of dead reckoning. Lidar odometry is particularly useful when absolute sensors, like GPS, are not available. Here we propose the Iterative Closest Ellipsoidal Transform (ICET), a scan matching algorithm which provides two novel improvements over the current state-of-the-art Normal Distributions Transform (NDT). Like NDT, ICET decomposes lidar data into voxels and fits a Gaussian distribution to the points within each voxel. The first innovation of ICET reduces geometric ambiguity along large flat surfaces by suppressing the solution along those directions. The second innovation of ICET is to infer the output error covariance associated with the position and orientation transformation between successive point clouds; the error covariance is particularly useful when ICET is incorporated into a state-estimation routine such as an extended Kalman filter. We constructed a simulation to compare the performance of ICET and NDT in 2D space both with and without geometric ambiguity and found that ICET produces superior estimates while accurately predicting solution accuracy.

CVNov 8, 2022
DNN Filter for Bias Reduction in Distribution-to-Distribution Scan Matching

Matthew McDermott, Jason Rife

Distribution-to-distribution (D2D) point cloud registration techniques such as the Normal Distributions Transform (NDT) can align point clouds sampled from unstructured scenes and provide accurate bounds of their own solution error covariance -- an important feature for safety-of-life navigation tasks. D2D methods rely on the assumption of a static scene and are therefore susceptible to bias from range-shadowing, self-occlusion, moving objects, and distortion artifacts as the recording device moves between frames. Deep Learning-based approaches can achieve higher accuracy in dynamic scenes by relaxing these constraints, however, DNNs produce uninterpretable solutions which can be problematic from a safety perspective. In this paper, we propose a method of down-sampling LIDAR point clouds to exclude voxels that violate the assumption of a static scene and introduce error to the D2D scan matching process. Our approach uses a solution consistency filter -- identifying and suppressing voxels where D2D contributions disagree with local estimates from a PointNet-based registration network. Our results show that this technique provides significant benefits in registration accuracy, and is particularly useful in scenes containing dense foliage.

IRFeb 23
A Systematic Study of Biomedical Retrieval Pipeline Trade-offs in Performance and Efficiency

Hayk Stepanyan, Matthew McDermott

Retrieval systems are increasingly used in biomedical and clinical natural language processing applications, yet practical guidance for researchers building such systems is limited. In this work, we provide such guidance through an empirical study of how retrieval pipeline design choices affect performance and efficiency at scale. In particular, we examine retrieval over a variety of existing, public biomedical text datasets, leveraging a variety of disparate types of queries, including exam-style questions, conversational medical queries, community-asked questions, and non-question formulations across various retrieval pipeline settings spanning corpus selection, chunk granularity, and vector index configuration. Retrieval results are judged using a robust, win-rate comparison assessment via an LLM-as-a-judge setting with human validation. Across these experiments, we identify several points of concrete guidance for reviewers, including the superiority of corpus aggregation for absolute retrieval quality, and the emergence of MedRAG/pubmed as the Pareto-optimal singleton corpus under graph-based (HNSW) indexing, appropriate chunking strategies, and FAISS indexing choices that offer the best trade-offs in speed and efficiency.

CVNov 4, 2024Code
A Probabilistic Formulation of LiDAR Mapping with Neural Radiance Fields

Matthew McDermott, Jason Rife

In this paper we reexamine the process through which a Neural Radiance Field (NeRF) can be trained to produce novel LiDAR views of a scene. Unlike image applications where camera pixels integrate light over time, LiDAR pulses arrive at specific times. As such, multiple LiDAR returns are possible for any given detector and the classification of these returns is inherently probabilistic. Applying a traditional NeRF training routine can result in the network learning phantom surfaces in free space between conflicting range measurements, similar to how floater aberrations may be produced by an image model. We show that by formulating loss as an integral of probability (rather than as an integral of optical density) the network can learn multiple peaks for a given ray, allowing the sampling of first, nth, or strongest returns from a single output channel. Code is available at https://github.com/mcdermatt/PLINK

CVFeb 14, 2020Code
CheXclusion: Fairness gaps in deep chest X-ray classifiers

Laleh Seyyed-Kalantari, Guanxiong Liu, Matthew McDermott et al.

Machine learning systems have received much attention recently for their ability to achieve expert-level performance on clinical tasks, particularly in medical imaging. Here, we examine the extent to which state-of-the-art deep learning classifiers trained to yield diagnostic labels from X-ray images are biased with respect to protected attributes. We train convolution neural networks to predict 14 diagnostic labels in 3 prominent public chest X-ray datasets: MIMIC-CXR, Chest-Xray8, CheXpert, as well as a multi-site aggregation of all those datasets. We evaluate the TPR disparity -- the difference in true positive rates (TPR) -- among different protected attributes such as patient sex, age, race, and insurance type as a proxy for socioeconomic status. We demonstrate that TPR disparities exist in the state-of-the-art classifiers in all datasets, for all clinical tasks, and all subgroups. A multi-source dataset corresponds to the smallest disparities, suggesting one way to reduce bias. We find that TPR disparities are not significantly correlated with a subgroup's proportional disease burden. As clinical models move from papers to products, we encourage clinical decision makers to carefully audit for algorithmic disparities prior to deployment. Our code can be found at, https://github.com/LalehSeyyed/CheXclusion

LGDec 16, 2023
Event-Based Contrastive Learning for Medical Time Series

Hyewon Jeong, Nassim Oufattole, Matthew Mcdermott et al.

In clinical practice, one often needs to identify whether a patient is at high risk of adverse outcomes after some key medical event. For example, quantifying the risk of adverse outcomes after an acute cardiovascular event helps healthcare providers identify those patients at the highest risk of poor outcomes; i.e., patients who benefit from invasive therapies that can lower their risk. Assessing the risk of adverse outcomes, however, is challenging due to the complexity, variability, and heterogeneity of longitudinal medical data, especially for individuals suffering from chronic diseases like heart failure. In this paper, we introduce Event-Based Contrastive Learning (EBCL) - a method for learning embeddings of heterogeneous patient data that preserves temporal information before and after key index events. We demonstrate that EBCL can be used to construct models that yield improved performance on important downstream tasks relative to other pretraining methods. We develop and test the method using a cohort of heart failure patients obtained from a large hospital network and the publicly available MIMIC-IV dataset consisting of patients in an intensive care unit at a large tertiary care center. On both cohorts, EBCL pretraining yields models that are performant with respect to a number of downstream tasks, including mortality, hospital readmission, and length of stay. In addition, unsupervised EBCL embeddings effectively cluster heart failure patients into subgroups with distinct outcomes, thereby providing information that helps identify new heart failure phenotypes. The contrastive framework around the index event can be adapted to a wide array of time-series datasets and provides information that can be used to guide personalized care.

CLJun 12, 2025
BioClinical ModernBERT: A State-of-the-Art Long-Context Encoder for Biomedical and Clinical NLP

Thomas Sounack, Joshua Davis, Brigitte Durieux et al.

Encoder-based transformer models are central to biomedical and clinical Natural Language Processing (NLP), as their bidirectional self-attention makes them well-suited for efficiently extracting structured information from unstructured text through discriminative tasks. However, encoders have seen slower development compared to decoder models, leading to limited domain adaptation in biomedical and clinical settings. We introduce BioClinical ModernBERT, a domain-adapted encoder that builds on the recent ModernBERT release, incorporating long-context processing and substantial improvements in speed and performance for biomedical and clinical NLP. BioClinical ModernBERT is developed through continued pretraining on the largest biomedical and clinical corpus to date, with over 53.5 billion tokens, and addresses a key limitation of prior clinical encoders by leveraging 20 datasets from diverse institutions, domains, and geographic regions, rather than relying on data from a single source. It outperforms existing biomedical and clinical encoders on four downstream tasks spanning a broad range of use cases. We release both base (150M parameters) and large (396M parameters) versions of BioClinical ModernBERT, along with training checkpoints to support further research.

LGMar 3, 2024
Recent Advances, Applications, and Open Challenges in Machine Learning for Health: Reflections from Research Roundtables at ML4H 2023 Symposium

Hyewon Jeong, Sarah Jabbour, Yuzhe Yang et al. · uw

The third ML4H symposium was held in person on December 10, 2023, in New Orleans, Louisiana, USA. The symposium included research roundtable sessions to foster discussions between participants and senior researchers on timely and relevant topics for the \ac{ML4H} community. Encouraged by the successful virtual roundtables in the previous year, we organized eleven in-person roundtables and four virtual roundtables at ML4H 2022. The organization of the research roundtables at the conference involved 17 Senior Chairs and 19 Junior Chairs across 11 tables. Each roundtable session included invited senior chairs (with substantial experience in the field), junior chairs (responsible for facilitating the discussion), and attendees from diverse backgrounds with interest in the session's topic. Herein we detail the organization process and compile takeaways from these roundtable discussions, including recent advances, applications, and open challenges for each topic. We conclude with a summary and lessons learned across all roundtables. This document serves as a comprehensive review paper, summarizing the recent advancements in machine learning for healthcare as contributed by foremost researchers in the field.

AIMar 9
EveryQuery: Zero-Shot Clinical Prediction via Task-Conditioned Pretraining over Electronic Health Records

Payal Chandak, Gregory Kondas, Isaac Kohane et al.

Foundation models pretrained on electronic health records (EHR) have demonstrated zero-shot clinical prediction capabilities by generating synthetic patient futures and aggregating statistics over sampled trajectories. However, this autoregressive inference procedure is computationally expensive, statistically noisy, and not natively promptable because users cannot directly condition predictions on specific clinical questions. In this preliminary work, we introduce EveryQuery, an EHR foundation model that achieves zero-shot inference through task-conditioned pre-training. Rather than generating future events, EveryQuery takes as input a patient's history and a structured query specifying a clinical task, and directly estimates the likelihood of the outcome occurring in the future window via a single forward pass. EveryQuery realizes this capability by pre-training over randomly sampled combinations of query tasks and patient contexts, directly training the model to produce correct answers to arbitrary input prompts. This enables zero-shot prediction for any task in the query space without finetuning, linear probing, or trajectory generation. On MIMIC-IV, EveryQuery outperforms an autoregressive baseline on 82% of 39 randomly sampled prediction tasks, with a mean AUC improvement of +0.16 (95% CI: [0.10,0.22]). This advantage remains consistent on tasks that were explicitly held out from the pre-training distribution. Further, EveryQuery's performance gains are most pronounced for rare clinical events, affirming and demonstrating a solution to the fundamental limitation of autoregressive inference for low-prevalence outcomes. However, at present, EveryQuery underperforms on tasks requiring disjunctive reasoning over multiple codes, such as 30-day readmission, exposing a concrete expressiveness limitation of the current query language.

LGNov 2, 2021
Meta-Learning to Improve Pre-Training

Aniruddh Raghu, Jonathan Lorraine, Simon Kornblith et al.

Pre-training (PT) followed by fine-tuning (FT) is an effective method for training neural networks, and has led to significant performance improvements in many domains. PT can incorporate various design choices such as task and data reweighting strategies, augmentation policies, and noise models, all of which can significantly impact the quality of representations learned. The hyperparameters introduced by these strategies therefore must be tuned appropriately. However, setting the values of these hyperparameters is challenging. Most existing methods either struggle to scale to high dimensions, are too slow and memory-intensive, or cannot be directly applied to the two-stage PT and FT learning process. In this work, we propose an efficient, gradient-based algorithm to meta-learn PT hyperparameters. We formalize the PT hyperparameter optimization problem and propose a novel method to obtain PT hyperparameter gradients by combining implicit differentiation and backpropagation through unrolled optimization. We demonstrate that our method improves predictive performance on two real-world domains. First, we optimize high-dimensional task weighting hyperparameters for multitask pre-training on protein-protein interaction graphs and improve AUROC by up to 3.9%. Second, we optimize a data augmentation neural network for self-supervised PT with SimCLR on electrocardiography data and improve AUROC by up to 1.9%.

CLMar 11, 2020
Hurtful Words: Quantifying Biases in Clinical Contextual Word Embeddings

Haoran Zhang, Amy X. Lu, Mohamed Abdalla et al.

In this work, we examine the extent to which embeddings may encode marginalized populations differently, and how this may lead to a perpetuation of biases and worsened performance on clinical tasks. We pretrain deep embedding models (BERT) on medical notes from the MIMIC-III hospital dataset, and quantify potential disparities using two approaches. First, we identify dangerous latent relationships that are captured by the contextual word embeddings using a fill-in-the-blank method with text from real clinical notes and a log probability bias score quantification. Second, we evaluate performance gaps across different definitions of fairness on over 50 downstream clinical prediction tasks that include detection of acute and chronic conditions. We find that classifiers trained from BERT representations exhibit statistically significant differences in performance, often favoring the majority group with regards to gender, language, ethnicity, and insurance status. Finally, we explore shortcomings of using adversarial debiasing to obfuscate subgroup information in contextual word embeddings, and recommend best practices for such deep embedding models in clinical settings.

CVApr 4, 2019
Clinically Accurate Chest X-Ray Report Generation

Guanxiong Liu, Tzu-Ming Harry Hsu, Matthew McDermott et al.

The automatic generation of radiology reports given medical radiographs has significant potential to operationally and improve clinical patient care. A number of prior works have focused on this problem, employing advanced methods from computer vision and natural language generation to produce readable reports. However, these works often fail to account for the particular nuances of the radiology domain, and, in particular, the critical importance of clinical accuracy in the resulting generated reports. In this work, we present a domain-aware automatic chest X-ray radiology report generation system which first predicts what topics will be discussed in the report, then conditionally generates sentences corresponding to these topics. The resulting system is fine-tuned using reinforcement learning, considering both readability and clinical accuracy, as assessed by the proposed Clinically Coherent Reward. We verify this system on two datasets, Open-I and MIMIC-CXR, and demonstrate that our model offers marked improvements on both language generation metrics and CheXpert assessed accuracy over a variety of competitive baselines.

LGNov 21, 2018
Unsupervised Multimodal Representation Learning across Medical Images and Reports

Tzu-Ming Harry Hsu, Wei-Hung Weng, Willie Boag et al.

Joint embeddings between medical imaging modalities and associated radiology reports have the potential to offer significant benefits to the clinical community, ranging from cross-domain retrieval to conditional generation of reports to the broader goals of multimodal representation learning. In this work, we establish baseline joint embedding results measured via both local and global retrieval methods on the soon to be released MIMIC-CXR dataset consisting of both chest X-ray images and the associated radiology reports. We examine both supervised and unsupervised methods on this task and show that for document retrieval tasks with the learned representations, only a limited amount of supervision is needed to yield results comparable to those of fully-supervised methods.

LGNov 17, 2018
Machine Learning for Health (ML4H) Workshop at NeurIPS 2018

Natalia Antropova, Andrew L. Beam, Brett K. Beaulieu-Jones et al.

This volume represents the accepted submissions from the Machine Learning for Health (ML4H) workshop at the conference on Neural Information Processing Systems (NeurIPS) 2018, held on December 8, 2018 in Montreal, Canada.