ME LG ML | Apr 21, 2025

Assessing Surrogate Heterogeneity in Real World Data Using Meta-Learners

arXiv:2504.15386v1 | h-index: 17 | Has Code | Journal of Causal Inference
Originality: Incremental advance
AI Analysis

This work addresses a gap in public health and social science research by enabling the analysis of surrogate markers in non-randomized settings, though it is incremental as it builds on existing meta-learner methods.

The paper addresses surrogate heterogeneity in real-world, non-randomized data. Existing methods that permit non-randomized treatment handle confounding but cannot examine how surrogate strength varies with patient characteristics. The paper proposes a meta-learner framework to quantify this heterogeneity and to identify individuals for whom the surrogate is a valid replacement for the primary outcome. Performance is evaluated through a simulation study and an application to hemoglobin A1c as a surrogate for fasting plasma glucose.

Surrogate markers are most commonly studied within the context of randomized clinical trials. However, the need for alternative outcomes extends beyond these settings and may be more pronounced in real-world public health and social science research, where randomized trials are often impractical. Research on identifying surrogates in real-world non-randomized data is scarce, as available statistical approaches for evaluating surrogate markers tend to rely on the assumption that treatment is randomized. While the few methods that allow for non-randomized treatment/exposure appropriately handle confounding individual characteristics, they do not offer a way to examine surrogate heterogeneity with respect to patient characteristics. In this paper, we propose a framework to assess surrogate heterogeneity in real-world, i.e., non-randomized, data and implement this framework using various meta-learners. Our approach allows us to quantify heterogeneity in surrogate strength with respect to patient characteristics while accommodating confounders through the use of flexible, off-the-shelf machine learning methods. In addition, we use our framework to identify individuals for whom the surrogate is a valid replacement of the primary outcome. We examine the performance of our methods via a simulation study and application to examine heterogeneity in the surrogacy of hemoglobin A1c as a surrogate for fasting plasma glucose.
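
To make the term "meta-learner" concrete, here is a minimal sketch of one common variant, the T-learner, which fits a separate outcome model in each treatment arm and takes the difference of their predictions as the conditional treatment effect. It is an illustration under stated assumptions, not the paper's implementation: the simulated data, the single covariate `x`, the helper names `fit_ols` and `t_learner`, and the use of simple linear regression as the base learner are all hypothetical, and the abstract does not say which meta-learners or which surrogate-strength measure the authors use.

```python
import random

def fit_ols(xs, ys):
    """Closed-form simple linear regression; returns (intercept, slope)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

def t_learner(xs, treat, ys):
    """T-learner: one outcome model per arm; returns x -> estimated effect."""
    a1, b1 = fit_ols([x for x, g in zip(xs, treat) if g == 1],
                     [y for y, g in zip(ys, treat) if g == 1])
    a0, b0 = fit_ols([x for x, g in zip(xs, treat) if g == 0],
                     [y for y, g in zip(ys, treat) if g == 0])
    return lambda x: (a1 + b1 * x) - (a0 + b0 * x)

# Hypothetical non-randomized data: the covariate x drives both the
# probability of treatment and the outcomes, so x is a confounder.
random.seed(0)
xs, treat, surrogate, primary = [], [], [], []
for _ in range(5000):
    x = random.uniform(0, 1)
    g = 1 if random.random() < 0.2 + 0.6 * x else 0
    s = 0.5 + x + g * (0.5 + 1.0 * x) + random.gauss(0, 0.5)        # surrogate
    y = 1.0 + 2.0 * x + g * (1.0 + 3.0 * x) + random.gauss(0, 0.5)  # primary
    xs.append(x)
    treat.append(g)
    surrogate.append(s)
    primary.append(y)

# Conditional treatment effects on the primary outcome and on the surrogate.
cate_y = t_learner(xs, treat, primary)
cate_s = t_learner(xs, treat, surrogate)
for x in (0.1, 0.9):
    print(f"x={x}: effect on primary={cate_y(x):.2f}, on surrogate={cate_s(x):.2f}")
```

In this simulation both effects grow with `x`, which is the kind of covariate-dependent variation a heterogeneity analysis looks for. Replacing `fit_ols` with a flexible machine learning regressor would keep the same structure.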

Code Implementations: 1 repo
