Orthogonal machine learning for conditional odds and risk ratios

arXiv:2604.10412

AI Analysis

For researchers in precision health and causal inference, this provides principled estimators for conditional odds and risk ratios, enabling better targeting of treatments to subgroups that benefit most.

This work generalizes doubly robust and orthogonal risk estimators (DR-learner, R-learner) to conditional odds ratios and risk ratios, showing second-order conditional-mean remainder properties. In simulations across hundreds of data-generating distributions, the proposed nonparametric estimators reduce bias and mean squared error compared to parametric alternatives, especially in complex settings.

Conditional effects are commonly used measures for understanding how treatment effects vary across different groups, and are often used to target treatments or interventions to the groups that benefit most. In this work, we review existing methods and propose novel ones, focusing on the odds ratio (OR) and the risk ratio (RR). While estimation of the conditional average treatment effect (ATE) has been widely studied, estimators for the OR and RR lag behind, and cutting-edge estimators such as those based on doubly robust transformations or orthogonal risk functions have not been generalized to these parameters. We propose such a generalization here, focusing on the DR-learner and the R-learner. We derive orthogonal risk functions for the OR and RR and show that the associated pseudo-outcomes satisfy second-order conditional-mean remainder properties analogous to the ATE case. We also evaluate estimators for the conditional ATE, OR, and RR in a comprehensive nonparametric Monte Carlo simulation study that compares them with common alternatives under hundreds of different data-generating distributions. Our numerical studies provide empirical guidance for choosing an estimator. For instance, they show that while parametric models are useful in very simple settings, the proposed nonparametric estimators significantly reduce bias and mean squared error in the more complex settings expected in the real world. We illustrate the methods in an analysis of physical activity and sleep trouble in U.S. adults using data from the National Health and Nutrition Examination Survey (NHANES). The results demonstrate that our estimators uncover substantial treatment effect heterogeneity that is obscured by traditional regression approaches and lead to improved treatment decision rules, highlighting the importance of data-adaptive methods for advancing precision health research.
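To make the target parameters concrete, the following minimal sketch computes the conditional risk difference, RR, and OR from the two outcome regressions, alongside the familiar doubly robust (AIPW-type) pseudo-outcome used by the DR-learner in the ATE case. It is an illustration under stated assumptions, not the paper's method: the orthogonal pseudo-outcomes for the OR and RR are not given in this abstract and are not reproduced here. The function names and the notation mu1, mu0 (outcome probabilities under treatment and control given covariates) and pi (propensity score) are hypothetical choices for this example, and a binary outcome and binary treatment are assumed.

```python
import numpy as np


def conditional_contrasts(mu1, mu0):
    """Plug-in contrasts from mu_a(x) = P(Y=1 | A=a, X=x).

    Hypothetical helper: returns the conditional risk difference,
    risk ratio, and odds ratio at each covariate value.
    """
    mu1 = np.asarray(mu1, dtype=float)
    mu0 = np.asarray(mu0, dtype=float)
    risk_difference = mu1 - mu0
    risk_ratio = mu1 / mu0
    odds_ratio = (mu1 / (1.0 - mu1)) / (mu0 / (1.0 - mu0))
    return risk_difference, risk_ratio, odds_ratio


def dr_pseudo_outcome_ate(y, a, pi, mu1, mu0):
    """Standard doubly robust pseudo-outcome for the conditional ATE.

    This is the ATE-case transformation only; regressing it on the
    covariates is the second stage of a DR-learner. The nuisance
    estimates (pi, mu1, mu0) are assumed to be supplied by the caller.
    """
    y, a, pi, mu1, mu0 = (np.asarray(v, dtype=float) for v in (y, a, pi, mu1, mu0))
    mu_a = np.where(a == 1, mu1, mu0)  # outcome regression at the observed arm
    return mu1 - mu0 + (a - pi) / (pi * (1.0 - pi)) * (y - mu_a)


# Toy inputs: a 50% risk under treatment versus 25% under control.
rd, rr, odds = conditional_contrasts([0.5], [0.25])
phi = dr_pseudo_outcome_ate(y=[1, 0], a=[1, 0], pi=[0.5, 0.5],
                            mu1=[0.5, 0.5], mu0=[0.25, 0.25])
```

In the toy inputs, the same pair of risks gives a risk difference of 0.25, a risk ratio of 2, and an odds ratio of 3, which illustrates why the three scales can rank subgroups differently and why each needs its own estimator.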
