LG · Jun 12, 2024

Sources of Gain: Decomposing Performance in Conditional Average Dose Response Estimation

arXiv:2406.08206v1 · 2 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses misleading benchmark evaluations of machine learning methods for CADR estimation. Its contribution is incremental: it critiques and refines existing evaluation practice rather than introducing a new estimator.

The paper analyzes the common practice of evaluating conditional average dose response (CADR) estimators on popular benchmark datasets and finds it insufficient. Most of these benchmarks are challenging for reasons other than those their creators claim, and confounding, the key challenge most estimators target, is not an issue in any of the considered datasets. The authors propose a decomposition scheme that isolates five distinct components of estimator performance, apply it to eight estimators on four benchmarks, and run nearly 1,500 experiments.

Estimating conditional average dose responses (CADR) is an important but challenging problem. Estimators must correctly model the potentially complex relationships between covariates, interventions, doses, and outcomes. In recent years, the machine learning community has shown great interest in developing tailored CADR estimators that target specific challenges. Their performance is typically evaluated against other methods on (semi-) synthetic benchmark datasets. Our paper analyses this practice and shows that using popular benchmark datasets without further analysis is insufficient to judge model performance. Established benchmarks entail multiple challenges, whose impacts must be disentangled. Therefore, we propose a novel decomposition scheme that allows the evaluation of the impact of five distinct components contributing to CADR estimator performance. We apply this scheme to eight popular CADR estimators on four widely-used benchmark datasets, running nearly 1,500 individual experiments. Our results reveal that most established benchmarks are challenging for reasons different from their creators' claims. Notably, confounding, the key challenge tackled by most estimators, is not an issue in any of the considered datasets. We discuss the major implications of our findings and present directions for future research.
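To make the idea of isolating one source of difficulty concrete, the sketch below compares an estimator's error on data where the dose depends on a covariate (confounded assignment) with its error on data where the dose is assigned at random. The outcome function, model, and sample size are held fixed. This is a hypothetical illustration, not the paper's decomposition scheme. The response surface `true_response`, the dose-assignment rule, the polynomial least-squares estimator, and all helper names are assumptions chosen for brevity. Error is measured as a mean integrated squared error (MISE) over a uniform dose grid on [0, 1].

```python
import numpy as np

def true_response(x, t):
    # Hypothetical ground-truth dose-response surface mu(x, t), for illustration only.
    return np.sin(2 * np.pi * x) + 2.0 * t * x - t ** 2

def sample(n, confounded, rng):
    x = rng.uniform(0.0, 1.0, n)
    if confounded:
        # Assumed confounding: units with larger x tend to receive larger doses.
        t = np.clip(x + rng.normal(0.0, 0.1, n), 0.0, 1.0)
    else:
        # Dose assigned independently of x, as in a randomized experiment.
        t = rng.uniform(0.0, 1.0, n)
    y = true_response(x, t) + rng.normal(0.0, 0.1, n)
    return x, t, y

def features(x, t, degree=3):
    # Polynomial basis: all monomials x^i * t^j with i + j <= degree.
    cols = [x ** i * t ** j for i in range(degree + 1) for j in range(degree + 1 - i)]
    return np.column_stack(cols)

def fit(x, t, y, degree=3):
    # Ordinary least squares on the polynomial features (a stand-in estimator).
    coef, *_ = np.linalg.lstsq(features(x, t, degree), y, rcond=None)
    return lambda xq, tq: features(xq, tq, degree) @ coef

def mise(model, x_test, n_doses=50):
    # Squared error averaged over test units and over a uniform dose grid on [0, 1].
    doses = np.linspace(0.0, 1.0, n_doses)
    xx, tt = np.meshgrid(x_test, doses, indexing="ij")
    err = model(xx.ravel(), tt.ravel()) - true_response(xx.ravel(), tt.ravel())
    return float(np.mean(err ** 2))

def confounding_gap(n_train=2000, n_test=200, seed=0):
    # Same outcome function, estimator, and sample size; only dose assignment differs.
    rng = np.random.default_rng(seed)
    x_test = rng.uniform(0.0, 1.0, n_test)
    results = {}
    for name, confounded in [("confounded", True), ("randomized", False)]:
        x, t, y = sample(n_train, confounded, rng)
        results[name] = mise(fit(x, t, y), x_test)
    results["gap"] = results["confounded"] - results["randomized"]
    return results

print(confounding_gap())
```

A gap close to zero would suggest that confounded dose assignment contributes little to the difficulty of this particular synthetic setup for this particular model. The sign and size of the gap depend on the assumed response surface and estimator, so no specific value is implied here.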

Code implementations: 1 repo
