ML · LG · Jul 18, 2019

A discriminative approach for finding and characterizing positivity violations using decision trees

arXiv:1907.08127v1 · 5 citations
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of validating the positivity assumption in high-dimensional datasets, a concern for researchers and practitioners in causal inference. It represents an incremental improvement over existing methods.

The paper tackles the problem of detecting and characterizing positivity violations in causal inference by proposing a decision tree-based method that automatically identifies subspaces with violations and quantifies their robustness using a random forest, offering scalable and interpretable visualizations.

The assumption of positivity in causal inference (also known as common support or covariate overlap) is necessary to obtain valid causal estimates. Confirming that it holds in a given dataset is therefore an important first step of any causal analysis. The most common methods to date are insufficient for discovering non-positivity: they either do not scale to modern high-dimensional covariate spaces or cannot pinpoint the subpopulation violating positivity. To overcome these issues, we propose harnessing decision trees to detect violations. By dividing the covariate space into mutually exclusive regions, each with maximized homogeneity of treatment groups, decision trees can automatically detect subspaces that violate positivity. By augmenting the method with an additional random forest model, we can quantify the robustness of the violation within each subspace. This solution is scalable and provides an interpretable characterization of the subspaces in which violations occur. We provide a visualization of the stratification rules that define each subpopulation, combined with the severity of the positivity violation within it. We also provide an interactive version of the visualization that allows a deeper dive into the properties of each subspace.
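The core idea of the abstract, partitioning the covariate space so that treatment groups are as homogeneous as possible and then flagging regions that contain almost only one group, can be illustrated with a minimal sketch. This is not the authors' implementation: the greedy Gini-based splitter, the names `grow` and `violations`, the `eps` threshold, and the toy data are all assumptions made for illustration, and the random forest robustness step is omitted.

```python
from collections import namedtuple

# A leaf is a subpopulation: its defining rules, size, and share of treated units.
Leaf = namedtuple("Leaf", ["rules", "n", "treated_fraction"])

def gini(labels):
    """Gini impurity of a list of binary treatment indicators."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def best_split(rows, labels, min_leaf):
    """Return the (feature, threshold) that most reduces weighted Gini, or None."""
    best, best_score = None, gini(labels) * len(labels)
    for j in range(len(rows[0])):
        values = sorted(set(r[j] for r in rows))
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [y for r, y in zip(rows, labels) if r[j] <= thr]
            right = [y for r, y in zip(rows, labels) if r[j] > thr]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            score = gini(left) * len(left) + gini(right) * len(right)
            if score < best_score - 1e-12:  # require a strict improvement
                best, best_score = (j, thr), score
    return best

def grow(rows, labels, names, max_depth=3, min_leaf=5, rules=()):
    """Recursively partition the covariate space; return the list of leaves."""
    split = best_split(rows, labels, min_leaf) if max_depth > 0 else None
    if split is None:
        return [Leaf(rules, len(labels), sum(labels) / len(labels))]
    j, thr = split
    leaves = []
    for go_left, op in ((True, "<="), (False, ">")):
        idx = [i for i, r in enumerate(rows) if (r[j] <= thr) == go_left]
        leaves += grow([rows[i] for i in idx], [labels[i] for i in idx],
                       names, max_depth - 1, min_leaf,
                       rules + (f"{names[j]} {op} {thr:g}",))
    return leaves

def violations(leaves, eps=0.05):
    """Leaves where (almost) everyone is treated or (almost) no one is."""
    return [l for l in leaves
            if l.treated_fraction <= eps or l.treated_fraction >= 1 - eps]

# Hypothetical toy data: covariates (age, sex); nobody aged 60+ is treated.
rows = [(age, sex) for age in range(20, 80) for sex in (0, 1)]
labels = [0 if age >= 60 else (age + sex) % 2 for age, sex in rows]

leaves = grow(rows, labels, names=["age", "sex"])
flagged = violations(leaves)
for leaf in flagged:
    print(" AND ".join(leaf.rules), "| n =", leaf.n,
          "| treated fraction =", leaf.treated_fraction)
```

On this toy data the tree isolates the subgroup aged 60 and over, where no unit is treated, and reports the rule that defines it. In practice a library tree learner would replace the hand-written splitter, and the choice of `eps`, depth, and minimum leaf size would need tuning to the dataset.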

