Identifiable Energy-based Representations: An Application to Estimating Heterogeneous Causal Effects
This work targets the curse of dimensionality in causal effect estimation. It offers researchers and practitioners an incremental improvement: a preprocessing step that can be applied before any existing CATE learner.
The paper tackles the problem of estimating heterogeneous causal effects, measured as conditional average treatment effects (CATEs), by proposing an energy-based model that learns low-dimensional representations to reduce sample complexity. In experiments, CATE estimates built on these representations outperform those built on the raw variables or on representations from other dimensionality reduction methods.
Conditional average treatment effects (CATEs) allow us to understand effect heterogeneity across a large population of individuals. However, typical CATE learners assume that all confounding variables are measured in order for the CATE to be identifiable. This requirement can be satisfied by collecting many variables, at the expense of increased sample complexity for estimating CATEs. To combat this, we propose an energy-based model (EBM) that learns a low-dimensional representation of the variables by employing a noise contrastive loss function. With our EBM we introduce a preprocessing step that alleviates the curse of dimensionality for any existing learner developed for estimating CATEs. We prove that our EBM keeps the representations partially identifiable up to some universal constant and that it has universal approximation capability. These properties enable the representations to converge and keep the CATE estimates consistent. Experiments demonstrate the convergence of the representations and show that estimating CATEs on our representations performs better than on the variables or on the representations obtained through other dimensionality reduction methods.
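To make the idea of a noise contrastive loss on low-dimensional representations more concrete, the following is a minimal sketch and not the paper's model. The abstract does not specify the energy function, the encoder, or the noise distribution, so everything here is an assumption made for illustration: the linear `encode` function, the inner-product score, and the use of other rows in the batch as noise samples (an InfoNCE-style choice) are all hypothetical.

```python
import numpy as np


def encode(x, weights):
    """Hypothetical linear encoder: maps covariates (n, d) to representations (n, k)."""
    return x @ weights


def contrastive_loss(z_a, z_b):
    """Generic in-batch contrastive loss (an assumption, not the paper's exact loss).

    Row i of z_a is treated as paired with row i of z_b; every other row of z_b
    in the batch plays the role of a noise (negative) sample for row i.
    """
    scores = z_a @ z_b.T                                  # (n, n) scores, i.e. negative energies
    scores = scores - scores.max(axis=1, keepdims=True)   # shift rows for numerical stability
    log_softmax = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_softmax))                 # average negative log-probability of true pairs


# Reduce d = 20 synthetic covariates to a k = 3 dimensional representation.
rng = np.random.default_rng(0)
n, d, k = 64, 20, 3
x = rng.normal(size=(n, d))
weights = rng.normal(size=(d, k)) / np.sqrt(d)
z = encode(x, weights)

# Two deterministic reference points for the loss:
# an uninformative representation cannot tell true pairs from noise, giving log(n);
# a perfectly separating representation drives the loss to (numerically) zero.
loss_uninformative = contrastive_loss(np.zeros((n, k)), np.zeros((n, k)))
loss_separating = contrastive_loss(10.0 * np.eye(4), 10.0 * np.eye(4))
```

In a pipeline of the kind the abstract describes, an encoder trained with such a loss would be fitted first, and the resulting `z` would then replace the raw variables as input to an existing CATE learner. The training loop and the learner are omitted here because the abstract does not describe them.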