Bivariate Causal Discovery for Categorical Data via Classification with Optimal Label Permutation
This work addresses causal inference for categorical variables, a domain-specific challenge in statistics and machine learning, with incremental improvements over existing methods.
The authors address causal discovery for categorical data, which is less studied than causal discovery for quantitative data. They propose a novel causal model based on a parsimonious classifier, classification with optimal label permutation (COLP), and report favorable performance compared to state-of-the-art methods in experiments with synthetic and real data.
Causal discovery for quantitative data has been extensively studied, but less is known for categorical data. We propose a novel causal model for categorical data based on a new classification model, termed classification with optimal label permutation (COLP). By design, COLP is a parsimonious classifier, which gives rise to a provably identifiable causal model. A simple learning algorithm that compares the likelihoods of the causal and anti-causal models suffices to learn the causal direction. Through experiments with synthetic and real data, we demonstrate the favorable performance of the proposed COLP-based causal model compared to state-of-the-art methods. We also make available an accompanying R package, COLP, which contains the proposed causal discovery algorithm and a benchmark dataset of categorical cause-effect pairs.
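The likelihood-comparison idea can be illustrated with a minimal Python sketch. It does not implement COLP and does not use the accompanying R package. Instead it assumes a hypothetical stand-in parsimonious model in which the effect equals a fixed function of the cause with probability 1 - eps and is otherwise uniform over the remaining labels. All function names here are illustrative. The parsimony matters: with a saturated conditional model, both factorizations reproduce the empirical joint distribution, so the two likelihoods would be equal and the direction could not be decided.

```python
import numpy as np


def marginal_loglik(counts):
    """Maximized multinomial log-likelihood of one variable's category counts."""
    counts = np.asarray(counts, dtype=float)
    nonzero = counts[counts > 0]
    return float(np.sum(nonzero * np.log(nonzero / counts.sum())))


def conditional_loglik(table):
    """Fitted log-likelihood of the column variable given the row variable.

    Stand-in model (an assumption, NOT COLP): effect = f(cause) with
    probability 1 - eps, otherwise uniform over the remaining K - 1 labels.
    f is fitted as the row-wise mode and eps as the overall miss rate.
    """
    table = np.asarray(table, dtype=float)
    k = table.shape[1]
    if k < 2:
        raise ValueError("the effect needs at least two categories")
    hits = table.max(axis=1).sum()   # observations with effect == f(cause)
    misses = table.sum() - hits      # all other observations
    if misses == 0:
        return 0.0                   # deterministic fit: likelihood is 1
    eps = misses / table.sum()
    return float(hits * np.log(1 - eps) + misses * np.log(eps / (k - 1)))


def infer_direction(table):
    """Compare fitted log-likelihoods of the causal and anti-causal models.

    Rows of `table` index the categories of X, columns those of Y.
    """
    table = np.asarray(table, dtype=float)
    ll_xy = marginal_loglik(table.sum(axis=1)) + conditional_loglik(table)
    ll_yx = marginal_loglik(table.sum(axis=0)) + conditional_loglik(table.T)
    if np.isclose(ll_xy, ll_yx):
        return "undecided"
    return "X->Y" if ll_xy > ll_yx else "Y->X"


# Toy contingency table: X has 3 levels, Y has 2, and Y follows a
# many-to-one function of X with a 10% error rate.
table = np.array([[90, 10],
                  [90, 10],
                  [10, 90]])
print(infer_direction(table))  # X->Y
```

In this toy table the stand-in model fits the X-to-Y factorization well, while the reverse direction needs a much larger error rate to explain the same counts, so its fitted likelihood is lower. The choice of stand-in model is only for illustration; the identifiability guarantee stated in the abstract concerns the COLP-based model, not this sketch.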