Kernel-based Approach to Handle Mixed Data for Inferring Causal Graphs
This work addresses a specific challenge in causal inference for researchers dealing with heterogeneous datasets, but it is incremental as it adapts existing methods rather than introducing a new paradigm.
The research tackles the problem of learning causal graphs from datasets with mixed data types. It proposes a kernel-based approach in which a pseudo-correlation matrix computed with Kernel Alignment replaces the correlation matrix in existing causal algorithms such as PC and FCI, and it applies the approach to data containing categorical, binary, ordinal, and continuous variables.
Causal learning is a useful approach for analyzing cause-and-effect relationships among the variables in a dataset. A causal graph can be generated from a dataset using a causal algorithm such as the PC algorithm or Fast Causal Inference (FCI). Generating a causal graph from a dataset that contains different data types (mixed data) is not trivial. This research offers an easy way to handle mixed data so that it can be used to learn causal graphs with existing implementations of the PC algorithm and FCI. It proposes using kernel functions and Kernel Alignment to handle mixed data. The approach has two main steps: computing a kernel matrix for each variable, and calculating a pseudo-correlation matrix from those kernel matrices using Kernel Alignment. The pseudo-correlation matrix is then used in place of the correlation matrix in the conditional independence test for Gaussian data in the PC algorithm and FCI. The advantage of this idea is that it is possible to handle any data type by choosing a suitable kernel function to compute the kernel matrix for an observed variable. The proposed method is successfully applied to learn a causal graph from mixed data containing categorical, binary, ordinal, and continuous variables.
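
The two steps described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the choice of an RBF kernel for continuous variables, a delta (equality) kernel for categorical variables, the median bandwidth heuristic, and the centering of each kernel matrix before alignment are all assumptions made here for illustration, and every function name is hypothetical. The last function shows how the resulting matrix could stand in for a correlation matrix in a standard Fisher z test of the kind used for Gaussian data.

```python
import math
import numpy as np

def rbf_kernel(x, gamma=None):
    """RBF kernel matrix for one continuous variable (assumed kernel choice)."""
    x = np.asarray(x, dtype=float).ravel()
    sq = (x[:, None] - x[None, :]) ** 2
    if gamma is None:
        # Median heuristic for the bandwidth (an assumption, not from the source).
        pos = sq[sq > 0]
        gamma = 1.0 / np.median(pos) if pos.size else 1.0
    return np.exp(-gamma * sq)

def delta_kernel(x):
    """Delta kernel for one categorical/binary variable: 1 if equal, else 0."""
    _, codes = np.unique(np.asarray(x).ravel(), return_inverse=True)
    codes = np.asarray(codes).ravel()
    return (codes[:, None] == codes[None, :]).astype(float)

def center(K):
    """Double-center a kernel matrix: H K H with H = I - 11^T / n."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H

def kernel_alignment(K1, K2):
    """Alignment <K1, K2>_F / sqrt(<K1, K1>_F <K2, K2>_F)."""
    # Undefined (division by zero) if a centered kernel is all zeros,
    # e.g. for a constant variable.
    return np.sum(K1 * K2) / math.sqrt(np.sum(K1 * K1) * np.sum(K2 * K2))

def pseudo_correlation(kernels):
    """Pairwise alignment of centered kernel matrices, one per variable."""
    centered = [center(K) for K in kernels]
    p = len(centered)
    R = np.eye(p)
    for i in range(p):
        for j in range(i + 1, p):
            R[i, j] = R[j, i] = kernel_alignment(centered[i], centered[j])
    return R

def fisher_z_pvalue(R, n, i, j, cond=()):
    """Fisher z test of i independent of j given cond, with R used as a correlation matrix."""
    idx = [i, j, *cond]
    prec = np.linalg.pinv(R[np.ix_(idx, idx)])
    r = -prec[0, 1] / math.sqrt(prec[0, 0] * prec[1, 1])  # partial correlation
    r = min(max(r, -0.999999), 0.999999)
    z = 0.5 * math.log((1 + r) / (1 - r))
    stat = math.sqrt(n - len(cond) - 3) * abs(z)
    return math.erfc(stat / math.sqrt(2))  # two-sided normal p-value

# Toy mixed data: x continuous, y binary and dependent on x, z categorical noise.
rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)
y = x > 0
z = rng.integers(0, 3, size=n)
R = pseudo_correlation([rbf_kernel(x), delta_kernel(y), delta_kernel(z)])
print(np.round(R, 3))
print(fisher_z_pvalue(R, n, 0, 1), fisher_z_pvalue(R, n, 0, 2))
```

With centered kernels, the alignment values lie between 0 and 1, so the matrix carries dependence strength but no sign. A matrix like `R`, together with the sample size, is the kind of input that correlation-based conditional independence tests accept, for example `gaussCItest` in the R package pcalg, which takes a correlation matrix `C` and a sample size `n`.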