ME, LG · May 15, 2022

Evaluating Independence and Conditional Independence Measures

arXiv:2205.07253v1 · 3 citations · h-index: 9

Originality: Synthesis-oriented

AI Analysis

This work provides a comparative analysis for researchers and practitioners in statistics and machine learning. It is incremental, however: it reviews and tests existing measures rather than introducing new ones.

The paper evaluates 16 independence measures and 16 conditional independence (CI) measures on simulated and real data. Most measures perform well on the simulated data, but only a few work effectively on the more complex real-world datasets. The authors recommend CE (copula entropy) as a robust choice for both tasks.
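The recommendation above rests on CE (copula entropy), the entropy of the copula density. Because CE equals the negative of mutual information, it can be estimated without distributional assumptions. What follows is a minimal NumPy sketch of the two-step idea behind a nonparametric CE estimator. It assumes a rank-based empirical copula and a Kozachenko-Leonenko k-nearest-neighbour entropy estimate with the max-norm and k = 3. These choices, and every helper name (`copula_entropy`, `ce_conditional_independence`, etc.), are illustrative assumptions, not the paper's code. The sketch also builds a CI measure from three CE terms using the identity I(X;Y|Z) = H_c(X,Z) + H_c(Y,Z) − H_c(X,Y,Z), valid for scalar Z.

```python
import numpy as np

EULER_GAMMA = 0.5772156649015329


def digamma_int(n):
    """Digamma at positive integers: psi(n) = -gamma + sum_{j=1}^{n-1} 1/j."""
    n = np.asarray(n, dtype=int)
    harmonic = np.concatenate(([0.0], np.cumsum(1.0 / np.arange(1, n.max()))))
    return harmonic[n - 1] - EULER_GAMMA


def empirical_copula(x):
    """Replace each column of x (n samples x d variables) by its ranks scaled into (0, 1)."""
    n = x.shape[0]
    ranks = np.argsort(np.argsort(x, axis=0), axis=0) + 1
    return ranks / (n + 1.0)


def knn_entropy(u, k=3):
    """Kozachenko-Leonenko k-NN entropy estimate in nats, using the max-norm."""
    n, d = u.shape
    dist = np.max(np.abs(u[:, None, :] - u[None, :, :]), axis=-1)
    np.fill_diagonal(dist, np.inf)       # a point is not its own neighbour
    r = np.sort(dist, axis=1)[:, k - 1]  # distance to the k-th nearest neighbour
    # A max-norm ball of radius r has volume (2r)^d.
    return digamma_int(n) - digamma_int(k) + d * np.mean(np.log(2.0 * r))


def copula_entropy(*columns, k=3):
    """Hypothetical helper: CE of the stacked variables (true CE <= 0; -CE is their MI)."""
    x = np.column_stack([np.asarray(c, dtype=float) for c in columns])
    return knn_entropy(empirical_copula(x), k)


def ce_conditional_independence(x, y, z, k=3):
    """Hypothetical helper: I(X;Y|Z) = Hc(X,Z) + Hc(Y,Z) - Hc(X,Y,Z) for scalar Z."""
    return (copula_entropy(x, z, k=k) + copula_entropy(y, z, k=k)
            - copula_entropy(x, y, z, k=k))


rng = np.random.default_rng(0)
for rho in (0.0, 0.5, 0.9):
    xy = rng.multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]], size=1000)
    print(f"rho={rho}: -CE = {-copula_entropy(xy[:, 0], xy[:, 1]):.3f}, "
          f"Gaussian MI = {-0.5 * np.log(1.0 - rho**2):.3f}")

# X and Y depend on each other only through Z, so I(X;Y|Z) should be near 0.
z = rng.normal(size=1000)
x = z + rng.normal(size=1000)
y = z + rng.normal(size=1000)
print(f"I(X;Y|Z) estimate: {ce_conditional_independence(x, y, z):.3f}")
```

On bivariate normal data, −CE should rise with the correlation and roughly track the closed form −½ log(1 − ρ²). The k-NN estimate is biased near the edges of the unit cube, so values slightly below zero can appear near independence.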

Independence and conditional independence (CI) are two fundamental concepts in probability and statistics, and they can be applied to many central problems of statistical inference. Many independence and CI measures have been defined from diverse principles and concepts. In this paper, 16 independence measures and 16 CI measures were reviewed and then evaluated with simulated and real data. For the independence measures, eight simulated datasets were generated from the normal distribution and from normal and Archimedean copula functions to compare the measures in bivariate or multivariate, linear or nonlinear settings. Two UCI datasets, the heart disease data and the wine quality data, were used to test the power of the independence measures under real conditions. For the CI measures, two simulated datasets, generated from the normal distribution and the Gumbel copula, and one real dataset (the Beijing air data) were used to test the measures in prespecified linear or nonlinear settings and in a real scenario. The experimental results show that most of the measures work well on the simulated data, in that their values follow the expected monotonic trend across the simulations. However, the independence measures and the CI measures, respectively, gave divergent results on the much more complex real data, and only a few can be considered to work well with reference to domain knowledge. We also found that the measures tend to fall into groups according to the similarity of their behavior, both in each setting and overall. Based on the experiments, we recommend CE as a good choice for both independence and CI measurement, a choice also supported by its rigorous distribution-free definition and consistent nonparametric estimator.
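The abstract draws simulated data from normal and Archimedean copulas, including a Gumbel copula for the CI experiments, but does not spell out the sampling procedure. Below is a minimal NumPy sketch of one standard way to sample a Gumbel copula: the Marshall-Olkin algorithm, with a positive stable frailty drawn via Kanter's representation. The helper names (`positive_stable`, `sample_gumbel_copula`, `kendall_tau`) are hypothetical, and the paper's own generator may differ. A Gumbel copula with parameter θ ≥ 1 has Kendall's tau equal to 1 − 1/θ, which gives a quick check on the sampler.

```python
import numpy as np


def positive_stable(alpha, size, rng):
    """Draw S > 0 with E[exp(-t*S)] = exp(-t**alpha), 0 < alpha < 1 (Kanter's representation)."""
    w = rng.uniform(0.0, np.pi, size)
    e = rng.exponential(1.0, size)
    return (np.sin(alpha * w) / np.sin(w) ** (1.0 / alpha)) * (
        np.sin((1.0 - alpha) * w) / e) ** ((1.0 - alpha) / alpha)


def sample_gumbel_copula(theta, n, d=2, rng=None):
    """Hypothetical helper: n draws from a d-dimensional Gumbel copula (Marshall-Olkin)."""
    if theta < 1.0:
        raise ValueError("the Gumbel copula needs theta >= 1")
    rng = np.random.default_rng() if rng is None else rng
    if theta == 1.0:
        return rng.uniform(size=(n, d))  # theta = 1 is the independence copula
    s = positive_stable(1.0 / theta, n, rng)[:, None]  # one shared frailty per sample
    e = rng.exponential(1.0, size=(n, d))
    # Apply the Gumbel generator psi(t) = exp(-t**(1/theta)) to E_i / S.
    return np.exp(-(e / s) ** (1.0 / theta))


def kendall_tau(a, b):
    """O(n^2) Kendall's tau for tie-free samples."""
    sa = np.sign(a[:, None] - a[None, :])
    sb = np.sign(b[:, None] - b[None, :])
    n = len(a)
    return np.sum(sa * sb) / (n * (n - 1))


rng = np.random.default_rng(0)
for theta in (1.0, 2.0, 4.0):
    u = sample_gumbel_copula(theta, 1000, rng=rng)
    print(f"theta={theta}: Kendall tau = {kendall_tau(u[:, 0], u[:, 1]):.3f} "
          f"(theory 1 - 1/theta = {1 - 1 / theta:.3f})")
```

Raising θ strengthens the dependence while keeping uniform margins. A dial of this kind lets a simulation check that a measure's value moves monotonically in the expected direction.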
