High-Dimensional Independence Testing via Maximum and Average Distance Correlations
This work addresses the need for efficient non-parametric independence tests for high-dimensional data, with applications in fields such as bioinformatics. The contribution is incremental in that it builds on existing distance correlation methods.
The paper tackles multivariate independence testing in high-dimensional settings by proposing maximum and average distance correlations, characterizing their consistency, and developing a fast chi-square-based testing procedure. It also reports empirical evaluations of performance across dependence scenarios and a real data application involving cancer types and peptide levels.
This paper investigates the use of maximum and average distance correlations for multivariate independence testing. We characterize their consistency properties in high-dimensional settings with respect to the number of marginally dependent dimensions, compare the advantages of each test statistic, examine their respective null distributions, and present a fast chi-square-based testing procedure. The resulting tests are non-parametric and applicable to both Euclidean distance and the Gaussian kernel as the underlying metric. To better understand the practical use cases of the proposed tests, we evaluate the empirical performance of the maximum distance correlation, the average distance correlation, and the original distance correlation across various multivariate dependence scenarios, and conduct a real data experiment testing the relationship between various cancer types and peptide levels in human plasma.
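As an illustration of the two statistics, the following is a minimal sketch rather than the paper's implementation. It assumes that the maximum and average distance correlations are the maximum and the mean of the bias-corrected (U-centered) distance correlations between each dimension of X and Y, using Euclidean distance. The p-values come from a plain permutation test, which stands in for the chi-square-based procedure because its details are not given here. All function names are hypothetical.

```python
import numpy as np

def pairwise_dist(z):
    """Euclidean distance matrix of the rows of z (1-D input = one column)."""
    z = np.asarray(z, dtype=float)
    if z.ndim == 1:
        z = z[:, None]
    diff = z[:, None, :] - z[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def u_center(d):
    """U-centering of a distance matrix (bias-corrected double centering)."""
    n = d.shape[0]
    if n < 4:
        raise ValueError("U-centering needs at least 4 samples")
    row = d.sum(axis=1, keepdims=True) / (n - 2)
    col = d.sum(axis=0, keepdims=True) / (n - 2)
    total = d.sum() / ((n - 1) * (n - 2))
    a = d - row - col + total
    np.fill_diagonal(a, 0.0)
    return a

def dcor_unbiased(a, b):
    """Bias-corrected distance correlation from two U-centered matrices."""
    n = a.shape[0]
    scale = n * (n - 3)
    cov = (a * b).sum() / scale
    denom = np.sqrt((a * a).sum() / scale * (b * b).sum() / scale)
    return cov / denom if denom > 0 else 0.0

def permutation_pvalues(x, y, n_perm=200, seed=0):
    """Return ((max_dcor, avg_dcor), p-values) using a permutation null.

    Assumption: the statistics are taken over per-dimension distance
    correlations between each column of x and the whole of y.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    n, p = x.shape
    a_list = [u_center(pairwise_dist(x[:, j])) for j in range(p)]
    b = u_center(pairwise_dist(y))

    def stats(b_mat):
        d = np.array([dcor_unbiased(a, b_mat) for a in a_list])
        return d.max(), d.mean()

    obs = stats(b)
    exceed = np.zeros(2)
    for _ in range(n_perm):
        idx = rng.permutation(n)
        # Permuting samples of y permutes rows and columns of its matrix.
        perm = stats(b[np.ix_(idx, idx)])
        exceed += np.array(perm) >= np.array(obs)
    pvals = (exceed + 1) / (n_perm + 1)
    return obs, pvals
```

Calling `permutation_pvalues(x, y)` with `x` of shape `(n, p)` and `y` of shape `(n,)` or `(n, q)` returns the two observed statistics and their p-values. When only a few dimensions of `x` depend on `y`, the maximum should be driven by the strongest dimension while the average is diluted by the independent ones, which is the trade-off the abstract refers to.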