Non-Parametric Inference of Relational Dependence
This work addresses the challenge of statistical inference in relational systems such as networks, where standard i.i.d. assumptions fail. The contribution is incremental, however, because it builds on existing kernel-based tests.
The paper tackles independence testing in non-i.i.d. relational data by defining marginal and conditional tests based on kernel mean embeddings, and demonstrates the effectiveness of the resulting test relative to state-of-the-art methods on synthetic and semi-synthetic networks.
Independence testing plays a central role in statistical and causal inference from observational data. Standard independence tests assume that the data samples are independent and identically distributed (i.i.d.), but that assumption is violated in many real-world datasets and applications centered on relational systems. This work examines the problem of estimating independence in data drawn from relational systems by defining sufficient representations for the sets of observations influencing individual instances. Specifically, we define marginal and conditional independence tests for relational data by considering the kernel mean embedding as a flexible aggregation function for relational variables. We propose a consistent, non-parametric, scalable kernel test to operationalize the relational independence test for non-i.i.d. observational data under a set of structural assumptions. We empirically evaluate our proposed method on a variety of synthetic and semi-synthetic networks and demonstrate its effectiveness compared to state-of-the-art kernel-based independence tests.
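
To make the idea of a kernel mean embedding as an aggregation function concrete, the following is a minimal sketch and not the authors' implementation. It assumes each instance comes with a variable-sized set ("bag") of related observations, represents each bag by its empirical kernel mean embedding, and plugs the resulting Gram matrix into a standard biased HSIC statistic with a permutation null. The function names (`rbf`, `mean_embedding_gram`, `hsic`, `permutation_pvalue`), the RBF kernel, the bandwidth `gamma`, and the use of a permutation test are all illustrative assumptions. In particular, the permutation null treats instances as exchangeable, which is a simplification that may not hold in a real relational system.

```python
import numpy as np

def rbf(a, b, gamma=1.0):
    """RBF kernel matrix between the rows of a (n, d) and b (m, d)."""
    sq = (np.sum(a**2, axis=1)[:, None]
          + np.sum(b**2, axis=1)[None, :]
          - 2.0 * a @ b.T)
    return np.exp(-gamma * np.maximum(sq, 0.0))

def mean_embedding_gram(bags, gamma=1.0):
    """Gram matrix of inner products between empirical kernel mean embeddings.

    bags[i] is an (m_i, d) array holding the observations related to instance i.
    <mu_i, mu_j> is the average of k(x, x') over all x in bag i and x' in bag j.
    """
    n = len(bags)
    K = np.empty((n, n))
    for i in range(n):
        for j in range(i, n):
            K[i, j] = K[j, i] = rbf(bags[i], bags[j], gamma).mean()
    return K

def hsic(K, L):
    """Biased empirical HSIC statistic: trace(K H L H) / n^2."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return np.trace(K @ H @ L @ H) / n**2

def permutation_pvalue(K, L, n_perm=200, seed=0):
    """Permutation p-value for HSIC (assumes exchangeable instances)."""
    rng = np.random.default_rng(seed)
    observed = hsic(K, L)
    count = 0
    for _ in range(n_perm):
        p = rng.permutation(K.shape[0])
        # Permute rows and columns of L together to break the pairing.
        if hsic(K, L[np.ix_(p, p)]) >= observed:
            count += 1
    return (count + 1) / (n_perm + 1)
```

Given a list `bags` and an outcome array `y` of shape `(n, 1)`, one would compute `K = mean_embedding_gram(bags)` and `L = rbf(y, y)` and then call `permutation_pvalue(K, L)`. Because the embedding averages over each bag, bags of different sizes are handled without padding or a hand-picked aggregate such as the mean or the maximum.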