ST, LG · Oct 22, 2022

Testing Independence of Exchangeable Random Variables

arXiv:2210.12392v1 · 3 citations · h-index: 45
AI Analysis

This work addresses the problem of non-i.i.d. data in fields like Deep Learning, where web-scraped data containing duplicates can lead to inaccurate test-set evaluations.

The paper tackles the problem of testing whether exchangeable random variables are independent, developing tests that can reject the null hypothesis of i.i.d. data with high power for some exchangeable distributions, without structural assumptions on the sample space.

Given well-shuffled data, can we determine whether the data items are statistically (in)dependent? Formally, we consider the problem of testing whether a set of exchangeable random variables are independent. We will show that this is possible and develop tests that can confidently reject the null hypothesis that the data are independent and identically distributed, and that have high power for (some) exchangeable distributions. We will make no structural assumptions on the underlying sample space. One potential application is in Deep Learning, where data is often scraped from the whole internet and duplicates abound, which can render the data non-i.i.d. and make test-set evaluation prone to giving wrong answers.
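The Deep Learning motivation above can be made concrete with a small illustration. The sketch below is not the paper's test; it only shows, under toy assumptions, how duplicated items in a shuffled dataset end up on both sides of a train/test split, which is the mechanism that makes test-set evaluation unreliable. The function name `leakage_fraction`, the 20% split, and the synthetic integer "items" are all hypothetical choices made for this example.

```python
import random

def leakage_fraction(items, test_fraction=0.2, seed=0):
    """Shuffle items, split into test/train, and return the fraction of
    test items that also occur in the training part.
    Hypothetical helper for illustration only; not from the paper."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    test, train = shuffled[:n_test], shuffled[n_test:]
    train_set = set(train)
    if n_test == 0:
        return 0.0
    leaked = sum(1 for x in test if x in train_set)
    return leaked / n_test

# Toy data (assumption): 1000 distinct items vs. 500 items each appearing twice.
unique_items = list(range(1000))
duplicated_items = [i for i in range(500) for _ in range(2)]

print("leakage without duplicates:", leakage_fraction(unique_items))
print("leakage with duplicates:   ", leakage_fraction(duplicated_items))
```

With distinct items no test item can appear in the training part, so the first fraction is zero. With every item duplicated, most test items have a copy in the training part, so a model that memorizes its training data would look better on this test set than it would on genuinely fresh data.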

