LG · Jan 21, 2022

Adaptive Data Analysis with Correlated Observations

arXiv:2201.08704v1 · 13 citations
AI Analysis

This paper addresses adaptive data analysis for datasets whose samples are dependent, a setting far less understood than the independent case, and offers theoretical insights relevant to machine learning and statistics.

The paper studies adaptive data analysis with correlated observations. It shows that, in some cases, differential privacy guarantees generalization even when the sample contains dependencies, which are quantified by a notion the authors call Gibbs-dependence, and it complements this result with a tight negative example. It also extends the connection between transcript-compression and adaptive data analysis to the non-iid setting.

The vast majority of the work on adaptive data analysis focuses on the case where the samples in the dataset are independent. Several approaches and tools have been successfully applied in this context, such as differential privacy, max-information, compression arguments, and more. The situation is far less well-understood without the independence assumption. We embark on a systematic study of the possibilities of adaptive data analysis with correlated observations. First, we show that, in some cases, differential privacy guarantees generalization even when there are dependencies within the sample, which we quantify using a notion we call Gibbs-dependence. We complement this result with a tight negative example. Second, we show that the connection between transcript-compression and adaptive data analysis can be extended to the non-iid setting.
