STDSITLGJan 12, 2022

On the Statistical Complexity of Sample Amplification

arXiv:2201.04315v24 citations
AI Analysis

This work addresses a foundational statistical problem with potential applications in data augmentation and privacy, but it appears incremental as it builds on existing notions without claiming broad SOTA breakthroughs.

The paper tackles the sample amplification problem, which asks when it is possible to generate additional indistinguishable samples from an unknown distribution given a smaller set, and provides generally applicable procedures and lower bounds, establishing a rigorous connection to distribution learning for a large class of distributions like the exponential family.

The ``sample amplification'' problem formalizes the following question: Given $n$ i.i.d. samples drawn from an unknown distribution $P$, when is it possible to produce a larger set of $n+m$ samples which cannot be distinguished from $n+m$ i.i.d. samples drawn from $P$? In this work, we provide a firm statistical foundation for this problem by deriving generally applicable amplification procedures, lower bound techniques and connections to existing statistical notions. Our techniques apply to a large class of distributions including the exponential family, and establish a rigorous connection between sample amplification and distribution learning.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes