ML LG ST Mar 29, 2019

Data Amplification: A Unified and Competitive Approach to Property Estimation

Yi Hao, Alon Orlitsky, Ananda T. Suresh, Yihong Wu

arXiv:1904.00070v111.332 citations

Originality Highly original

AI Analysis

The estimator provides a distribution-independent, off-the-shelf solution for property estimation and offers significant data-efficiency improvements over common practice, though it builds incrementally on existing estimator frameworks.

The paper tackles the problem of estimating properties of discrete distributions by designing a unified, linear-time estimator that uses only $2n$ samples to match the performance of the empirical estimator with $n\sqrt{\log n}$ samples, achieving competitive results across various properties and distributions.

Estimating properties of discrete distributions is a fundamental problem in statistical learning. We design the first unified, linear-time, competitive, property estimator that for a wide class of properties and for all underlying distributions uses just $2n$ samples to achieve the performance attained by the empirical estimator with $n\sqrt{\log n}$ samples. This provides off-the-shelf, distribution-independent, "amplification" of the amount of data available relative to common-practice estimators. We illustrate the estimator's practical advantages by comparing it to existing estimators for a wide variety of properties and distributions. In most cases, its performance with $n$ samples is even as good as that of the empirical estimator with $n\log n$ samples, and for essentially all properties, its performance is comparable to that of the best existing estimator designed specifically for that property.
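To make the baseline in the abstract concrete, here is a minimal sketch of an empirical (plug-in) estimator for one example property, Shannon entropy, together with the $n\sqrt{\log n}$ sample count the abstract compares against. This is not the paper's estimator. The function names `empirical_entropy` and `amplified_sample_size` are hypothetical, and both the choice of entropy as the property and the use of the natural logarithm are assumptions made for illustration.

```python
import math
from collections import Counter


def empirical_entropy(samples):
    """Plug-in estimate of Shannon entropy in nats (illustrative baseline only).

    Replaces each unknown probability with its observed frequency.
    """
    n = len(samples)
    counts = Counter(samples)
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def amplified_sample_size(n):
    """Return n * sqrt(log n), assuming the natural logarithm.

    Per the abstract, the proposed estimator with 2n samples matches the
    empirical estimator run on this many samples.
    """
    return n * math.sqrt(math.log(n))


if __name__ == "__main__":
    data = ["a", "b", "a", "b"]
    print(empirical_entropy(data))      # plug-in entropy of the observed frequencies
    print(amplified_sample_size(1000))  # empirical-estimator sample count being matched
```

Under these assumptions, with $n = 1000$ the proposed estimator would use $2n = 2000$ samples to match the empirical estimator on roughly 2628 samples, and the gap widens as $n$ grows because $\sqrt{\log n}$ is unbounded.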
