LG ML

Nov 20, 2016

Dealing with Range Anxiety in Mean Estimation via Statistical Queries

arXiv:1611.06475v24.910 citations

Originality: Incremental advance

AI Analysis

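This work addresses "range anxiety" in mean estimation: under restricted access to data, the error of naive estimators grows with the range of the function being averaged. It gives algorithms whose error depends on the standard deviation instead. The advance is incremental, building on existing models (statistical queries and single-bit communication) with specific improvements.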

The paper tackles the problem of estimating the expectation of a real-valued function from an unknown distribution in restricted data access models, such as statistical queries and single-bit communication, where naive methods scale poorly with the range. The result is a simple algorithm that achieves error scaling linearly with the standard deviation and logarithmically with an upper bound on the second moment, improving over previous approaches.

We give algorithms for estimating the expectation of a given real-valued function $φ:X\to {\bf R}$ on a sample drawn randomly from some unknown distribution $D$ over domain $X$, namely ${\bf E}_{{\bf x}\sim D}[φ({\bf x})]$. Our algorithms work in two well-studied models of restricted access to data samples. The first one is the statistical query (SQ) model, in which an algorithm has access to an SQ oracle for the input distribution $D$ over $X$ instead of i.i.d. samples from $D$. Given a query function $φ:X \to [0,1]$, the oracle returns an estimate of ${\bf E}_{{\bf x}\sim D}[φ({\bf x})]$ within some tolerance $τ$. The second is a model in which only a single bit is communicated from each sample. In both of these models, the error obtained using a naive implementation would scale polynomially with the range of the random variable $φ({\bf x})$ (which might even be infinite). In contrast, without restrictions on access to data, the expected error scales with the standard deviation of $φ({\bf x})$. Here we give a simple algorithm whose error scales linearly in the standard deviation of $φ({\bf x})$ and logarithmically with an upper bound on the second moment of $φ({\bf x})$. As corollaries, we obtain algorithms for high-dimensional mean estimation and stochastic convex optimization in these models that work in more general settings than previously known solutions.
