Uniform Mean Estimation for Heavy-Tailed Distributions via Median-of-Means
This work addresses robust mean estimation for heavy-tailed data, which is crucial for machine learning applications like clustering and regression, but it appears incremental as it builds on existing Median-of-Means methods.
The paper tackles the problem of simultaneously estimating the means of functions in a class for heavy-tailed distributions with limited moments, achieving a new sample complexity bound. It applies this to improve results in k-means clustering with unbounded inputs and linear regression with general losses.
The Median of Means (MoM) is a mean estimator that has gained popularity in the context of heavy-tailed data. In this work, we analyze its performance in the task of simultaneously estimating the mean of each function in a class $\mathcal{F}$ when the data distribution possesses only the first $p$ moments for $p \in (1,2]$. We prove a new sample complexity bound using a novel symmetrization technique that may be of independent interest. Additionally, we present applications of our result to $k$-means clustering with unbounded inputs and linear regression with general losses, improving upon existing works.