Mean Estimation from Coarse Data: Characterizations and Efficient Algorithms

Alkis Kalavasis, Anay Mehrotra, Manolis Zampetakis, Felix Zhou, Ziyu Zhu

arXiv:2602.23341v11.4h-index: 19

Originality Highly original

AI Analysis

This work is significant for researchers and practitioners dealing with mean estimation from incomplete or rounded data, providing theoretical characterizations and efficient algorithms for a previously challenging problem.

This paper addresses Gaussian mean estimation from coarse data, where only the set containing a sample is observed. It resolves two open questions regarding identifiability and computationally efficient estimation under convex partitions, showing that efficient estimation is possible when the mean is identifiable.

Coarse data arise when learners observe only partial information about samples; namely, a set containing the sample rather than its exact value. This occurs naturally through measurement rounding, sensor limitations, and lag in economic systems. We study Gaussian mean estimation from coarse data, where each true sample $x$ is drawn from a $d$-dimensional Gaussian distribution with identity covariance, but is revealed only through the set of a partition containing $x$. When the coarse samples, roughly speaking, have ``low'' information, the mean cannot be uniquely recovered from observed samples (i.e., the problem is not identifiable). Recent work by Fotakis, Kalavasis, Kontonis, and Tzamos [FKKT21] established that sample-efficient mean estimation is possible when the unknown mean is identifiable and the partition consists of only convex sets. Moreover, they showed that without convexity, mean estimation becomes NP-hard. However, two fundamental questions remained open: (1) When is the mean identifiable under convex partitions? (2) Is computationally efficient estimation possible under identifiability and convex partitions? This work resolves both questions. [...]

View on arXiv PDF

Similar