LG MLNov 19, 2019

Optimal Robust Learning of Discrete Distributions from Batches

arXiv:1911.08532v210.316 citations

Originality Highly original

AI Analysis

This addresses a critical issue for applications like natural language processing and federated learning where data reliability is a concern, offering a significant improvement over previous exponential-time methods.

The paper tackles the problem of estimating discrete distributions from batches of data that may include untrustworthy or adversarial batches, providing the first polynomial-time estimator that is optimal in the number of batches and achieves near-optimal estimation accuracy.

Many applications, including natural language processing, sensor networks, collaborative filtering, and federated learning, call for estimating discrete distributions from data collected in batches, some of which may be untrustworthy, erroneous, faulty, or even adversarial. Previous estimators for this setting ran in exponential time, and for some regimes required a suboptimal number of batches. We provide the first polynomial-time estimator that is optimal in the number of batches and achieves essentially the best possible estimation accuracy.

View on arXiv PDF

Similar