Optimum Statistical Estimation with Strategic Data Sources
This work addresses the challenge of ensuring reliable data from strategic sources in machine learning, offering a novel mechanism design framework for cost-effective estimation.
The authors tackle the problem of incentivizing high-quality data provision for statistical estimators by proposing an optimum mechanism that minimizes the sum of payments and estimation error and applies to a range of regression methods and objectives.
We propose an optimum mechanism for providing monetary incentives to the data sources of a statistical estimator such as linear regression, so that high-quality data is provided at low cost, in the sense that the sum of payments and estimation error is minimized. The mechanism applies to a broad range of estimators, including linear and polynomial regression, kernel regression, and, under some additional assumptions, ridge regression. It also generalizes to several objectives, including minimizing estimation error subject to budget constraints. Besides our concrete results for regression problems, we contribute a mechanism design framework in which to design and analyze statistical estimators whose examples are supplied by workers who incur a cost for labeling those examples.
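The two objectives named above can be written schematically as follows. This is only a sketch: the notation is assumed for illustration and does not come from the text. Here n is the number of workers, p_i the payment to worker i, \hat{f} the estimator built from the workers' labels, f the function being estimated, L a loss measuring estimation error, and B a budget. Taking expectations over the randomness in the data is likewise an assumption.

```latex
% Assumed notation (not from the source): n workers, payment p_i to worker i,
% estimator \hat{f} built from the workers' labels, true function f,
% loss L measuring estimation error, budget B.
% Both minimizations range over the payment mechanisms under consideration.
\begin{align*}
  \text{(sum objective)} \quad
    & \min \; \mathbb{E}\Big[ \sum_{i=1}^{n} p_i + L(\hat{f}, f) \Big] \\
  \text{(budget variant)} \quad
    & \min \; \mathbb{E}\big[ L(\hat{f}, f) \big]
      \quad \text{subject to} \quad \sum_{i=1}^{n} p_i \le B
\end{align*}
```

In the first form, payments and estimation error are traded off in a single sum; in the second, total payments are capped and only the estimation error is minimized.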