Incentivizing Truthful Data Contributions in a Marketplace for Mean Estimation
This work addresses the challenge of ensuring truthful data reporting in marketplaces for statistical estimation; its contribution is incremental to the existing mechanism design literature.
The paper tackles the problem of designing a data marketplace for mean estimation, in which a broker must incentivize truthful data contributions while maximizing welfare or profit. It shows that the optimal mechanism induces a Nash equilibrium in which the two lowest-cost contributors collect all the data, and it gives hardness results showing that no mechanism does better in a Nash equilibrium.
We study a data marketplace where a broker intermediates between buyers, who seek to estimate the mean \(μ\) of an unknown normal distribution \(\mathcal{N}(μ, σ^2)\), and contributors, who can collect data from this distribution at a cost. The broker delegates data collection work to contributors, aggregates the reported datasets, sells the aggregated data to buyers, and redistributes the revenue as payments to contributors. We aim to maximize welfare or profit under key constraints: individual rationality for buyers and contributors, incentive compatibility (contributors are incentivized to comply with data collection instructions and to truthfully report the collected data), and budget balance (total contributor payments equal total revenue). We first compute welfare/profit-optimal prices under truthful reporting; however, to incentivize data collection and truthful data reporting, we adjust these prices based on discrepancies in contributors' reported data. This yields a Nash equilibrium (NE) where the two lowest-cost contributors collect all data. We complement this with two hardness results: \emph{(i)} no nontrivial dominant-strategy incentive-compatible mechanism exists for this problem, and \emph{(ii)} no mechanism outperforms ours in a NE.
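As a rough illustration of the setup above, and not the paper's actual mechanism, the following Python sketch pools two contributors' datasets into a sample-mean estimate and computes a hypothetical penalty that grows with the squared gap between their reported sample means. The function names, the quadratic form of the penalty, and the `scale` parameter are all assumptions made for illustration; the abstract only states that prices are adjusted based on discrepancies in reported data.

```python
import random
import statistics


def collect(mu, sigma, n, rng):
    """Draw n samples from N(mu, sigma^2); stands in for a contributor's data collection."""
    return [rng.gauss(mu, sigma) for _ in range(n)]


def pooled_mean(datasets):
    """Broker's estimate of the mean: sample mean over all reported points."""
    pooled = [x for dataset in datasets for x in dataset]
    return statistics.fmean(pooled)


def discrepancy_penalty(report_a, report_b, scale=1.0):
    """Hypothetical penalty (an assumption, not the paper's rule):
    scale times the squared gap between two reported sample means."""
    gap = statistics.fmean(report_a) - statistics.fmean(report_b)
    return scale * gap ** 2


rng = random.Random(0)            # fixed seed so the sketch is reproducible
mu, sigma = 2.0, 1.0              # illustrative values, unknown to the broker in practice
a = collect(mu, sigma, 500, rng)  # contributor A reports truthfully
b = collect(mu, sigma, 500, rng)  # contributor B's truthful data
b_shifted = [x + 1.0 for x in b]  # contributor B misreports by shifting every point

estimate = pooled_mean([a, b])
truthful_penalty = discrepancy_penalty(a, b)
misreport_penalty = discrepancy_penalty(a, b_shifted)
```

Under these assumptions, a contributor whose report deviates from a truthful peer's report incurs a larger penalty than one who reports truthfully, which conveys the intuition behind comparing contributors' reports against one another.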