Data Overvaluation Attack and Truthful Data Valuation in Federated Learning
This paper addresses the threat of strategic manipulation in data valuation for federated learning. Reliable data valuation is critical for incentivizing honest participation in collaborative machine learning.
The paper tackles the problem of clients exaggerating their data contributions in federated learning. It introduces a data overvaluation attack and proposes Truth-Shapley, a Bayesian truthful data valuation metric that, under certain conditions, makes truthful valuation each client's optimal strategy. Experiments show that existing metrics are vulnerable to the attack and that Truth-Shapley is robust to it.
In collaborative machine learning (CML), data valuation, i.e., evaluating the contribution of each client's data to the machine learning model, has become a critical task for incentivizing and selecting positive data contributions. However, existing studies often assume that clients engage in data valuation truthfully, overlooking the practical motivation for clients to exaggerate their contributions. To expose this threat, this paper introduces the data overvaluation attack, which enables strategic clients to have their data significantly overvalued in federated learning, a widely adopted paradigm for decentralized CML. Furthermore, we propose a Bayesian truthful data valuation metric, named Truth-Shapley. Truth-Shapley is the unique metric that satisfies some desirable axioms for data valuation while ensuring that clients' optimal strategy is to perform truthful data valuation under certain conditions. Our experiments demonstrate the vulnerability of existing data valuation metrics to the proposed attack and validate the robustness and effectiveness of Truth-Shapley.
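To make the notion of overvaluation concrete, the following toy sketch computes the classic Shapley value for three clients and shows how the value of one client rises when the utilities reported for coalitions containing that client are inflated. It is only an illustration: it implements neither the paper's attack nor Truth-Shapley. The client names, the utility numbers, and the assumption that a strategic client can inflate reported coalition utilities by a fixed amount are all hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(clients, utility):
    """Exact Shapley value of each client, given a dict mapping
    frozenset coalitions to their utility (e.g. model accuracy)."""
    n = len(clients)
    values = {}
    for i in clients:
        others = [c for c in clients if c != i]
        total = 0.0
        for k in range(n):
            # Weight of a coalition of size k in the Shapley average.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Marginal contribution of client i to coalition s.
                total += weight * (utility[s | {i}] - utility[s])
        values[i] = total
    return values


clients = ["A", "B", "C"]

# Hypothetical, symmetric "true" utilities: every client is equally useful.
true_utility = {
    frozenset(): 0.0,
    frozenset("A"): 0.4, frozenset("B"): 0.4, frozenset("C"): 0.4,
    frozenset("AB"): 0.6, frozenset("AC"): 0.6, frozenset("BC"): 0.6,
    frozenset("ABC"): 0.9,
}

# Assumed manipulation: utilities of proper sub-coalitions containing C
# are reported 0.1 higher; the grand coalition's utility is unchanged.
reported_utility = dict(true_utility)
for coalition in true_utility:
    if "C" in coalition and len(coalition) < len(clients):
        reported_utility[coalition] += 0.1

true_values = shapley_values(clients, true_utility)
reported_values = shapley_values(clients, reported_utility)

for c in clients:
    print(f"{c}: true={true_values[c]:.4f} reported={reported_values[c]:.4f}")
```

Because the Shapley value always distributes the grand coalition's utility in full, the gain for client C in this sketch comes directly at the expense of clients A and B. This is why an untruthful valuation matters for any reward scheme built on such a metric.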