Data Poisoning Attacks and Defenses to Crowdsourcing Systems
This work addresses security vulnerabilities in crowdsourcing systems, which are critical for big data analytics, by introducing data poisoning attacks and two defenses. The contribution is incremental in that it applies known poisoning concepts to this domain.
The authors demonstrate that crowdsourcing systems are vulnerable to data poisoning attacks, in which malicious clients submit carefully crafted data to corrupt the aggregated data. On one synthetic and two real-world benchmark datasets, the attacks substantially increase the estimation errors of the aggregated data. The authors also propose two defenses that substantially reduce these errors.
A key challenge of big data analytics is how to collect a large volume of (labeled) data. Crowdsourcing aims to address this challenge by aggregating and estimating high-quality data (e.g., a sentiment label for a text) from pervasive clients/users. Existing studies on crowdsourcing focus on designing new methods to improve the quality of data aggregated from unreliable/noisy clients. However, the security aspects of such crowdsourcing systems remain under-explored to date. We aim to bridge this gap in this work. Specifically, we show that crowdsourcing is vulnerable to data poisoning attacks, in which malicious clients provide carefully crafted data to corrupt the aggregated data. We formulate our proposed data poisoning attacks as an optimization problem that maximizes the error of the aggregated data. Our evaluation results on one synthetic and two real-world benchmark datasets demonstrate that the proposed attacks can substantially increase the estimation errors of the aggregated data. We also propose two defenses to reduce the impact of malicious clients. Our empirical results show that the proposed defenses can substantially reduce the estimation errors caused by the data poisoning attacks.
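To make the threat model concrete, the following is a minimal illustrative sketch, not the paper's attack or defenses. It assumes continuous values bounded in a known range, plain per-item averaging as the aggregation rule, and a hypothetical heuristic attacker who knows the ground truth and reports the allowed extreme farthest from it. The per-item median is used as a stand-in robust aggregator; the abstract does not describe the two proposed defenses, so the median here is an assumption chosen only for illustration. All function names and parameters are hypothetical.

```python
import random
import statistics


def aggregate_mean(reports):
    """Aggregate each item by averaging the values reported by all clients."""
    n_items = len(reports[0])
    return [statistics.fmean(r[j] for r in reports) for j in range(n_items)]


def aggregate_median(reports):
    """Aggregate each item by the median of reported values (stand-in defense)."""
    n_items = len(reports[0])
    return [statistics.median(r[j] for r in reports) for j in range(n_items)]


def estimation_error(estimate, truth):
    """Mean absolute error between the aggregated values and the ground truth."""
    return statistics.fmean(abs(e - t) for e, t in zip(estimate, truth))


def simulate(n_honest=40, n_malicious=10, n_items=20,
             noise=0.5, lo=0.0, hi=10.0, seed=0):
    """Compare aggregation error with and without malicious clients."""
    rng = random.Random(seed)
    truth = [rng.uniform(lo, hi) for _ in range(n_items)]

    # Honest clients report the true value plus Gaussian noise, clipped to range.
    honest = [
        [min(hi, max(lo, t + rng.gauss(0.0, noise))) for t in truth]
        for _ in range(n_honest)
    ]

    # Hypothetical heuristic attacker (an assumption, not the paper's optimizer):
    # for each item, report the range endpoint farthest from the true value.
    midpoint = (lo + hi) / 2
    malicious = [
        [hi if t < midpoint else lo for t in truth]
        for _ in range(n_malicious)
    ]

    poisoned = honest + malicious
    return {
        "clean_mean": estimation_error(aggregate_mean(honest), truth),
        "poisoned_mean": estimation_error(aggregate_mean(poisoned), truth),
        "poisoned_median": estimation_error(aggregate_median(poisoned), truth),
    }


if __name__ == "__main__":
    for name, err in simulate().items():
        print(f"{name}: {err:.3f}")
```

In this toy setting, a minority of coordinated clients shifts the averaged estimate on every item, while the median is far less sensitive to them. The paper's actual attack is formulated as an optimization problem over the malicious clients' data, which this heuristic does not attempt to reproduce.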