Efficient Crowdsourcing via Proxy Voting
This paper addresses budget efficiency in crowdsourced data labeling, though the contribution appears incremental because it builds on existing aggregation methods.
The paper tackles the problem of labeling data efficiently on crowdsourcing platforms by introducing Proxy Crowdsourcing (PCS), which collects answers from leaders and followers to improve accuracy without increasing cost. Under a fixed budget, PCS improves accuracy over unweighted aggregation across multiple datasets.
Crowdsourcing platforms offer a way to label data by aggregating the answers of multiple unqualified workers. We introduce a \textit{simple} and \textit{budget-efficient} crowdsourcing method named Proxy Crowdsourcing (PCS). PCS collects answers from two sets of workers: \textit{leaders} (a.k.a. proxies) and \textit{followers}. Each leader answers the entire survey, while each follower answers only a small subset of it. We then weigh every leader by the number of followers whose answers are closest to his, and aggregate the leaders' answers using any standard aggregation method (e.g., Plurality for categorical labels or Mean for continuous labels). We empirically compare the performance of PCS to unweighted aggregation, keeping the total number of questions (the budget) fixed. We show that PCS improves the accuracy of aggregated answers across several datasets, with both categorical and continuous labels. Overall, our method improves accuracy while being simple and easy to implement.
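The weighting and aggregation steps described above can be sketched in a few lines of Python. This is a minimal illustration under assumptions the abstract does not fix: each follower is assigned to the single leader with the smallest distance on the questions that follower answered (fraction of disagreements for categorical labels, mean absolute difference for continuous ones), ties go to the lower-indexed leader, and a leader's weight is simply its follower count. The function names (`proxy_weights`, `weighted_plurality`, `weighted_mean`, `budget`) and the toy data are hypothetical, not the authors' code.

```python
from collections import Counter
from typing import Dict, List, Sequence

# Assumed data layout (hypothetical, not from the paper):
#   leaders[i]   -> full answer list, one entry per question
#   followers[j] -> {question_index: answer} for the small subset answered


def categorical_distance(leader: Sequence, follower: Dict[int, object]) -> float:
    """Fraction of the follower's questions on which the leader disagrees."""
    mismatches = sum(1 for q, a in follower.items() if leader[q] != a)
    return mismatches / len(follower)


def continuous_distance(leader: Sequence[float], follower: Dict[int, float]) -> float:
    """Mean absolute difference over the follower's questions."""
    return sum(abs(leader[q] - a) for q, a in follower.items()) / len(follower)


def proxy_weights(leaders, followers, distance) -> List[int]:
    """Weight of each leader = number of followers for whom it is the closest leader.

    Ties go to the lowest-indexed leader (an assumption; min() keeps the first minimum).
    """
    weights = [0] * len(leaders)
    for follower in followers:
        dists = [distance(leader, follower) for leader in leaders]
        closest = min(range(len(leaders)), key=lambda i: dists[i])
        weights[closest] += 1
    return weights


def weighted_plurality(leaders, weights) -> list:
    """Per question, return the label with the largest total leader weight."""
    result = []
    for q in range(len(leaders[0])):
        votes = Counter()
        for leader, w in zip(leaders, weights):
            votes[leader[q]] += w
        # On equal totals, Counter.most_common keeps first-encountered order.
        result.append(votes.most_common(1)[0][0])
    return result


def weighted_mean(leaders, weights) -> List[float]:
    """Per question, the weight-averaged leader answer."""
    if sum(weights) == 0:  # no followers at all: fall back to equal weights
        weights = [1] * len(leaders)
    total = sum(weights)
    return [
        sum(w * leader[q] for leader, w in zip(leaders, weights)) / total
        for q in range(len(leaders[0]))
    ]


def budget(leaders, followers) -> int:
    """Total number of questions asked: full surveys plus follower subsets."""
    return sum(len(leader) for leader in leaders) + sum(len(f) for f in followers)


if __name__ == "__main__":
    leaders = [["a", "a", "a"], ["b", "b", "a"], ["b", "b", "b"]]
    followers = [{0: "a"}, {1: "a"}, {0: "a", 2: "a"}, {1: "b", 2: "b"}]
    w = proxy_weights(leaders, followers, categorical_distance)
    print(w)                                       # [3, 0, 1]
    print(weighted_plurality(leaders, w))          # ['a', 'a', 'a']
    print(weighted_plurality(leaders, [1, 1, 1]))  # ['b', 'b', 'a'] (unweighted)
    print(budget(leaders, followers))              # 15
```

In this toy run, three followers side with the first leader, giving it weight 3; that flips questions 0 and 1 from the unweighted plurality answer `b` to `a`. The `budget` helper counts every question asked (here 15), which is the quantity the comparison against unweighted aggregation holds fixed. For continuous labels, the same `proxy_weights` call with `continuous_distance` feeds `weighted_mean` instead of `weighted_plurality`.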