HC LG ML · Nov 7, 2021

Open-Set Crowdsourcing using Multiple-Source Transfer Learning

Guangyang Han, Guoxian Yu, Lei Liu, Lizhen Cui, Carlotta Domeniconi, Xiangliang Zhang

arXiv:2111.04073v1

Originality: Incremental advance

AI Analysis

The paper addresses a novel crowdsourcing scenario for data annotation in which unfamiliarity with the label space hampers modeling. The advance is incremental: it applies transfer learning to this specific domain.

The paper tackles the problem of open-set crowdsourcing, where the label space is unknown, by proposing OSCrowd, which uses multiple-source transfer learning to infer the label space and guide annotations, achieving better performance than existing solutions in online validation.

We raise and define a new crowdsourcing scenario, open set crowdsourcing, in which we know only the general theme of an unfamiliar crowdsourcing project and not its label space, that is, the set of possible labels. This is still a task annotation problem, but unfamiliarity with the tasks and the label space hampers the modelling of tasks and workers, as well as truth inference. We propose an intuitive solution, OSCrowd. First, OSCrowd integrates crowd theme related datasets into a large source domain to facilitate partial transfer learning, which infers an approximate label space for these tasks. Next, it assigns weights to each source domain based on category correlation. After this, it uses multiple-source open set transfer learning to model crowd tasks and assign possible annotations. The label space and annotations given by transfer learning are then used to guide and standardize crowd workers' annotations. We validate OSCrowd in an online scenario and show that it solves the open set crowdsourcing problem and works better than related crowdsourcing solutions.
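As a rough illustration of the weighting and annotation steps described in the abstract, the sketch below combines per-source label predictions using source weights and falls back to "unknown" when no label is supported strongly enough. It is not the OSCrowd implementation: the function names, the normalisation of raw correlation scores, the weighted-sum aggregation, and the fixed threshold are all assumptions made for illustration, and the abstract does not specify how category correlation is computed.

```python
from collections import defaultdict


def normalize_weights(raw_scores):
    """Turn non-negative per-source correlation scores into weights summing to 1.

    How the raw scores are obtained is left open here (assumption): the
    abstract only says weights are based on category correlation.
    """
    total = sum(raw_scores.values())
    if total == 0:
        # No source stands out, so fall back to uniform weights.
        return {name: 1.0 / len(raw_scores) for name in raw_scores}
    return {name: score / total for name, score in raw_scores.items()}


def suggest_label(predictions, weights, threshold=0.5):
    """Suggest one label for a task from several source-domain models.

    predictions: {source_name: {label: probability}} for a single task.
    weights:     {source_name: weight}, e.g. from normalize_weights.
    Returns (label, score); label is "unknown" when the best weighted
    score is below the threshold (a simple open-set rejection rule).
    """
    scores = defaultdict(float)
    for source, probs in predictions.items():
        for label, p in probs.items():
            scores[label] += weights.get(source, 0.0) * p
    if not scores:
        return "unknown", 0.0
    best = max(scores, key=scores.get)
    if scores[best] < threshold:
        return "unknown", scores[best]
    return best, scores[best]


# Hypothetical example: two source domains with partly overlapping labels.
weights = normalize_weights({"birds": 3.0, "pets": 1.0})
predictions = {
    "birds": {"sparrow": 0.7, "eagle": 0.3},
    "pets": {"cat": 0.6, "sparrow": 0.4},
}
label, score = suggest_label(predictions, weights, threshold=0.5)
print(label, round(score, 3))
```

In a pipeline like the one the abstract outlines, a suggestion of this kind would be shown to crowd workers as guidance, and tasks marked "unknown" would be the ones where workers may need to propose labels outside the inferred label space.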
