HC SE · Sep 17, 2021

Learning from Crowds with Crowd-Kit

Dmitry Ustalov, Nikita Pavlichenko, Boris Tseitlin

arXiv:2109.08584v4 · 22.025 citations · Has Code

Originality: Synthesis-oriented

AI Analysis

This toolkit addresses the need for efficient and reproducible quality control in crowdsourcing applications. Its contribution is incremental: it packages existing methods into a convenient software library.

The paper tackles computational quality control in crowdsourcing by presenting Crowd-Kit, a Python toolkit that implements popular algorithms for truth inference, deep learning from crowds, and data quality estimation. The authors evaluate the toolkit on multiple datasets, which enables systematic benchmarking of these methods.

This paper presents Crowd-Kit, a general-purpose computational quality control toolkit for crowdsourcing. Crowd-Kit provides efficient and convenient implementations of popular quality control algorithms in Python, including methods for truth inference, deep learning from crowds, and data quality estimation. Our toolkit supports multiple modalities of answers and provides dataset loaders and example notebooks for faster prototyping. We extensively evaluated our toolkit on several datasets of different natures, enabling benchmarking computational quality control methods in a uniform, systematic, and reproducible way using the same codebase. We release our code and data under the Apache License 2.0 at https://github.com/Toloka/crowd-kit.
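To make "truth inference" concrete, here is a minimal sketch of majority voting, the simplest baseline in that family: each task receives the label that most workers assigned to it. This is a standard-library illustration only, not Crowd-Kit's implementation or API; the function name `majority_vote`, the `(task, worker, label)` triple layout, and the tie-breaking rule are assumptions made for this example.

```python
from collections import Counter, defaultdict


def majority_vote(annotations):
    """Aggregate crowd labels by simple majority.

    `annotations` is an iterable of (task, worker, label) triples.
    This layout is an assumption for the sketch, not a library format.
    """
    votes = defaultdict(Counter)
    for task, _worker, label in annotations:
        votes[task][label] += 1

    aggregated = {}
    for task, counts in votes.items():
        top = max(counts.values())
        # Assumed tie-break: pick the smallest label so results are deterministic.
        aggregated[task] = min(label for label, n in counts.items() if n == top)
    return aggregated


# Hypothetical toy data: three workers label two tasks.
annotations = [
    ("t1", "w1", "cat"), ("t1", "w2", "cat"), ("t1", "w3", "dog"),
    ("t2", "w1", "dog"), ("t2", "w2", "dog"), ("t2", "w3", "cat"),
]
aggregated = majority_vote(annotations)
print(aggregated)
```

Majority voting treats every worker as equally reliable. More elaborate truth-inference methods typically also estimate per-worker quality, which is the kind of algorithm a toolkit like the one described would bundle behind a common interface.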
