HC | SE | Sep 17, 2021

Learning from Crowds with Crowd-Kit

arXiv:2109.08584v4 | 25 citations | Has Code
Originality: Synthesis-oriented
AI Analysis

Crowd-Kit addresses the need for efficient, reproducible quality control in crowdsourcing applications. Its contribution is incremental: it packages existing methods into a convenient software library.

The paper tackles computational quality control in crowdsourcing by presenting Crowd-Kit, a Python toolkit that implements popular algorithms for truth inference, deep learning from crowds, and data quality estimation. The authors evaluated the toolkit on multiple datasets so that these methods can be benchmarked systematically with one codebase.

This paper presents Crowd-Kit, a general-purpose computational quality control toolkit for crowdsourcing. Crowd-Kit provides efficient and convenient implementations of popular quality control algorithms in Python, including methods for truth inference, deep learning from crowds, and data quality estimation. Our toolkit supports multiple modalities of answers and provides dataset loaders and example notebooks for faster prototyping. We extensively evaluated our toolkit on several datasets of different natures, enabling benchmarking computational quality control methods in a uniform, systematic, and reproducible way using the same codebase. We release our code and data under the Apache License 2.0 at https://github.com/Toloka/crowd-kit.
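To make "truth inference" concrete, here is a minimal sketch of majority voting, the simplest aggregation baseline, written in plain Python. It is not Crowd-Kit's own API: the function name `majority_vote`, the `(task, worker, label)` triple layout, and the tie-breaking rule are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def majority_vote(annotations):
    """Aggregate (task, worker, label) triples into one label per task.

    Hypothetical helper for illustration; not part of Crowd-Kit.
    """
    votes = defaultdict(Counter)
    for task, _worker, label in annotations:
        votes[task][label] += 1
    # Pick the most frequent label per task. Ties are broken by the
    # lexicographically smallest label so the result is deterministic
    # (an assumption; other tie-breaking rules are equally valid).
    return {
        task: min(counts, key=lambda lab: (-counts[lab], str(lab)))
        for task, counts in votes.items()
    }


# Toy data: three workers label two tasks.
annotations = [
    ("t1", "w1", "cat"), ("t1", "w2", "cat"), ("t1", "w3", "dog"),
    ("t2", "w1", "dog"), ("t2", "w2", "dog"), ("t2", "w3", "cat"),
]
aggregated = majority_vote(annotations)
```

Majority voting treats every worker as equally reliable. More elaborate truth-inference methods instead estimate per-worker quality and weight the votes accordingly.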

Code Implementations: 1 repo
