HC, Oct 6, 2020

Easy, Reproducible and Quality-Controlled Data Collection with Crowdaq

Qiang Ning, Hao Wu, Pradeep Dasigi, Dheeru Dua, Matt Gardner, Robert L. Logan, Ana Marasovic, Zhen Nie

arXiv:2010.06694v1

65.0995 citations

Originality: Synthesis-oriented

AI Analysis

This tool addresses data collection bottlenecks for AI researchers and practitioners. It offers a convenient solution, but the contribution is incremental because it builds on existing annotation platform concepts.

The paper tackles three challenges in large-scale data collection: designing user-friendly annotation interfaces, training annotators efficiently, and ensuring reproducibility. It introduces Crowdaq, an open-source platform that standardizes the pipeline with customizable interface components, automated annotator qualification, and pipelines saved in a reusable format, and it reports that the platform simplifies annotation across diverse use cases.

High-quality and large-scale data are key to success for AI systems. However, large-scale data annotation efforts are often confronted with a set of common challenges: (1) designing a user-friendly annotation interface; (2) training enough annotators efficiently; and (3) reproducibility. To address these problems, we introduce Crowdaq, an open-source platform that standardizes the data collection pipeline with customizable user-interface components, automated annotator qualification, and saved pipelines in a re-usable format. We show that Crowdaq simplifies data annotation significantly on a diverse set of data collection use cases and we hope it will be a convenient tool for the community.
