GN AI AP | Dec 22, 2023

Improving Task Instructions for Data Annotators: How Clear Rules and Higher Pay Increase Performance in Data Annotation in the AI Economy

Johann Laux, Fabian Stephany, Alice Liefgreen

arXiv:2312.14565v2 | 1 citation | h-index: 3

Originality: Incremental advance

AI Analysis

This paper addresses the challenge of improving data quality and worker wellbeing in AI development, and is aimed at policymakers and practitioners. Its contribution is incremental, as it builds on existing theories in law and economics.

The paper addresses data annotation quality by experimentally testing how clear rules versus vague standards, and the presence of monetary incentives, affect annotator performance. Annotators given clear rules outperformed those given vague standards by 14%, and the group working with both clear rules and incentives reached the highest accuracy (87.5%).

The global surge in AI applications is transforming industries, leading to displacement and complementation of existing jobs, while also giving rise to new employment opportunities. Data annotation, encompassing the labelling of images or annotating of texts by human workers, crucially influences the quality of a dataset, which in turn directly influences the quality of AI models trained on it. This paper delves into the economics of data annotation, with a specific focus on the impact of task instruction design (that is, the choice between rules and standards as theorised in law and economics) and monetary incentives on data quality and costs. An experimental study involving 307 data annotators examines six groups with varying task instructions (norms) and monetary incentives. Results reveal that annotators provided with clear rules exhibit higher accuracy rates, outperforming those with vague standards by 14%. Similarly, annotators receiving an additional monetary incentive perform significantly better, with the highest accuracy rate recorded in the group working with both clear rules and incentives (87.5% accuracy). In addition, our results show that annotators perceive rules as more helpful than standards, and that rules reduce the difficulty annotators report in annotating images. These empirical findings underscore the double benefit of rule-based instructions for both data quality and worker wellbeing. Our research design allows us to reveal that, in our study, rules are more cost-efficient in increasing accuracy than monetary incentives. The paper contributes experimental insights to discussions on the economic, ethical, and legal considerations of AI technologies. Addressing policymakers and practitioners, we emphasise the need for a balanced approach in optimising data annotation processes for efficient and ethical AI development and usage.
