Let Guidelines Guide You: A Prescriptive Guideline-Centered Data Annotation Methodology
This work addresses annotation quality issues for machine learning researchers and practitioners, though its contribution is incremental because it builds on existing prescriptive annotation methods.
The paper tackles the problem of information loss and guideline adherence in data annotation by introducing GCAM, a methodology that reports guidelines per sample, and shows it improves annotation quality and enables efficient data reuse across tasks.
We introduce the Guideline-Centered Annotation Methodology (GCAM), a novel data annotation methodology designed to report the annotation guidelines associated with each data sample. Our approach addresses three key limitations of the standard prescriptive annotation methodology by reducing information loss during annotation and ensuring adherence to guidelines. Furthermore, GCAM enables the efficient reuse of annotated data across multiple tasks. We evaluate GCAM in two ways: (i) a human annotation study and (ii) an experimental evaluation with several machine learning models. Our results highlight the advantages of GCAM from multiple perspectives, demonstrating its potential to improve annotation quality and error analysis.
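
To make the idea of reporting guidelines per sample more concrete, the following is a minimal sketch of how guideline-level annotations might be stored and then reused for more than one task. It is not the authors' implementation. The guideline IDs (`G1`, `G2`), the example texts, the two tasks, and the names `AnnotatedSample` and `derive_labels` are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Set


@dataclass(frozen=True)
class AnnotatedSample:
    """A sample annotated with guideline IDs instead of a class label (hypothetical schema)."""
    text: str
    guideline_ids: FrozenSet[str]


def derive_labels(sample: AnnotatedSample,
                  guideline_to_label: Dict[str, str],
                  default: str) -> Set[str]:
    """Map a sample's guideline IDs to task labels; use `default` if none apply."""
    labels = {guideline_to_label[g]
              for g in sample.guideline_ids
              if g in guideline_to_label}
    return labels if labels else {default}


# Hypothetical guidelines: G1 = "contains an insult", G2 = "contains a threat".
samples = [
    AnnotatedSample("You are an idiot.", frozenset({"G1"})),
    AnnotatedSample("I will find you.", frozenset({"G2"})),
    AnnotatedSample("Nice weather today.", frozenset()),
]

# Two hypothetical tasks defined over the same guideline annotations.
toxicity_map = {"G1": "toxic", "G2": "toxic"}
threat_map = {"G2": "threat"}

toxicity_labels = [derive_labels(s, toxicity_map, "non-toxic") for s in samples]
threat_labels = [derive_labels(s, threat_map, "no-threat") for s in samples]
```

In this sketch the samples are annotated once at the guideline level, and each task supplies only its own guideline-to-label mapping. Keeping the guideline IDs also means a disputed label can be traced back to the specific guideline that produced it, which is one plausible reading of the error-analysis benefit mentioned in the abstract.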