Sparse Probability of Agreement
The work addresses a practical limitation of annotation tasks that affects both researchers and practitioners, but its contribution is incremental, since it builds on existing agreement metrics.
The paper tackles the problem of measuring inter-annotator agreement when not all annotators label all samples, by introducing Sparse Probability of Agreement (SPA) as an unbiased estimator under certain conditions.
Measuring inter-annotator agreement is important for annotation tasks, but many metrics require a fully annotated dataset, in which every annotator annotates every sample. We define Sparse Probability of Agreement (SPA), which estimates the probability of agreement when not all annotator-item pairs are available. We show that, under certain conditions, SPA is an unbiased estimator, and we provide multiple weighting schemes for handling data with various degrees of annotation.
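To make the idea concrete, the following is a minimal sketch of a sparse agreement estimate, not the paper's reference implementation. It assumes that per-item agreement is the fraction of annotator pairs on that item who chose the same label, and that the overall estimate is a weighted average over items with at least two annotations. The function names and the three weighting options ("flat", "annotations", "pairs") are hypothetical choices for illustration; the weighting schemes actually proposed in the paper may differ.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def item_agreement(labels: Sequence[Hashable]) -> Optional[float]:
    """Fraction of ordered annotator pairs on one item that agree.

    Returns None when the item has fewer than two annotations,
    because no pair exists to compare.
    """
    n = len(labels)
    if n < 2:
        return None
    counts = Counter(labels)
    agreeing_pairs = sum(c * (c - 1) for c in counts.values())
    return agreeing_pairs / (n * (n - 1))


def sparse_agreement(
    items: Sequence[Sequence[Hashable]], weighting: str = "flat"
) -> float:
    """Weighted mean of per-item agreement over sparsely annotated items.

    `items` holds one label list per item; lists may differ in length.
    The weighting names are illustrative assumptions:
      "flat"        - every usable item counts equally
      "annotations" - weight by the number of annotations on the item
      "pairs"       - weight by the number of annotator pairs on the item
    """
    numerator = 0.0
    denominator = 0.0
    for labels in items:
        p = item_agreement(labels)
        if p is None:
            continue  # items with a single annotation carry no agreement signal
        n = len(labels)
        if weighting == "flat":
            w = 1.0
        elif weighting == "annotations":
            w = float(n)
        elif weighting == "pairs":
            w = n * (n - 1) / 2
        else:
            raise ValueError(f"unknown weighting: {weighting!r}")
        numerator += w * p
        denominator += w
    if denominator == 0:
        raise ValueError("no item has at least two annotations")
    return numerator / denominator


# Four items with 2, 3, 1, and 2 annotations respectively.
example_items = [["a", "a"], ["a", "b", "a"], ["b"], ["a", "b"]]
flat_estimate = sparse_agreement(example_items, weighting="flat")
pair_estimate = sparse_agreement(example_items, weighting="pairs")
```

In this sketch, the item with a single annotation is skipped rather than counted as agreement or disagreement, so the estimate depends only on items where at least one pair of annotators can be compared. Weighting by pairs gives more influence to heavily annotated items, while flat weighting treats every usable item the same.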