AP AI SI · Jun 11, 2021

Cross-replication Reliability -- An Empirical Approach to Interpreting Inter-rater Reliability

arXiv:2106.07393v1 · 13.444 citations

Originality: Incremental advance

AI Analysis

This work addresses the need for better interpretation of inter-rater reliability (IRR) in fields such as psychology and data science, and it offers a practical tool for evaluating crowdsourced datasets. The advance is incremental because it builds on existing reliability measures.

The authors tackle the problem of interpreting inter-rater reliability (IRR) by proposing an empirical framework that benchmarks IRR against baseline measures obtained in a replication. One of these baselines is a novel cross-replication reliability (xRR) measure based on Cohen's kappa. They apply the framework to a dataset of 4 million human judgments of facial expressions to assess the quality of crowdsourced datasets.

We present a new approach to interpreting IRR that is empirical and contextualized. It is based upon benchmarking IRR against baseline measures in a replication, one of which is a novel cross-replication reliability (xRR) measure based on Cohen's kappa. We call this approach the xRR framework. We open-source a replication dataset of 4 million human judgements of facial expressions and analyze it with the proposed framework. We argue this framework can be used to measure the quality of crowdsourced datasets.
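For readers unfamiliar with the statistic the abstract builds on, the following is a minimal sketch of the standard two-rater Cohen's kappa, kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e the agreement expected by chance. It is not the paper's xRR formula, which is defined in the paper itself. The function name `cohen_kappa` and the toy facial-expression labels are illustrative assumptions, not taken from the released dataset.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Standard Cohen's kappa for two raters labelling the same items.

    Illustrative helper (hypothetical name); labels are treated as nominal.
    """
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two equal-length, non-empty rating lists")
    n = len(ratings_a)

    # Observed agreement: fraction of items given the same label by both raters.
    p_o = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n

    # Chance agreement: sum over labels of the product of each rater's
    # marginal frequency for that label.
    freq_a, freq_b = Counter(ratings_a), Counter(ratings_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)

    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1 - p_e)


# Toy example (made-up labels, for illustration only).
rater_1 = ["happy", "sad", "happy", "angry", "happy", "sad"]
rater_2 = ["happy", "sad", "angry", "angry", "happy", "happy"]
kappa = cohen_kappa(rater_1, rater_2)
print(f"kappa = {kappa:.3f}")
```

A kappa of 1 indicates perfect agreement and 0 indicates agreement no better than chance. The abstract's point is that a single value like this is hard to interpret in isolation, which is why the framework benchmarks it against baselines from a replication.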

