CL, Jan 27, 2022

Yes-Yes-Yes: Proactive Data Collection for ACL Rolling Review and Beyond

arXiv:2201.11443v3292 citations
Originality: Incremental advance
AI Analysis

The paper addresses the scarcity of peer-review data available to NLP researchers, but the advance is incremental: it adapts existing proactive collection strategies with a specific ethical focus.

The paper tackles the scarcity of public data in under-serviced domains such as peer review by proposing a proactive data collection method. The result is the "Yes-Yes-Yes" workflow, which is designed to meet ethical and legal requirements. The authors also study empirically how donation-based collection affects dataset size and what biases the donation behavior induces.

The shift towards publicly available text sources has enabled language processing at unprecedented scale, yet leaves under-serviced the domains where public and openly licensed data is scarce. Proactively collecting text data for research is a viable strategy to address this scarcity, but lacks systematic methodology taking into account the many ethical, legal and confidentiality-related aspects of data collection. Our work presents a case study on proactive data collection in peer review -- a challenging and under-resourced NLP domain. We outline ethical and legal desiderata for proactive data collection and introduce "Yes-Yes-Yes", the first donation-based peer reviewing data collection workflow that meets these requirements. We report on the implementation of Yes-Yes-Yes at ACL Rolling Review and empirically study the implications of proactive data collection for the dataset size and the biases induced by the donation behavior on the peer reviewing platform.

Code Implementations: 1 repo