Negation-Instance Based Evaluation of End-to-End Negation Resolution
This work addresses a methodological issue for natural language processing researchers working on negation resolution: it standardizes evaluation so that systems can be compared fairly. The contribution is incremental, as it builds on existing shared tasks.
The paper tackles the problem of inconsistent evaluation metrics in negation resolution, proposing a negation-instance based approach to make system comparisons meaningful, and provides results for state-of-the-art systems on three English corpora with publicly available scripts.
In this paper, we revisit the task of negation resolution, which includes the subtasks of cue detection (e.g., "not", "never") and scope resolution. In the context of previous shared tasks, a variety of evaluation metrics have been proposed. Subsequent works usually use different subsets of these, including variations and custom implementations, rendering meaningful comparisons between systems difficult. Examining the problem both from a linguistic perspective and from a downstream viewpoint, we argue for a negation-instance based approach to evaluating negation resolution. Our proposed metrics correspond to expectations over per-instance scores and hence are intuitively interpretable. To render research comparable and to foster future work, we provide results for a set of current state-of-the-art systems for negation resolution on three English corpora, and make our implementation of the evaluation scripts publicly available.
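To make the idea of a metric that is an expectation over per-instance scores concrete, the following is a minimal sketch and not the paper's evaluation script. It assumes each negation instance is identified by a cue id and carries a set of scope token indices, scores each gold instance with a token-level F1, and averages those scores. The function names, the choice of token-level F1 as the per-instance score, and the handling of missed cues (score 0) are all illustrative assumptions; spurious predicted cues are ignored here for brevity.

```python
from statistics import mean


def instance_f1(gold_scope: set[int], pred_scope: set[int]) -> float:
    """Token-level F1 for the scope of a single negation instance (assumed score)."""
    if not gold_scope and not pred_scope:
        return 1.0  # both scopes empty: treated as a perfect match
    overlap = len(gold_scope & pred_scope)
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_scope)
    recall = overlap / len(gold_scope)
    return 2 * precision * recall / (precision + recall)


def mean_instance_f1(
    gold: dict[str, set[int]], pred: dict[str, set[int]]
) -> float:
    """Average the per-instance score over all gold negation instances.

    Keys are hypothetical cue ids, values are scope token indices.
    A gold instance whose cue was not predicted receives a score of 0.
    """
    if not gold:
        return 0.0
    scores = [
        instance_f1(scope, pred[cue]) if cue in pred else 0.0
        for cue, scope in gold.items()
    ]
    return mean(scores)


if __name__ == "__main__":
    gold = {"c1": {1, 2, 3}, "c2": {5, 6}}
    pred = {"c1": {1, 2}}  # partial scope for c1, cue c2 missed
    print(mean_instance_f1(gold, pred))
```

Because every instance contributes one score with equal weight, the result can be read as the expected score of a randomly drawn negation instance, which is the interpretability property the abstract refers to.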