CL, AI · Oct 12, 2022

PriMeSRL-Eval: A Practical Quality Metric for Semantic Role Labeling Systems Evaluation

Ishan Jindal, Alexandre Rademaker, Khoi-Nguyen Tran, Huaiyu Zhu, Hiroshi Kanayama, Marina Danilevsky, Yunyao Li

IBM

arXiv:2210.06408v1

Originality: Incremental advance

AI Analysis

This work addresses a practical problem for natural language processing researchers and practitioners by providing a more accurate evaluation metric for SRL systems, although it is an incremental advance that builds on existing evaluation frameworks.

The authors tackled the problem of inaccurate evaluation in semantic role labeling (SRL) systems by proposing a new metric, PriMeSRL, which accounts for error propagation across steps; they found that using this metric significantly lowers the quality scores and alters the rankings of state-of-the-art SRL models.

Semantic role labeling (SRL) identifies the predicate-argument structure in a sentence. This task is usually accomplished in four steps: predicate identification, predicate sense disambiguation, argument identification, and argument classification. Errors introduced at one step propagate to later steps. Unfortunately, existing SRL evaluation scripts do not fully account for this error propagation. They either evaluate arguments independently of predicate sense (CoNLL09) or do not evaluate predicate sense at all (CoNLL05), yielding inaccurate estimates of SRL model performance on the argument classification task. In this paper, we address key practical issues with existing evaluation scripts and propose a stricter SRL evaluation metric, PriMeSRL. We observe that under PriMeSRL, the measured quality of all SoTA SRL models drops significantly, and their relative rankings also change. We also show that PriMeSRL successfully penalizes actual failures in SoTA SRL models.
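To make the error-propagation point concrete, here is a minimal Python sketch of argument scoring with and without conditioning on predicate sense. It is a toy under stated assumptions, not the authors' PriMeSRL implementation and not the official CoNLL05 or CoNLL09 scorers: the `Frame` class, the `arg_scores` function, and its `strict` flag are hypothetical; each argument is reduced to its head token index (dependency style); and a predicate sense is a single label such as `give.01`. With `strict=False`, an argument counts as correct whenever its head and role match gold, regardless of sense, which roughly corresponds to the sense-independent argument scoring the abstract attributes to CoNLL09. With `strict=True`, no argument of a predicate with the wrong sense earns credit, because its role labels are read against the wrong frame. In both modes, arguments of spurious predicates count as false positives and arguments of missed predicates as false negatives.

```python
from dataclasses import dataclass, field


@dataclass
class Frame:
    """Hypothetical toy frame: one predicate with its sense and arguments."""
    sense: str                                          # e.g. "give.01"
    args: dict[int, str] = field(default_factory=dict)  # argument head index -> role label


def prf(tp: int, n_pred: int, n_gold: int) -> tuple[float, float, float]:
    """Precision, recall, and F1 from raw counts (0.0 when undefined)."""
    p = tp / n_pred if n_pred else 0.0
    r = tp / n_gold if n_gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


def arg_scores(gold: dict[int, Frame], pred: dict[int, Frame],
               strict: bool) -> tuple[float, float, float]:
    """Score arguments; keys of gold/pred are predicate token indices.

    strict=False: an argument is correct if its head and role match gold,
    ignoring the predicate sense (sense-independent style).
    strict=True: additionally require the predicate sense to be correct,
    so a sense error propagates to every argument of that predicate.
    """
    n_gold = sum(len(fr.args) for fr in gold.values())
    n_pred = sum(len(fr.args) for fr in pred.values())
    tp = 0
    for idx, p_frame in pred.items():
        g_frame = gold.get(idx)
        if g_frame is None:
            continue  # spurious predicate: all its arguments are false positives
        if strict and p_frame.sense != g_frame.sense:
            continue  # wrong sense: no argument of this predicate gets credit
        for head, role in p_frame.args.items():
            if g_frame.args.get(head) == role:
                tp += 1
    return prf(tp, n_pred, n_gold)


# Toy sentence with two predicates at token indices 1 and 6.
gold = {
    1: Frame("give.01", {0: "A0", 2: "A2", 3: "A1"}),
    6: Frame("run.01", {5: "A0"}),
}
pred = {
    1: Frame("give.01", {0: "A0", 2: "A1", 3: "A1"}),  # right sense, one wrong role
    6: Frame("run.02", {5: "A0"}),                     # wrong sense, "matching" role
}

print(arg_scores(gold, pred, strict=False))  # (0.75, 0.75, 0.75)
print(arg_scores(gold, pred, strict=True))   # (0.5, 0.5, 0.5)
```

On this toy input, the lenient mode credits the `A0` of `run` even though the predicted sense is wrong; the strict mode withholds that credit and F1 falls from 0.75 to 0.5. This only illustrates the sense-conditioning idea; the actual PriMeSRL metric may treat other cases differently.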

