CL Apr 26, 2024

ReproHum #0087-01: Human Evaluation Reproduction Report for Generating Fact Checking Explanations

arXiv:2404.17481v279 citations h-index: 8 HUMEVAL

Originality Synthesis-oriented

AI Analysis

This work is incremental, contributing to reproducibility efforts in NLP by assessing the consistency of human evaluation results over time.

This paper partially reproduces a human evaluation study on generating fact-checking explanations. The reproduction's results support the original findings and show similar patterns, with only slight variations.

This paper presents a partial reproduction of Generating Fact Checking Explanations by Atanasova et al. (2020) as part of the ReproHum element of the ReproNLP shared task to reproduce the findings of NLP research regarding human evaluation. This shared task aims to investigate the extent to which NLP as a field is becoming more or less reproducible over time. Following the instructions provided by the task organisers and the original authors, we collect relative rankings of 3 fact-checking explanations (comprising a gold standard and the outputs of 2 models) for 40 inputs on the criterion of Coverage. The results of our reproduction and our reanalysis of the original work's raw results lend support to the original findings, with similar patterns seen between the original work and our reproduction. Whilst we observe slight variation from the original results, our findings support the main conclusions drawn by the original authors pertaining to the efficacy of their proposed models.
