CL, May 13, 2024

Reproducing the Metric-Based Evaluation of a Set of Controllable Text Generation Techniques

arXiv:2405.07875v123.579 citations, h-index: 16, HUMEVAL

Originality: Synthesis-oriented

AI Analysis

The paper highlights reproducibility issues in ML evaluations, an incremental but important concern for researchers who rely on published results.

The authors attempted to reproduce a metric-based evaluation of controllable text generation techniques and found that reruns often yield different results than originally reported, sometimes uncovering errors in the original work.

Rerunning a metric-based evaluation should be more straightforward, and its results should be closer to the original, than rerunning a human-based evaluation, especially where the original authors make code and model checkpoints available. However, as this report of our efforts to rerun a metric-based evaluation of a set of single-attribute and multiple-attribute controllable text generation (CTG) techniques shows, such reruns do not always produce the same results as the original work, and they can reveal errors in its reporting.
