Non-neural Models Matter: A Re-evaluation of Neural Referring Expression Generation Systems
This work addresses the over-reliance on neural models in natural language generation research, highlighting that non-neural methods can be competitive with, or superior to, neural ones on specific tasks such as referring expression generation.
The paper re-evaluates neural referring expression generation systems by comparing them to non-neural approaches on two English datasets (WEBNLG and WSJ). It finds that rule-based systems with simple rule sets achieve on-par or better performance on both datasets, and that a machine learning-based system with well-designed linguistic features performs best on the more realistic WSJ dataset.
In recent years, neural models have often outperformed rule-based and classic machine learning approaches in NLG. These classic approaches are now often disregarded, for example when new neural models are evaluated. We argue that they should not be overlooked, since, for some tasks, well-designed non-neural approaches achieve better performance than neural ones. In this paper, the task of referring expression generation (REG) in linguistic context is used as an example. We examined two very different English datasets (WEBNLG and WSJ), and evaluated each algorithm using both automatic and human evaluations. Overall, the results of these evaluations suggest that rule-based systems with simple rule sets achieve on-par or better performance on both datasets compared to state-of-the-art neural REG systems. In the case of the more realistic dataset, WSJ, a machine learning-based system with well-designed linguistic features performed best. We hope that our work can encourage researchers to consider non-neural models in the future.