CL · Apr 21, 2018

Eval all, trust a few, do wrong to none: Comparing sentence generation models

Ondřej Cífka, Aliaksei Severyn, Enrique Alfonseca, Katja Filippova

arXiv:1804.07972v2 · 7.154 citations

Originality: Synthesis-oriented

AI Analysis

This work addresses the need for standardized evaluation in text generation research. The contribution is incremental: it builds on existing metrics but aims to unify them into a rigorous protocol.

The paper tackles the problem of evaluating neural generative models for text. It proposes a comprehensive evaluation protocol that applies both automatic and human metrics to generated samples and to reconstructions, with the aim of establishing a new standard for model comparison.

In this paper, we study recent neural generative models for text generation related to variational autoencoders. Previous works have employed various techniques to control the prior distribution of the latent codes in these models, which is important for sampling performance, but little attention has been paid to reconstruction error. In our study, we follow a rigorous evaluation protocol using a large set of previously used and novel automatic and human evaluation metrics, applied to both generated samples and reconstructions. We hope that it will become the new evaluation standard when comparing neural generative models for text.
