FACE: Evaluating Natural Language Generation with Fourier Analysis of Cross-Entropy
The paper addresses a critical open problem for researchers and practitioners, the evaluation of natural language generation, though the contribution appears incremental because it builds on existing psycholinguistic findings.
The paper tackles the problem of measuring the distance between machine-generated and human language by proposing FACE, a set of metrics based on Fourier analysis of cross-entropy. The authors report that FACE effectively identifies the human-model gap, scales with model size, and correlates well with other metrics and with human judgment.
Measuring the distance between machine-produced and human language is a critical open problem. Inspired by empirical findings from psycholinguistics on the periodicity of entropy in language, we propose FACE, a set of metrics based on Fourier Analysis of the estimated Cross-Entropy of language, for measuring the similarity between model-generated and human-written language. Based on an open-ended generation task and the experimental data from previous studies, we find that FACE can effectively identify the human-model gap, scales with model size, reflects the outcomes of different sampling methods for decoding, and correlates well with other evaluation metrics and with human judgment scores.
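The general idea described above can be illustrated with a minimal sketch. It assumes the per-token cross-entropy values have already been estimated by some language model and are supplied as plain lists of floats. The function names (`magnitude_spectrum`, `resample`, `spectral_similarity`), the mean-centering step, the 32-bin frequency grid, and the use of cosine similarity as the comparison are all illustrative assumptions, not the metrics defined in the paper.

```python
import cmath
import math


def magnitude_spectrum(cross_entropy):
    """One-sided DFT magnitude spectrum of a mean-centered sequence."""
    n = len(cross_entropy)
    if n == 0:
        raise ValueError("cross_entropy must be non-empty")
    mean = sum(cross_entropy) / n
    centered = [x - mean for x in cross_entropy]  # drop the constant offset
    spectrum = []
    for k in range(n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(centered)
        )
        spectrum.append(abs(coeff))
    return spectrum


def resample(spectrum, size):
    """Linearly interpolate a spectrum onto `size` evenly spaced bins."""
    if size < 2:
        raise ValueError("size must be at least 2")
    if len(spectrum) == 1:
        return [spectrum[0]] * size
    out = []
    for i in range(size):
        pos = i * (len(spectrum) - 1) / (size - 1)
        lo = int(math.floor(pos))
        hi = min(lo + 1, len(spectrum) - 1)
        frac = pos - lo
        out.append(spectrum[lo] * (1 - frac) + spectrum[hi] * frac)
    return out


def spectral_similarity(human_ce, model_ce, size=32):
    """Cosine similarity of two spectra on a shared frequency grid.

    Cosine similarity is an assumed, illustrative comparison only.
    """
    a = resample(magnitude_spectrum(human_ce), size)
    b = resample(magnitude_spectrum(model_ce), size)
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


# Synthetic stand-ins for cross-entropy sequences with different periodicity.
human_ce = [3.0 + math.sin(2 * math.pi * t / 4) for t in range(32)]
model_ce = [3.0 + math.sin(2 * math.pi * t / 8) for t in range(32)]
print(spectral_similarity(human_ce, human_ce))
print(spectral_similarity(human_ce, model_ce))
```

Resampling onto a shared grid is one simple way to compare sequences of different lengths. In this toy setup, a sequence compared with itself scores close to 1, while sequences whose energy sits at different frequencies score close to 0.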