CL · AI · Apr 22, 2020

Logical Natural Language Generation from Open-Domain Tables

arXiv:2004.10404v2 · 1037 citations · Has Code
AI Analysis

This work addresses the lack of logical inference in neural NLG, which matters for applications that require accurate and coherent text generation from structured data. It is incremental in that it builds on existing datasets and methods.

The paper tackles the problem of generating natural language statements that are logically entailed by facts in open-domain tables. It proposes a new NLG task and evaluates it on the TabFact dataset with new automatic metrics. The authors find that pre-trained language models significantly boost fluency and logical fidelity, that RL and adversarial training trade fluency for fidelity, and that coarse-to-fine generation maintains fluency while partially improving fidelity.

Neural natural language generation (NLG) models have recently shown remarkable progress in fluency and coherence. However, existing studies on neural NLG are primarily focused on surface-level realizations with limited emphasis on logical inference, an important aspect of human thinking and language. In this paper, we suggest a new NLG task where a model is tasked with generating natural language statements that can be logically entailed by the facts in an open-domain semi-structured table. To facilitate the study of the proposed logical NLG problem, we use the existing TabFact dataset (Chen et al., 2019), which features a wide range of logical/symbolic inferences, as our testbed, and propose new automatic metrics to evaluate the fidelity of generation models w.r.t. logical inference. The new task poses challenges to the existing monotonic generation frameworks due to the mismatch between sequence order and logical order. In our experiments, we comprehensively survey different generation architectures (LSTM, Transformer, Pre-Trained LM) trained with different algorithms (RL, Adversarial Training, Coarse-to-Fine) on the dataset and make the following observations: 1) Pre-Trained LM can significantly boost both the fluency and logical fidelity metrics, 2) RL and Adversarial Training trade fluency for fidelity, 3) Coarse-to-Fine generation can help partially alleviate the fidelity issue while maintaining high language fluency. The code and data are available at https://github.com/wenhuchen/LogicNLG.
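To make "logically entailed by the facts in a table" concrete, here is a minimal Python sketch. It is an illustration only, not the paper's method or its metrics: the table, the candidate statements, and the hand-written checks are all invented for this example. It shows how a fluent statement can still be unsupported by the table, and how entailment may require comparison, superlatives, or aggregation rather than copying a single cell.

```python
# Toy table invented for illustration (not taken from TabFact).
table = [
    {"nation": "Canada", "gold": 3, "silver": 1},
    {"nation": "Mexico", "gold": 2, "silver": 3},
    {"nation": "Colombia", "gold": 1, "silver": 3},
]


def gold(nation):
    """Look up the gold-medal count for one nation."""
    return next(row["gold"] for row in table if row["nation"] == nation)


# Each candidate statement is paired with a hand-written check that
# evaluates it against the table. A real system would have to derive
# such a check from the generated text, which is the hard part.
candidates = {
    # Surface-level: restates a single cell.
    "Canada won 3 gold medals.": lambda: gold("Canada") == 3,
    # Comparison across two rows.
    "Canada won more gold medals than Mexico.":
        lambda: gold("Canada") > gold("Mexico"),
    # Superlative over a whole column.
    "Canada won the most gold medals.":
        lambda: max(table, key=lambda r: r["gold"])["nation"] == "Canada",
    # Aggregation; fluent, but the table does not support it.
    "The three nations won 7 gold medals in total.":
        lambda: sum(r["gold"] for r in table) == 7,
}


def entailed_statements():
    """Return the candidate statements whose check holds on the table."""
    return [text for text, check in candidates.items() if check()]


if __name__ == "__main__":
    for text in entailed_statements():
        print("entailed:", text)
```

The last candidate reads as naturally as the others but fails its check, which is the kind of gap between fluency and logical fidelity that the abstract describes.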

Code Implementations: 1 repo
