Adversarial Generation and Encoding of Nested Texts
This paper addresses the challenge of handling nested texts, such as books, in natural language processing applications, but the contribution appears incremental because it builds on existing hierarchical and adversarial methods.
The paper tackles the problem of encoding, generating, and refining long, coherent documents with hierarchical annotations. It proposes AGENT, a language model that learns a vector representation for each level of the text hierarchy and trains each representation on a reconstruction task and on generalized versions of BERT's pretraining tasks. It also presents an adversarial model for long text generation and suggests improving the coherence of generated text by traversing its vector representation tree.
In this paper, we propose a new language model called AGENT, which stands for Adversarial Generation and Encoding of Nested Texts. AGENT is designed for encoding, generating, and refining documents that consist of a long and coherent text, such as an entire book, provided they are hierarchically annotated (nested), i.e., divided into sentences, paragraphs, and chapters. The core idea of our system is learning a vector representation for each level of the text hierarchy (sentences, paragraphs, etc.) and training each such representation to perform three tasks: reconstructing the sequence of lower-level vectors that was used to create the representation, and generalized versions of the Masked Language Modeling (MLM) and Next Sentence Prediction tasks from BERT (Devlin et al. [2018]). Additionally, we present a new adversarial model for long text generation and suggest a way to improve the coherence of the generated text by traversing its vector representation tree.
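
To make the idea of a vector representation tree concrete, the following is a minimal sketch, not the AGENT implementation. It represents a nested text as a tree, fills in one vector per node bottom-up, and prepares a masked input at an arbitrary level in the spirit of a generalized MLM task. Everything named here is an assumption made for illustration: the `Node` class, the helpers `build`, `embed_token`, `encode` and `mask_children`, the toy dimension `DIM`, the mean-pooling "encoder" (a placeholder for a learned model), and the use of a zero vector as the mask.

```python
import random
from dataclasses import dataclass, field
from typing import List, Tuple

DIM = 4  # toy vector size, chosen only for illustration


@dataclass
class Node:
    level: int                                   # 0 = word, 1 = sentence, 2 = paragraph, ...
    token: str = ""                              # set only on level-0 leaves
    children: List["Node"] = field(default_factory=list)
    vector: List[float] = field(default_factory=list)


def build(nested, level: int) -> Node:
    """Turn nested lists of strings into a tree of Nodes."""
    if level == 0:
        return Node(level=0, token=nested)
    return Node(level=level, children=[build(item, level - 1) for item in nested])


def embed_token(token: str) -> List[float]:
    """Hypothetical stand-in for a learned word embedding (deterministic per token)."""
    rng = random.Random(token)
    return [rng.uniform(-1.0, 1.0) for _ in range(DIM)]


def encode(node: Node) -> List[float]:
    """Fill in vectors bottom-up; a parent is the mean of its children (placeholder encoder)."""
    if node.level == 0:
        node.vector = embed_token(node.token)
    else:
        child_vectors = [encode(child) for child in node.children]
        node.vector = [sum(column) / len(child_vectors) for column in zip(*child_vectors)]
    return node.vector


def mask_children(node: Node, rate: float, rng: random.Random) -> Tuple[List[List[float]], List[int]]:
    """Hide some child vectors of `node`; a model would be trained to recover them."""
    inputs, masked_positions = [], []
    for position, child in enumerate(node.children):
        if rng.random() < rate:
            inputs.append([0.0] * DIM)           # zero vector used as an assumed mask symbol
            masked_positions.append(position)
        else:
            inputs.append(child.vector)
    return inputs, masked_positions


# A tiny "chapter": paragraphs -> sentences -> words.
book = [
    [["the", "ship", "sailed"], ["it", "was", "dawn"]],
    [["land", "appeared"]],
]
root = build(book, level=3)
encode(root)
inputs, masked_positions = mask_children(root, rate=0.5, rng=random.Random(0))
```

Because the masking helper works on the children of any node, the same call applies to words within a sentence, sentences within a paragraph, or paragraphs within a chapter, which is the sense in which the task is generalized across levels in this sketch.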