CL | LG | Mar 21, 2022

Language modeling via stochastic processes

Stanford
arXiv:2203.11370v2 | 29 citations | h-index: 75
AI Analysis

The paper addresses meandering and incoherence in long text generated by language models, and represents an incremental improvement over existing methods.

The paper tackles the problem of incoherent long text generation in language models by proposing Time Control (TC), a method that uses contrastive representations for generation, resulting in up to 15% better ordering and 90% better text length consistency compared to baselines.

Modern language models can generate high-quality short texts. However, they often meander or are incoherent when generating longer texts. These issues arise from the next-token-only language modeling objective. Recent work in self-supervised learning suggests that models can learn good latent representations via contrastive learning, which can be effective for discriminative tasks. Our work analyzes the application of contrastive representations for generative tasks, like long text generation. We propose one approach for leveraging contrastive representations, which we call Time Control (TC). TC first learns a contrastive representation of the target text domain, then generates text by decoding from these representations. Compared to domain-specific methods and fine-tuning GPT2 across a variety of text domains, TC performs competitively to methods specific for learning sentence representations on discourse coherence. On long text generation settings, TC preserves the text structure both in terms of ordering (up to $+15\%$ better) and text length consistency (up to $+90\%$ better).
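
To make the phrase "contrastive representation" concrete, the following is a minimal sketch of a generic InfoNCE-style contrastive loss in NumPy. It is an illustration under stated assumptions, not the objective used by Time Control: the function name `info_nce_loss`, the `temperature` value, and the random toy embeddings are all hypothetical and do not come from the paper.

```python
import numpy as np

def info_nce_loss(anchors, positives, temperature=0.1):
    """Generic InfoNCE-style loss (illustrative only, not the paper's objective).

    Row i of `positives` is treated as the match for row i of `anchors`;
    every other row in the batch serves as a negative.
    """
    # L2-normalize so the dot product is a cosine similarity.
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = (a @ p.T) / temperature              # shape (batch, batch)
    # Numerically stable log-softmax over each row.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Cross-entropy with the diagonal entries as the correct classes.
    return float(-np.mean(np.diag(log_probs)))

# Toy embeddings standing in for sentence representations (hypothetical data).
rng = np.random.default_rng(0)
anchors = rng.normal(size=(8, 16))
aligned = anchors + 0.01 * rng.normal(size=(8, 16))   # near-copies of the anchors
misaligned = np.roll(aligned, 1, axis=0)              # every pair is mismatched

loss_aligned = info_nce_loss(anchors, aligned)
loss_misaligned = info_nce_loss(anchors, misaligned)
print(loss_aligned, loss_misaligned)
```

In this toy setup the loss is lower when each anchor is paired with its own near-copy than when the pairs are shifted by one position, which is the behaviour a contrastive encoder is trained to produce.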

Code Implementations: 1 repo
