CL · Oct 6, 2020

COD3S: Diverse Generation with Discrete Semantic Signatures

arXiv:2010.02882v1998 citations
Originality: Incremental advance
AI Analysis

This paper addresses low diversity in one-to-many sequence generation for NLP applications, though it is an incremental improvement over existing methods.

The paper tackles the problem of generating semantically diverse sentences with seq2seq models, which typically produce homogeneous outputs. It introduces COD3S, a two-stage method that conditions generation on LSH-based semantic codes. Automatic and human evaluations show improved diversity without degrading performance on causal generation.

We present COD3S, a novel method for generating semantically diverse sentences using neural sequence-to-sequence (seq2seq) models. Conditioned on an input, seq2seq models typically produce semantically and syntactically homogeneous sets of sentences and thus perform poorly on one-to-many sequence generation tasks. Our two-stage approach improves output diversity by conditioning generation on locality-sensitive hash (LSH)-based semantic sentence codes whose Hamming distances highly correlate with human judgments of semantic textual similarity. Though it is generally applicable, we apply COD3S to causal generation, the task of predicting a proposition's plausible causes or effects. We demonstrate through automatic and human evaluation that responses produced using our method exhibit improved diversity without degrading task performance.
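To make the idea of LSH-based semantic codes concrete, here is a minimal sketch of random-hyperplane LSH, one common LSH scheme, applied to toy vectors. It is an illustration under assumptions, not the authors' implementation: the function names (`make_hyperplanes`, `lsh_signature`, `hamming`), the 16-bit signature length, and the 4-dimensional "embeddings" are all hypothetical, and the abstract does not state which sentence encoder or LSH variant COD3S uses.

```python
import random

def make_hyperplanes(num_bits, dim, seed=0):
    """Draw num_bits random Gaussian hyperplane normals (hypothetical helper)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]

def lsh_signature(vec, hyperplanes):
    """One bit per hyperplane: 1 if vec lies on the non-negative side, else 0."""
    return tuple(
        1 if sum(h_i * v_i for h_i, v_i in zip(h, vec)) >= 0 else 0
        for h in hyperplanes
    )

def hamming(sig_a, sig_b):
    """Number of bit positions where the two signatures differ."""
    return sum(a != b for a, b in zip(sig_a, sig_b))

# Toy stand-ins for sentence embeddings (assumed values, not real model output).
planes = make_hyperplanes(num_bits=16, dim=4)
emb_a = [0.9, 0.1, 0.0, 0.2]
emb_b = [0.8, 0.2, 0.1, 0.2]      # points in nearly the same direction as emb_a
emb_c = [-0.9, -0.1, 0.0, -0.2]   # exact opposite direction of emb_a

sig_a = lsh_signature(emb_a, planes)
sig_b = lsh_signature(emb_b, planes)
sig_c = lsh_signature(emb_c, planes)

print("a vs b:", hamming(sig_a, sig_b))
print("a vs c:", hamming(sig_a, sig_c))
```

Vectors that point in similar directions tend to fall on the same side of most hyperplanes, so their signatures differ in few bits, while opposite vectors flip every bit. In a two-stage setup like the one the abstract describes, a generator could be conditioned on such a discrete signature so that different signatures steer it toward semantically different outputs.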

Code Implementations: 1 repo
