CL, Apr 29, 2020

Zero-shot topic generation

Oleg Vasilyev, Kathryn Evans, Anna Venancio-Marques, John Bohannon

arXiv:2004.13956v10.2

Originality: Highly original

AI Analysis

This paper presents a zero-shot method for generating high-quality topic labels for news documents, which could benefit information retrieval and content organization tasks, though it is incremental in that it builds on existing title generation models.

The paper tackles the problem of generating topic labels for documents without any topic-specific training data, using a model trained only for document title generation. The results show that the zero-shot model produces topic labels for news articles that are on average equal to or higher in quality than human-written ones, as judged by human annotators in a double-blind trial.

We present an approach to generating topics using a model trained only for document title generation, with zero examples of topics given during training. We leverage features that capture the relevance of a candidate span in a document for the generation of a title for that document. The output is a weighted collection of the phrases that are most relevant for describing the document and distinguishing it within a corpus, without requiring access to the rest of the corpus. We conducted a double-blind trial in which human annotators scored the quality of our machine-generated topics along with original human-written topics associated with news articles from The Guardian and The Huffington Post. The results show that our zero-shot model generates topic labels for news documents that are, on average, of equal or higher quality than those written by humans, as judged by humans.
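The abstract describes the output as a weighted collection of phrases ranked by how relevant each candidate span is to title generation. The sketch below illustrates only that general shape: enumerate candidate spans, score each one, and normalize the top scores into weights. It is not the authors' method. The names `candidate_spans`, `weighted_topics`, and `toy_relevance` are hypothetical, and `toy_relevance` is a deliberately crude stand-in (token overlap with a known title) for the title-model features the paper uses, which the abstract does not specify.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

Span = Tuple[str, ...]


def candidate_spans(tokens: List[str], max_len: int = 3) -> List[Span]:
    """Enumerate every contiguous span of 1..max_len tokens."""
    return [
        tuple(tokens[i:i + n])
        for n in range(1, max_len + 1)
        for i in range(len(tokens) - n + 1)
    ]


def weighted_topics(
    tokens: List[str],
    relevance: Callable[[Span], float],
    top_k: int = 5,
    max_len: int = 3,
) -> Dict[str, float]:
    """Score each candidate span and return the top phrases with weights summing to 1."""
    scores: Counter = Counter()
    for span in candidate_spans(tokens, max_len):
        scores[" ".join(span)] += relevance(span)
    # Keep only the highest-scoring phrases that received a positive score.
    top = [(phrase, s) for phrase, s in scores.most_common(top_k) if s > 0]
    total = sum(s for _, s in top)
    return {phrase: s / total for phrase, s in top} if total else {}


def toy_relevance(title_tokens: List[str]) -> Callable[[Span], float]:
    """ASSUMPTION: placeholder scorer, not the paper's title-model features.

    A span scores its own length if every token appears in the title, else 0.
    """
    title = set(title_tokens)

    def relevance(span: Span) -> float:
        return float(len(span)) if all(t in title for t in span) else 0.0

    return relevance


doc = "the city council approved the new climate budget".split()
topics = weighted_topics(doc, toy_relevance(["climate", "budget"]))
print(topics)
```

In a real system the `relevance` callable would be replaced by scores derived from a trained title generation model; only the span enumeration and weight normalization would carry over unchanged.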
