ML CLMay 25, 2017

Neural Models for Documents with Metadata

arXiv:1705.09296v2132 citations

Originality Incremental advance

AI Analysis

This addresses the need for practical and customizable models in real-world document analysis, though it is incremental as it builds on existing variational inference methods.

The paper tackles the problem of modeling document collections with metadata, which is often ignored by common approaches, by proposing a general neural framework based on topic models that allows flexible metadata incorporation and achieves strong performance with tradeoffs in perplexity, coherence, and sparsity.

Most real-world document collections involve various types of metadata, such as author, source, and date, and yet the most commonly-used approaches to modeling text corpora ignore this information. While specialized models have been developed for particular applications, few are widely used in practice, as customization typically requires derivation of a custom inference algorithm. In this paper, we build on recent advances in variational inference methods and propose a general neural framework, based on topic models, to enable flexible incorporation of metadata and allow for rapid exploration of alternative models. Our approach achieves strong performance, with a manageable tradeoff between perplexity, coherence, and sparsity. Finally, we demonstrate the potential of our framework through an exploration of a corpus of articles about US immigration.

View on arXiv PDF

Similar