CL | Jun 10, 2019

Topic-Aware Neural Keyphrase Generation for Social Media Language

arXiv:1906.03889v1 | 115 citations
Originality: Incremental advance
AI Analysis

This paper addresses data sparsity in social media language understanding, a problem that matters to both users and platforms. The advance is incremental: it builds on existing seq2seq methods and combines them with topic modeling.

The paper tackles keyphrase prediction for social media posts by proposing a topic-aware neural generation framework that creates absent keyphrases, and it shows significant performance improvements over existing extraction and generation models on English and Chinese datasets.

A huge volume of user-generated content is produced daily on social media. To facilitate automatic language understanding, we study keyphrase prediction, distilling salient information from massive posts. While most existing methods extract words from source posts to form keyphrases, we propose a sequence-to-sequence (seq2seq) based neural keyphrase generation framework, enabling absent keyphrases to be created. Moreover, our model, being topic-aware, allows joint modeling of corpus-level latent topic representations, which helps alleviate the data sparsity that is widely exhibited in social media language. Experiments on three datasets collected from English and Chinese social media platforms show that our model significantly outperforms both extraction and generation models that do not exploit latent topics. Further discussions show that our model learns meaningful topics, which explains its superiority in social media keyphrase generation.
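
The distinction between present and absent keyphrases in the abstract can be made concrete with a small sketch. It is not the authors' code: the whitespace tokenizer, the function names, and the example post are all assumptions made for illustration. It only shows why a purely extractive method cannot recover a keyphrase whose words never appear in the post.

```python
def tokenize(text):
    """Lowercase and split on whitespace (a stand-in for a real tokenizer)."""
    return text.lower().split()


def is_present(keyphrase, post_tokens):
    """Return True if the keyphrase tokens occur contiguously in the post."""
    kp = tokenize(keyphrase)
    n = len(kp)
    if n == 0:
        return False
    return any(post_tokens[i:i + n] == kp
               for i in range(len(post_tokens) - n + 1))


def split_keyphrases(post, keyphrases):
    """Split gold keyphrases into (present, absent) relative to the post."""
    post_tokens = tokenize(post)
    present = [k for k in keyphrases if is_present(k, post_tokens)]
    absent = [k for k in keyphrases if not is_present(k, post_tokens)]
    return present, absent


# Hypothetical post and keyphrases, invented for this example.
post = "so excited for the game tonight , go team"
keyphrases = ["game tonight", "super bowl"]

present, absent = split_keyphrases(post, keyphrases)
print(present)  # ['game tonight']
print(absent)   # ['super bowl']
```

An extraction model can at best return the `present` list, whereas a generation model decodes keyphrases token by token from a vocabulary and so can, in principle, also produce entries in `absent`.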

Code Implementations: 2 repos
