CL May 18, 2019

Microblog Hashtag Generation via Encoding Conversation Contexts

arXiv:1905.07584v11099 citations
Originality Incremental advance
AI Analysis

The paper addresses content understanding on microblog platforms by enabling the generation of rare and unseen hashtags. The advance is incremental because it builds on existing sequence generation techniques.

The paper tackles automatic hashtag generation for microblog posts with a sequence generation framework that treats each hashtag as a short sequence of words. On large-scale Twitter and Weibo datasets, the proposed model significantly outperforms state-of-the-art classification-based models.
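To make the contrast between classification and sequence generation concrete, here is a toy sketch. It is not the paper's model: the hashtags are made up, and the helper names (`label_space`, `word_vocabulary`, `classifier_can_output`, `generator_can_output`) are hypothetical. It only shows why treating a hashtag as a word sequence can, in principle, reach hashtags never seen whole during training.

```python
# Toy illustration only; these hashtags are invented, not taken from the paper's datasets.
TRAIN_HASHTAGS = ["machine learning", "deep vision", "world cup"]


def label_space(hashtags):
    """Classification view: each whole hashtag is one inseparable label."""
    return set(hashtags)


def word_vocabulary(hashtags):
    """Generation view: a hashtag is a short sequence of words."""
    return {word for tag in hashtags for word in tag.split()}


def classifier_can_output(tag, hashtags):
    """A classifier can only pick a hashtag that appeared as a training label."""
    return tag in label_space(hashtags)


def generator_can_output(tag, hashtags):
    """A word-level generator can compose any hashtag whose words are all in its vocabulary."""
    vocab = word_vocabulary(hashtags)
    return all(word in vocab for word in tag.split())


unseen = "deep learning"  # never appears as a whole hashtag in TRAIN_HASHTAGS
print(classifier_can_output(unseen, TRAIN_HASHTAGS))  # False
print(generator_can_output(unseen, TRAIN_HASHTAGS))   # True
```

Whether a trained generator actually produces a given unseen hashtag depends on the learned model; the sketch only shows that the output space permits it.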

Automatic hashtag annotation plays an important role in content understanding for microblog posts. To date, progress made in this field has been restricted to phrase selection from limited candidates, or word-level hashtag discovery using topic models. Different from previous work considering hashtags to be inseparable, our work is the first effort to annotate hashtags with a novel sequence generation framework via viewing the hashtag as a short sequence of words. Moreover, to address the data sparsity issue in processing short microblog posts, we propose to jointly model the target posts and the conversation contexts initiated by them with bidirectional attention. Extensive experimental results on two large-scale datasets, newly collected from English Twitter and Chinese Weibo, show that our model significantly outperforms state-of-the-art models based on classification. Further studies demonstrate our ability to effectively generate rare and even unseen hashtags, which is however not possible for most existing methods.
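The abstract mentions jointly modeling the target post and its conversation context with bidirectional attention. The following is a minimal sketch of that general idea, not the authors' implementation. It assumes plain dot-product similarity between token vectors, and the function names (`bidirectional_attention`, `weighted_sum`) are hypothetical. The abstract does not specify the scoring function or how the attended vectors feed the decoder.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def weighted_sum(weights, vectors):
    """Combine vectors using attention weights, one weight per vector."""
    dim = len(vectors[0])
    return [sum(w * vec[i] for w, vec in zip(weights, vectors)) for i in range(dim)]


def bidirectional_attention(post, conv):
    """Attend in both directions between post tokens and conversation tokens.

    Assumption: similarity is a plain dot product; a real model would
    typically use learned parameters here.
    """
    # sim[i][j] scores post token i against conversation token j.
    sim = [[dot(p, c) for c in conv] for p in post]
    # Post -> conversation: each post token summarizes the conversation.
    post_to_conv = [weighted_sum(softmax(row), conv) for row in sim]
    # Conversation -> post: each conversation token summarizes the post.
    columns = [[sim[i][j] for i in range(len(post))] for j in range(len(conv))]
    conv_to_post = [weighted_sum(softmax(col), post) for col in columns]
    return post_to_conv, conv_to_post


# Made-up 2-dimensional token vectors for a 2-token post and a 3-token conversation.
post = [[1.0, 0.0], [0.0, 1.0]]
conv = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
post_ctx, conv_ctx = bidirectional_attention(post, conv)
print(len(post_ctx), len(conv_ctx))  # 2 3
```

Each post token receives a conversation-aware vector and each conversation token receives a post-aware vector, which is one way a short post can borrow signal from its replies.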

Code Implementations 1 repo
