IR · CL · LG · ML · Jun 10, 2020

A novel sentence embedding based topic detection method for micro-blog

arXiv:2006.09977v1
Originality: Incremental advance
AI Analysis

This work addresses topic detection for micro-blog analysis, presenting an incremental improvement with a novel clustering method.

The paper tackles topic detection in micro-blogs without prior knowledge of topic numbers by using a neural sentence embedding model and an improved clustering algorithm, achieving successful detection of all topics and keyword extraction on a Sina micro-blog dataset.

Topic detection is a challenging task, especially when the exact number of topics is unknown. In this paper, we present a novel neural-network-based approach to detecting topics in a micro-blogging dataset. We use an unsupervised neural sentence embedding model to map blogs into an embedding space. The model is a weighted power mean of word embeddings, with the weights calculated by an attention mechanism. Experimental results show that our embedding method outperforms the baselines in sentence clustering. In addition, we propose an improved clustering algorithm, referred to as relationship-aware DBSCAN (RADBSCAN). It discovers topics in a micro-blogging dataset, and the number of topics is determined by the characteristics of the dataset itself. Moreover, to address the sensitivity of the clustering to its parameters, we use the blog-forwarding relationship as a bridge between two otherwise independent clusters. Finally, we validate our approach on a dataset from Sina micro-blog. The results show that we can detect all the topics successfully and extract keywords for each topic.
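The pipeline described in the abstract (attention-weighted power-mean blog embeddings, DBSCAN clustering, and forwarding relationships bridging clusters) can be sketched in plain Python as follows. This is a hedged illustration under stated assumptions, not the authors' implementation: the attention function (a softmax over dot products with the mean word vector), the choice of powers (1, +inf, -inf), the function names `attention_weights`, `weighted_power_mean`, `sentence_embedding` and `radbscan`, and the rule that a single forwarding edge merges two clusters are all hypothetical. The paper learns its attention weights and may use a stricter bridging rule. Requires Python 3.8+ for `math.dist`.

```python
import math

# Hypothetical stand-in for the paper's learned attention: softmax over each
# word vector's dot product with the mean word vector of the blog.
def attention_weights(words):
    dim = len(words[0])
    context = [sum(w[j] for w in words) / len(words) for j in range(dim)]
    scores = [sum(a * b for a, b in zip(w, context)) for w in words]
    top = max(scores)  # subtract the max score for numerical stability
    exps = [math.exp(s - top) for s in scores]
    return [e / sum(exps) for e in exps]

# Per-dimension weighted power mean. p = +inf / -inf gives max / min (weights
# do not affect these limits); odd integer p keeps a real, signed root.
def weighted_power_mean(words, weights, p):
    dim = len(words[0])
    if p == math.inf:
        return [max(w[j] for w in words) for j in range(dim)]
    if p == -math.inf:
        return [min(w[j] for w in words) for j in range(dim)]
    out = []
    for j in range(dim):
        m = sum(a * w[j] ** p for a, w in zip(weights, words))
        out.append(math.copysign(abs(m) ** (1.0 / p), m))
    return out

# Blog embedding (assumed form): concatenate weighted power means for several p.
def sentence_embedding(words, powers=(1, math.inf, -math.inf)):
    weights = attention_weights(words)
    return [x for p in powers for x in weighted_power_mean(words, weights, p)]

# Plain DBSCAN with Euclidean distance; label -1 marks noise.
def dbscan(points, eps, min_pts):
    n = len(points)
    def neighbours(i):
        return [k for k in range(n) if math.dist(points[i], points[k]) <= eps]
    labels, cluster = [None] * n, -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        while seeds:
            k = seeds.pop()
            if labels[k] == -1:
                labels[k] = cluster  # border point: joins, but is not expanded
            if labels[k] is not None:
                continue
            labels[k] = cluster
            k_seeds = neighbours(k)
            if len(k_seeds) >= min_pts:  # only core points expand the cluster
                seeds.extend(k_seeds)
    return labels

# RADBSCAN idea (our reading): after DBSCAN, merge clusters joined by a
# forwarding edge (i, j) = "blog i forwarded blog j". Merging on a single
# edge is an assumption; the paper's bridging rule may be stricter.
def radbscan(points, forwards, eps, min_pts):
    labels = dbscan(points, eps, min_pts)
    parent = {c: c for c in labels if c != -1}
    def find(c):  # union-find root lookup
        while parent[c] != c:
            c = parent[c]
        return c
    for i, j in forwards:
        if labels[i] != -1 and labels[j] != -1:
            parent[find(labels[i])] = find(labels[j])
    new_ids = {}  # renumber merged clusters 0, 1, 2, ... by first appearance
    return [-1 if c == -1 else new_ids.setdefault(find(c), len(new_ids))
            for c in labels]

pts = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),  # topic A
       (5.0, 5.0), (5.1, 5.0), (5.0, 5.1),  # topic B
       (9.0, 0.0), (9.1, 0.0), (9.0, 0.1),  # topic C
       (20.0, 20.0)]                        # isolated blog
print(dbscan(pts, eps=0.5, min_pts=2))                       # [0, 0, 0, 1, 1, 1, 2, 2, 2, -1]
print(radbscan(pts, forwards=[(1, 4)], eps=0.5, min_pts=2))  # [0, 0, 0, 0, 0, 0, 1, 1, 1, -1]
print(len(sentence_embedding([[1.0, -2.0], [3.0, 0.5]])))    # 6
```

In the toy run, plain DBSCAN finds three topics; a single forwarding edge from a blog in topic A to one in topic B makes the sketch merge them, which is one way a relationship signal can compensate for an `eps` that is too small to join related groups.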
