IR · CL · Oct 11, 2020

ComStreamClust: a communicative multi-agent approach to text clustering in streaming data

arXiv:2010.05349v2
Originality: Incremental advance
AI Analysis

The work addresses topic detection, which could help governments and healthcare companies monitor issues such as pandemics, but it appears incremental because it builds on existing clustering and embedding techniques.

The paper tackles the detection and tracking of sub-topics in streaming social media data, such as COVID-19 tweets. It proposes ComStreamClust, a communicative multi-agent clustering approach, and reports that it is effective compared with existing methods on the COVID-19 and FA CUP datasets.

Topic detection is the task of determining and tracking hot topics in social media. Twitter is arguably the most popular platform for people to share their ideas with others about different issues. One such prevalent issue is the COVID-19 pandemic. Detecting and tracking topics on issues of this kind would help governments and healthcare companies deal with the phenomenon. In this paper, we propose a novel, multi-agent, communicative clustering approach, called ComStreamClust, for clustering sub-topics inside a broader topic, e.g., COVID-19. The proposed approach is parallelizable and can handle several data points simultaneously. The LaBSE sentence embedding is used to measure the semantic similarity between two tweets. ComStreamClust has been evaluated on two datasets: COVID-19 and FA CUP. The results confirm the effectiveness of the proposed approach when compared to existing methods.
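The abstract says that semantic similarity between two tweets is measured on sentence embeddings. The sketch below illustrates that idea under stated assumptions: it uses cosine similarity (a common choice, not confirmed by the abstract), tiny hand-written vectors standing in for real LaBSE embeddings, and a hypothetical `assign_to_cluster` helper with an arbitrary threshold. It is not the ComStreamClust algorithm, whose agent communication is not described here.

```python
import math
from typing import List, Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def assign_to_cluster(
    embedding: Sequence[float],
    centroids: List[Sequence[float]],
    threshold: float = 0.5,  # assumption: arbitrary cut-off chosen for illustration
) -> Optional[int]:
    """Return the index of the most similar centroid, or None if none reaches the threshold."""
    best_index: Optional[int] = None
    best_score = threshold
    for index, centroid in enumerate(centroids):
        score = cosine_similarity(embedding, centroid)
        if score >= best_score:
            best_index, best_score = index, score
    return best_index


# Toy 3-dimensional vectors standing in for real sentence embeddings (assumption).
centroids = [[1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
tweet_near_first = [1.0, 0.0, 0.9]
tweet_near_second = [0.0, 1.0, 0.0]
tweet_outlier = [-1.0, 0.0, 0.0]

print(assign_to_cluster(tweet_near_first, centroids))
print(assign_to_cluster(tweet_near_second, centroids))
print(assign_to_cluster(tweet_outlier, centroids))
```

In a streaming setting, a tweet that matches no existing cluster (the `None` case) would typically seed a new sub-topic; how ComStreamClust handles this is not stated in the abstract.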

Code Implementations: 1 repo
