CL · Sep 20, 2022

Twitter Topic Classification

Dimosthenis Antypas, Asahi Ushio, Jose Camacho-Collados, Leonardo Neves, Vítor Silva, Francesco Barbieri

arXiv:2209.09824v1 · 31.3593 citations · h-index: 40

Originality: Synthesis-oriented

AI Analysis

This work addresses the need for better content categorization on social media platforms, but its contribution is incremental: it focuses on dataset creation and benchmarking rather than on novel methodological advances.

The paper tackles the problem of organizing social media content by introducing a new tweet topic classification task and releasing two datasets for training and testing models. Its quantitative evaluation offers insight into the challenges of the task.

Social media platforms host discussions about a wide variety of topics that arise every day. Making sense of all the content and organising it into categories is an arduous task. A common way to deal with this issue is relying on topic modeling, but topics discovered using this technique are difficult to interpret and can differ from corpus to corpus. In this paper, we present a new task based on tweet topic classification and release two associated datasets. Given a wide range of topics covering the most important discussion points in social media, we provide training and testing data from recent time periods that can be used to evaluate tweet classification models. Moreover, we perform a quantitative evaluation and analysis of current general- and domain-specific language models on the task, which provides more insights into the challenges and nature of the task.
