Character-based Neural Embeddings for Tweet Clustering
This work addresses tweet clustering for social media analysis, offering an incremental improvement by adapting existing neural methods to handle multilingual and noisy text.
The paper uses character-based neural networks to improve tweet clustering performance. The approach avoids the vocabulary explosion of word-based models and supports multilingual content; evaluation results and code are provided online.
In this paper, we show how the performance of tweet clustering can be improved by leveraging character-based neural networks. The proposed approach overcomes the limitations caused by vocabulary explosion in word-based models and allows for seamless processing of multilingual content. Our evaluation results and code are available online at https://github.com/vendi12/tweet2vec_clustering
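To make the vocabulary explosion argument concrete, the following minimal sketch contrasts a word-level vocabulary with a character-level one on noisy, mixed-language text. It is an illustration only, not the model from the paper: the toy tweets, the helper names (`word_vocab`, `char_vocab`, `encode_chars`), and the `max_len` value are all assumptions made for this example.

```python
# Hypothetical toy tweets (not data from the paper).
tweets = [
    "loooove this #sunset",
    "love this sunset!!",
    "me encanta este atardecer",
    "luv dis sunsettt",
]


def word_vocab(texts):
    """Return the set of distinct whitespace-separated tokens."""
    return {tok for t in texts for tok in t.split()}


def char_vocab(texts):
    """Return the set of distinct characters."""
    return {ch for t in texts for ch in t}


def encode_chars(text, char_to_id, max_len=20):
    """Map text to a fixed-length list of character ids.

    Id 0 is reserved for padding and unknown characters.
    """
    ids = [char_to_id.get(ch, 0) for ch in text[:max_len]]
    return ids + [0] * (max_len - len(ids))


# Character ids start at 1 so that 0 can serve as padding/unknown.
char_to_id = {ch: i + 1 for i, ch in enumerate(sorted(char_vocab(tweets)))}

# A new, creatively spelled tweet: every word is unseen at the word level,
# yet every character is already covered by the character vocabulary.
new_tweet = "loove sunseet"
unseen_words = [w for w in new_tweet.split() if w not in word_vocab(tweets)]
unknown_chars = [ch for ch in new_tweet if ch not in char_to_id]
encoded = encode_chars(new_tweet, char_to_id)
```

A word-based model would map both tokens of `new_tweet` to an out-of-vocabulary symbol, while the character sequence in `encoded` remains fully informative. In a character-based neural model, a sequence like this would be the input to the encoder that produces the tweet embedding used for clustering.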