CL AI NE SI
Jul 26, 2016

Tweet2Vec: Learning Tweet Embeddings Using Character-level CNN-LSTM Encoder-Decoder

Soroush Vosoughi, Prashanth Vijayaraghavan, Deb Roy

arXiv:1607.07514v1
13.5
182 citations

Originality: Incremental advance

AI Analysis

This work addresses the need for effective tweet embeddings for various categorization tasks. It is an incremental advance, since it builds on existing encoder-decoder and embedding techniques.

The authors tackle the problem of generating general-purpose vector representations for tweets by introducing Tweet2Vec, a character-level CNN-LSTM encoder-decoder model trained on 3 million English tweets. The model outperformed previous state-of-the-art methods on tweet semantic similarity and sentiment categorization tasks.

We present Tweet2Vec, a novel method for generating general-purpose vector representations of tweets. The model learns tweet embeddings using a character-level CNN-LSTM encoder-decoder. We trained our model on 3 million randomly selected English-language tweets. The model was evaluated on two tasks, tweet semantic similarity and tweet sentiment categorization, and outperformed the previous state of the art on both. The evaluations demonstrate the power of the tweet embeddings generated by our model for various tweet categorization tasks. The vector representations generated by our model are generic and hence can be applied to a variety of tasks. Though the model presented in this paper is trained on English-language tweets, the method can be used to learn tweet embeddings for other languages.
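
To make "character-level" concrete: a model of this kind reads a tweet as a sequence of characters rather than words, so each tweet is typically first turned into a fixed-size matrix with one row per character position. The following is a minimal sketch of such an input encoding, not the authors' implementation. The alphabet, the maximum length of 150, and the names `ALPHABET`, `MAX_LEN`, `CHAR_INDEX`, and `one_hot_tweet` are all assumptions made for illustration; the abstract does not specify them.

```python
import string

# Assumed alphabet and maximum length; neither is stated in the abstract.
ALPHABET = string.ascii_lowercase + string.digits + string.punctuation + " "
MAX_LEN = 150
CHAR_INDEX = {ch: i for i, ch in enumerate(ALPHABET)}


def one_hot_tweet(text, max_len=MAX_LEN):
    """Return a max_len x len(ALPHABET) matrix of 0/1 ints for one tweet.

    Characters outside ALPHABET and padding positions stay all-zero rows.
    """
    matrix = [[0] * len(ALPHABET) for _ in range(max_len)]
    # Lowercase first, then truncate to the fixed length.
    for pos, ch in enumerate(text.lower()[:max_len]):
        idx = CHAR_INDEX.get(ch)
        if idx is not None:
            matrix[pos][idx] = 1
    return matrix
```

A matrix like this would be the input to the convolutional layers of the encoder. Working at the character level means misspellings, hashtags, and slang need no special vocabulary handling, which is one common motivation for this design on tweet text.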
