CL, SI · Oct 23, 2020

TweetEval: Unified Benchmark and Comparative Evaluation for Tweet Classification

arXiv:2010.12421v2 · 1082 citations
Originality: Synthesis-oriented
AI Analysis

TweetEval provides a standardized evaluation framework for researchers in social media NLP, though the contribution is incremental because it consolidates existing tasks rather than introducing new ones.

The authors tackled the fragmented evaluation landscape in social media NLP by introducing TweetEval, a unified benchmark of seven Twitter-specific classification tasks, and found that continuing to pre-train generic language models on Twitter corpora is effective.
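A unified benchmark is usually summarized by per-task scores plus one overall number. The sketch below shows that kind of aggregation under simplifying assumptions: every task is scored with macro-averaged F1 and the overall score is the unweighted mean across tasks. The paper's actual per-task metrics may differ, and the function names `macro_f1` and `benchmark_score` are hypothetical.

```python
from collections import Counter


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 over every label seen in gold or pred."""
    if not gold or len(gold) != len(pred):
        raise ValueError("gold and pred must be non-empty and the same length")
    labels = set(gold) | set(pred)
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted label p was wrong
            fn[g] += 1  # true label g was missed
    f1s = []
    for label in labels:
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        denom = precision + recall
        f1s.append(2 * precision * recall / denom if denom else 0.0)
    return sum(f1s) / len(f1s)


def benchmark_score(results):
    """results maps task name -> (gold, pred); returns per-task scores and their mean."""
    per_task = {task: macro_f1(gold, pred) for task, (gold, pred) in results.items()}
    overall = sum(per_task.values()) / len(per_task)
    return per_task, overall


if __name__ == "__main__":
    # Toy predictions for two hypothetical tasks, for illustration only.
    toy = {
        "sentiment": (["pos", "neg", "neu", "pos"], ["pos", "neg", "pos", "pos"]),
        "irony": ([1, 0, 1, 0], [1, 0, 0, 0]),
    }
    scores, overall = benchmark_score(toy)
    print(scores, overall)
```

Macro-averaging weights every class equally, which keeps rare classes from being drowned out by frequent ones; averaging across tasks in the same unweighted way gives each task equal say in the headline number.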

The experimental landscape in natural language processing for social media is too fragmented. Each year, new shared tasks and datasets are proposed, ranging from classics like sentiment analysis to irony detection or emoji prediction. Therefore, it is unclear what the current state of the art is, as there is no standardized evaluation protocol, nor a strong set of baselines trained on such domain-specific data. In this paper, we propose a new evaluation framework (TweetEval) consisting of seven heterogeneous Twitter-specific classification tasks. We also provide a strong set of baselines as a starting point, and compare different language modeling pre-training strategies. Our initial experiments show the effectiveness of starting off with existing pre-trained generic language models and continuing to train them on Twitter corpora.
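The abstract says generic pre-trained models are further trained on Twitter corpora but does not spell out the objective here. Assuming the standard BERT/RoBERTa masked language modeling objective, the token-masking step might be sketched as follows: select roughly 15% of positions, replace 80% of those with a mask token, 10% with a random vocabulary token, and leave 10% unchanged. The `<mask>` string, whitespace tokenization, and the `mask_tokens` name are hypothetical; this is an illustration, not the authors' code.

```python
import random

MASK = "<mask>"  # hypothetical mask token; real tokenizers define their own


def mask_tokens(tokens, vocab, mask_prob=0.15, rng=None):
    """Return (inputs, labels) for one masked-LM training example.

    labels[i] holds the original token at positions the loss should score,
    and None elsewhere.
    """
    rng = rng or random.Random()
    inputs = list(tokens)
    labels = [None] * len(tokens)
    for i, tok in enumerate(tokens):
        if rng.random() >= mask_prob:
            continue  # position not selected for prediction
        labels[i] = tok  # model must recover the original token here
        roll = rng.random()
        if roll < 0.8:
            inputs[i] = MASK  # 80%: replace with the mask token
        elif roll < 0.9:
            inputs[i] = rng.choice(vocab)  # 10%: replace with a random token
        # remaining 10%: keep the original token unchanged
    return inputs, labels


if __name__ == "__main__":
    tweet = "@user this game was so good lol".split()
    vocab = sorted(set(tweet))
    print(mask_tokens(tweet, vocab, rng=random.Random(0)))
```

Re-running the masking each time an example is seen (rather than fixing it once) matches the dynamic-masking idea used by RoBERTa, so a continued pre-training loop over a tweet corpus would call a function like this per batch.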

Code Implementations: 2 repos
