CL AI IR LG

Oct 14, 2022

TweetNERD -- End to End Entity Linking Benchmark for Tweets

Shubhanshu Mishra, Aman Saini, Raheleh Makki, Sneha Mehta, Aria Haghighi, Ali Mollahosseini

arXiv:2210.08129v12.115 citations

h-index: 23

Has Code

Originality: Synthesis-oriented

AI Analysis

TweetNERD provides a large, temporally diverse benchmark for Named Entity Recognition and Disambiguation (NERD) research on Tweets, supporting work in information retrieval and NLP applications. The contribution is incremental: it is a dataset, not a new method.

The authors tackled the problem of benchmarking Named Entity Recognition and Disambiguation (NERD) systems on Tweets by introducing TweetNERD, a dataset of 340K+ Tweets from 2010-2021, and reported performance of existing methods on specific splits.

Named Entity Recognition and Disambiguation (NERD) systems are foundational for information retrieval, question answering, event detection, and other natural language processing (NLP) applications. We introduce TweetNERD, a dataset of 340K+ Tweets across 2010-2021, for benchmarking NERD systems on Tweets. This is the largest and most temporally diverse open-source benchmark dataset for NERD on Tweets and can be used to facilitate research in this area. We describe the evaluation setup with TweetNERD for three NERD tasks: Named Entity Recognition (NER), Entity Linking with True Spans (EL), and End to End Entity Linking (End2End); and report the performance of existing publicly available methods on specific TweetNERD splits. TweetNERD is available at https://doi.org/10.5281/zenodo.6617192 under the Creative Commons Attribution 4.0 International (CC BY 4.0) license. More details are available at https://github.com/twitter-research/TweetNERD.
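To make the difference between the three tasks concrete, the following is a minimal sketch of how each one could be scored. It is an illustration under stated assumptions, not the paper's evaluation code: the `Mention` record, the function names, the exact-match criterion, and the choice of precision/recall/F1 (for NER and End2End) and accuracy (for EL) are all hypothetical, and the entity identifiers are placeholder strings.

```python
from typing import Dict, List, NamedTuple, Set, Tuple


class Mention(NamedTuple):
    """Hypothetical annotation record: character span plus a knowledge-base id."""
    start: int
    end: int
    entity_id: str  # placeholder id; the real dataset's id scheme is not assumed here


def prf(gold: Set, pred: Set) -> Tuple[float, float, float]:
    """Exact-match precision, recall, and F1 between two sets of items."""
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def score_ner(gold: List[Mention], pred: List[Mention]) -> Tuple[float, float, float]:
    """NER: only the span boundaries are compared; entity ids are ignored."""
    return prf({(m.start, m.end) for m in gold}, {(m.start, m.end) for m in pred})


def score_end2end(gold: List[Mention], pred: List[Mention]) -> Tuple[float, float, float]:
    """End2End: a prediction counts only if both span and entity id match."""
    return prf(set(gold), set(pred))


def score_el(gold: List[Mention], predicted_ids: Dict[Tuple[int, int], str]) -> float:
    """EL with true spans: gold spans are given, so only the linked id is judged."""
    if not gold:
        return 0.0
    correct = sum(
        1 for m in gold if predicted_ids.get((m.start, m.end)) == m.entity_id
    )
    return correct / len(gold)


# Toy example with made-up spans and ids.
gold = [Mention(0, 5, "E1"), Mention(10, 16, "E2")]
pred = [Mention(0, 5, "E1"), Mention(10, 16, "E3")]  # second link is wrong

ner_scores = score_ner(gold, pred)
end2end_scores = score_end2end(gold, pred)
el_accuracy = score_el(gold, {(m.start, m.end): m.entity_id for m in pred})
```

In this toy case both spans are found, so the NER score is perfect, while the one wrong link halves the End2End and EL scores. This shows why the three tasks are reported separately.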

