CL · LG · SI · May 15, 2020

COVID-Twitter-BERT: A Natural Language Processing Model to Analyse COVID-19 Content on Twitter

Martin Müller, Marcel Salathé, Per E Kummervold

arXiv:2005.07503v1 · 12.0 · 409 citations · Has Code

Originality: Synthesis-oriented

AI Analysis

This work provides a domain-specific tool for researchers and practitioners analyzing COVID-19 social media data. It is, however, incremental: it adapts an existing method (BERT pretraining) to new data.

The authors address the analysis of COVID-19 content on Twitter by releasing COVID-Twitter-BERT (CT-BERT), a transformer model pretrained on COVID-19 Twitter messages. It achieves a 10-30% marginal improvement over its base model, BERT-Large, on five classification datasets.

In this work, we release COVID-Twitter-BERT (CT-BERT), a transformer-based model, pretrained on a large corpus of Twitter messages on the topic of COVID-19. Our model shows a 10-30% marginal improvement compared to its base model, BERT-Large, on five different classification datasets. The largest improvements are on the target domain. Pretrained transformer models, such as CT-BERT, are trained on a specific target domain and can be used for a wide variety of natural language processing tasks, including classification, question-answering and chatbots. CT-BERT is optimised to be used on COVID-19 content, in particular social media posts from Twitter.

