Want to Identify, Extract and Normalize Adverse Drug Reactions in Tweets? Use RoBERTa
This work addresses the problem of monitoring drug safety from social media data for healthcare applications, but it is incremental as it applies an existing method to a specific shared task.
The paper tackled identifying, extracting, and normalizing adverse drug reactions (ADRs) in tweets using RoBERTa, achieving F1-scores of 58% for classification (12% above average) and up to 70.1% for extraction (13.7% above average).
This paper presents our approach to Task 2 and Task 3 of the Social Media Mining for Health (SMM4H) 2020 shared tasks. Task 2 requires differentiating adverse drug reaction (ADR) tweets from non-ADR tweets, which we treat as binary classification. Task 3 involves extracting ADR mentions and then mapping them to MedDRA codes; we treat extraction of ADR mentions as sequence labeling and normalization of ADR mentions as multi-class classification. Our system is based on the pre-trained language model RoBERTa and achieves (a) an F1-score of 58% in Task 2, which is 12% above the average score; (b) a relaxed F1-score of 70.1% in ADR extraction for Task 3, which is 13.7% above the average score; and (c) a relaxed F1-score of 35% in ADR extraction + normalization for Task 3, which is 5.8% above the average score. Overall, our models achieve promising results in both tasks, with significant improvements over the average scores.
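To make the "relaxed F1-score" reported for ADR extraction concrete, the following is a minimal sketch of one common way to compute it, assuming that a predicted span counts as correct when it overlaps a gold span by at least one character. The function names, the `(start, end)` span representation, and the overlap criterion are illustrative assumptions, not the official SMM4H scorer, which may differ in details.

```python
from typing import List, Tuple

# Assumption: a span is a (start, end) pair of character offsets, end exclusive.
Span = Tuple[int, int]


def overlaps(a: Span, b: Span) -> bool:
    """Return True if the two spans share at least one character."""
    return a[0] < b[1] and b[0] < a[1]


def relaxed_f1(gold: List[Span], pred: List[Span]) -> float:
    """Relaxed F1 for one tweet: partial overlap counts as a match (assumed criterion)."""
    if not gold or not pred:
        return 0.0
    # Precision: fraction of predicted spans that overlap some gold span.
    matched_pred = sum(any(overlaps(p, g) for g in gold) for p in pred)
    # Recall: fraction of gold spans that are overlapped by some predicted span.
    matched_gold = sum(any(overlaps(g, p) for p in pred) for g in gold)
    precision = matched_pred / len(pred)
    recall = matched_gold / len(gold)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

Under this criterion, a prediction that covers only part of a gold ADR mention still counts as a match, which is why relaxed scores are typically higher than strict (exact-boundary) scores.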