CL · AI · LG · Sep 8, 2022

CLaCLab at SocialDisNER: Using Medical Gazetteers for Named-Entity Recognition of Disease Mentions in Spanish Tweets

arXiv:2209.03528v2 · 581 citations · h-index: 4
AI Analysis

This work addresses named-entity recognition of diseases in social media data. Its contribution is incremental: it applies existing methods to a specific domain and dataset.

The paper tackles the recognition of disease mentions in Spanish tweets and achieves a strict F1 score of 0.869, well above the competition mean of 0.675.
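"Strict" F1 in span-based NER usually means a predicted mention counts as correct only if its boundaries match a gold mention exactly. The sketch below illustrates that idea, assuming each mention is represented as a hypothetical `(tweet_id, start, end)` character-offset tuple. The official SocialDisNER evaluation script may differ in its details.

```python
from typing import Set, Tuple

# Hypothetical span representation: (tweet id, start offset, end offset).
Span = Tuple[str, int, int]


def strict_f1(gold: Set[Span], pred: Set[Span]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) where only exact span matches count."""
    tp = len(gold & pred)  # true positives: identical offsets in the same tweet
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    gold = {("t1", 0, 8), ("t1", 20, 35), ("t2", 5, 12)}
    # ("t1", 20, 30) overlaps a gold span but is not an exact match, so it is a false positive.
    pred = {("t1", 0, 8), ("t1", 20, 30), ("t2", 5, 12), ("t2", 40, 45)}
    print(strict_f1(gold, pred))
```

In this toy example, two of four predictions match exactly, giving precision 1/2, recall 2/3, and F1 of 4/7. A lenient (overlap-based) metric would also credit the partially overlapping span.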

This paper summarizes the CLaC submission for SMM4H 2022 Task 10 which concerns the recognition of diseases mentioned in Spanish tweets. Before classifying each token, we encode each token with a transformer encoder using features from Multilingual RoBERTa Large, UMLS gazetteer, and DISTEMIST gazetteer, among others. We obtain a strict F1 score of 0.869, with competition mean of 0.675, standard deviation of 0.245, and median of 0.761.
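The abstract does not say how the UMLS and DISTEMIST gazetteers are turned into token features. One common approach is to mark tokens covered by a dictionary entry with B/I/O tags, which can then be embedded and concatenated with the contextual (e.g. RoBERTa) embeddings. The following is a minimal sketch of that idea, assuming whitespace tokens, lowercased exact matching, and greedy longest-match. The function names and the `max_len` parameter are hypothetical and not taken from the paper.

```python
from typing import Iterable, List, Set, Tuple


def build_gazetteer(terms: Iterable[str]) -> Set[Tuple[str, ...]]:
    """Store each (hypothetical) gazetteer term as a lowercased token tuple."""
    return {tuple(t.lower().split()) for t in terms if t.strip()}


def gazetteer_bio_features(
    tokens: List[str], gazetteer: Set[Tuple[str, ...]], max_len: int = 6
) -> List[str]:
    """Tag tokens covered by a gazetteer entry with B/I, all others with O.

    Uses greedy longest-match from left to right; max_len caps the
    number of tokens considered for a single entry.
    """
    lowered = [t.lower() for t in tokens]
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        match_len = 0
        # Try the longest candidate window first, shrinking until a match is found.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            if tuple(lowered[i:i + n]) in gazetteer:
                match_len = n
                break
        if match_len:
            tags[i] = "B"
            for j in range(i + 1, i + match_len):
                tags[j] = "I"
            i += match_len  # skip past the matched entry
        else:
            i += 1
    return tags


if __name__ == "__main__":
    gaz = build_gazetteer(["diabetes", "diabetes tipo 2", "cáncer de mama"])
    tokens = "Mi madre tiene diabetes tipo 2 y cáncer de mama".split()
    print(list(zip(tokens, gazetteer_bio_features(tokens, gaz))))
```

Longest-match is chosen here so that "diabetes tipo 2" is tagged as one entry rather than just "diabetes". A real pipeline would also need to align these word-level tags with the subword tokens produced by the transformer tokenizer.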

Code implementations: 1 repo