CLIMATELI: Evaluating Entity Linking on Climate Change Data
This work provides a new benchmark for evaluating entity linking in climate change contexts. The contribution is incremental, as it applies existing methods to a new domain-specific dataset.
The authors tackled the problem of entity linking on climate change data by creating CLIMATELI, the first manually annotated dataset with 3,087 entity spans linked to Wikipedia, and found that existing entity linking models perform notably worse than humans at both the token and entity levels.
Climate Change (CC) is a pressing topic of global importance, attracting increasing attention across research fields, from social sciences to Natural Language Processing (NLP). CC is also discussed in various settings and communication platforms, from academic publications to social media forums. Understanding who and what is mentioned in such data is a critical first step toward gaining new insights into CC. We present CLIMATELI (CLIMATe Entity LInking), the first manually annotated CC dataset that links 3,087 entity spans to Wikipedia. Using CLIMATELI, we evaluate existing entity linking (EL) systems on the CC topic across various genres and propose automated filtering methods for CC entities. We find that the performance of EL models notably lags behind humans at both the token and entity levels. Whether non-nominal and/or non-CC entities are retained or excluded during testing has a particularly strong effect on model performance.
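To make the distinction between token-level and entity-level evaluation concrete, the following is a minimal sketch of how the two scores can differ on the same prediction. It is an illustration under stated assumptions, not the paper's evaluation code: the span format `(start, end, title)`, the exact-match criterion at the entity level, and the function names `entity_level` and `token_level` are all hypothetical, and the abstract does not specify how CLIMATELI computes its metrics.

```python
from typing import Dict, List, Set, Tuple

# Assumed (hypothetical) span format: (start_token, end_token_exclusive, wikipedia_title).
Span = Tuple[int, int, str]


def prf(tp: int, n_pred: int, n_gold: int) -> Dict[str, float]:
    """Precision, recall, and F1 from a true-positive count and the two totals."""
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


def entity_level(gold: List[Span], pred: List[Span]) -> Dict[str, float]:
    """A prediction counts only if its boundaries AND its linked title match exactly."""
    gold_set, pred_set = set(gold), set(pred)
    return prf(len(gold_set & pred_set), len(pred_set), len(gold_set))


def token_level(gold: List[Span], pred: List[Span]) -> Dict[str, float]:
    """Each token inside a span is scored separately, so partial overlaps earn credit."""

    def expand(spans: List[Span]) -> Set[Tuple[int, str]]:
        return {(i, title) for start, end, title in spans for i in range(start, end)}

    gold_tokens, pred_tokens = expand(gold), expand(pred)
    return prf(len(gold_tokens & pred_tokens), len(pred_tokens), len(gold_tokens))


# Toy example (invented data): the first predicted span covers one of two gold tokens.
gold = [(0, 2, "Climate_change"), (5, 6, "Greenhouse_gas")]
pred = [(0, 1, "Climate_change"), (5, 6, "Greenhouse_gas")]

print("entity-level:", entity_level(gold, pred))
print("token-level: ", token_level(gold, pred))
```

In this toy example the truncated first span is a full miss at the entity level (F1 of 0.5) but earns partial credit at the token level (F1 of about 0.8), which is why reporting both levels gives a fuller picture of a system's behavior.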