CL IR · Nov 20, 2021

Improving Tagging Consistency and Entity Coverage for Chemical Identification in Full-text Articles

Hyunjae Kim, Mujeen Sung, Wonjin Yoon, Sungjoon Park, Jaewoo Kang

arXiv:2111.10584v10.510 citations

Originality: Synthesis-oriented

AI Analysis

This work addresses chemical identification in full-text biomedical articles, with a focus on improving entity coverage. It is incremental in that it applies existing methods to a new dataset.

The paper tackles chemical identification in full-text articles. It improves tagging consistency and entity coverage with methods such as majority voting within an article for NER and a hybrid dictionary-neural approach for normalization. The system ranked 1st in NER in the official evaluation, significantly outperforming the baseline model and more than 80 submissions.

This paper is a technical report on our system submitted to the chemical identification task of the BioCreative VII Track 2 challenge. The main feature of this challenge is that the data consists of full-text articles, while current datasets usually consist of only titles and abstracts. To effectively address the problem, we aim to improve tagging consistency and entity coverage using various methods such as majority voting within the same articles for named entity recognition (NER) and a hybrid approach that combines a dictionary and a neural model for normalization. In the experiments on the NLM-Chem dataset, we show that our methods improve models' performance, particularly in terms of recall. Finally, in the official evaluation of the challenge, our system was ranked 1st in NER by significantly outperforming the baseline model and more than 80 submissions from 16 teams.
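The two ideas named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the case-insensitive grouping of mentions, the tie-breaking behavior, and the placeholder identifiers are all assumptions made for illustration. The first function relabels every mention of the same string in an article with its most frequent predicted label; the second tries a dictionary lookup and falls back to a model only when the lookup misses.

```python
from collections import Counter, defaultdict
from typing import Callable, Dict, List, Optional, Tuple

# A mention is (surface text, predicted label). Labels are hypothetical.
Mention = Tuple[str, str]


def majority_vote(mentions: List[Mention]) -> List[Mention]:
    """Give every mention of the same string within one article the
    label that was predicted most often for that string.
    Assumption: mentions are grouped case-insensitively; ties go to
    the label seen first."""
    votes: Dict[str, Counter] = defaultdict(Counter)
    for text, label in mentions:
        votes[text.lower()][label] += 1
    return [
        (text, votes[text.lower()].most_common(1)[0][0])
        for text, _ in mentions
    ]


def normalize(
    mention: str,
    dictionary: Dict[str, str],
    neural_fallback: Callable[[str], Optional[str]],
) -> Optional[str]:
    """Hybrid normalization sketch: exact dictionary match first,
    otherwise defer to a model (here any callable standing in for it)."""
    key = mention.lower()
    if key in dictionary:
        return dictionary[key]
    return neural_fallback(mention)


if __name__ == "__main__":
    article = [
        ("aspirin", "Chemical"),
        ("Aspirin", "O"),        # inconsistent prediction
        ("aspirin", "Chemical"),
        ("water", "O"),
    ]
    print(majority_vote(article))

    # Placeholder identifiers, not real database IDs.
    toy_dict = {"aspirin": "ID:0001"}
    print(normalize("Aspirin", toy_dict, lambda m: None))
    print(normalize("ibuprofen", toy_dict, lambda m: "ID:9999"))
```

Voting of this kind can recover mentions the tagger missed in some sentences but found in others, which is consistent with the recall gains the abstract reports.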

