Fine-Grained Named Entities for Corona News
This work provides a structured approach for extracting information from corona-related news texts, though the contribution appears incremental, since it applies existing NER methods to a new domain-specific dataset.
This study tackled the problem of analyzing unstructured corona news articles by creating a data annotation pipeline to generate training data with generic and domain-specific entities, then training named entity recognition models on this corpus and evaluating them on expert-annotated test sentences.
Information sources such as newspapers have produced unstructured text in various languages about the corona outbreak since December 2019. Analyzing these texts is time-consuming unless they are first represented in a structured format. An information extraction pipeline built on two essential tasks, named entity tagging and relation extraction, can be applied to achieve this. This study proposes a data annotation pipeline that generates training data from corona news articles, covering both generic and domain-specific entities. Named entity recognition models are trained on this annotated corpus and then evaluated on test sentences manually annotated by domain experts. The code base and demonstration are available at https://github.com/sefeoglu/coronanews-ner.git.
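To make the evaluation step concrete, the following is a minimal sketch of entity-level scoring of model predictions against expert annotations. It assumes BIO-tagged token sequences and exact-match span scoring; neither the tagging scheme nor the metric is stated in the text above, and the labels DISEASE and LOC and the helper names bio_to_spans and precision_recall_f1 are hypothetical, not taken from the linked repository.

```python
from typing import List, Set, Tuple

# (start_token, end_token_exclusive, label); an illustrative representation
Span = Tuple[int, int, str]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Convert one sentence's BIO tags into a set of labelled spans."""
    spans: Set[Span] = set()
    start, label = None, None
    # A trailing "O" sentinel flushes a span that ends at the last token.
    for i, tag in enumerate(tags + ["O"]):
        continues = tag.startswith("I-") and label == tag[2:]
        if continues:
            continue
        if label is not None:
            spans.add((start, i, label))
        if tag.startswith(("B-", "I-")):
            # A stray "I-" with no matching open span is treated as a new span.
            start, label = i, tag[2:]
        else:
            start, label = None, None
    return spans


def precision_recall_f1(gold: Set[Span], pred: Set[Span]) -> Tuple[float, float, float]:
    """Exact-match scores: a prediction counts only if boundaries and label agree."""
    true_positives = len(gold & pred)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical example: the model truncates the second entity.
gold_tags = ["B-DISEASE", "O", "B-LOC", "I-LOC"]
pred_tags = ["B-DISEASE", "O", "B-LOC", "O"]
scores = precision_recall_f1(bio_to_spans(gold_tags), bio_to_spans(pred_tags))
print(scores)  # (0.5, 0.5, 0.5)
```

Exact-match scoring is strict: the truncated LOC span above earns no partial credit, which is why fine-grained, multi-token entities tend to score lower under this metric than under token-level accuracy.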