CL | Feb 16, 2024

Fine Tuning Named Entity Extraction Models for the Fantasy Domain

arXiv:2402.10662v1 | 1.0 | h-index: 6 | MERCon

Originality: Synthesis-oriented

AI Analysis

This work addresses the challenge of extracting meaningful information from fantasy domain text for applications like gaming or lore analysis, but it is incremental as it applies an existing method to new data.

The paper tackles the problem of extracting domain-specific entities from fantasy text by fine-tuning the Trankit NER framework on Dungeons & Dragons monster lore, achieving an 87.86% F1 score for monster name identification, which surpasses zero-shot Trankit and two FLAIR models.

Named Entity Recognition (NER) is a sequence classification Natural Language Processing task in which entities are identified in text and classified into predefined categories. It acts as a foundation for most information extraction systems. Dungeons and Dragons (D&D) is an open-ended tabletop fantasy game with its own diverse lore. D&D entities are domain-specific and are thus unrecognizable even by state-of-the-art off-the-shelf NER systems, because those systems are trained on general data for predefined categories such as person (PERS), location (LOC), organization (ORG), and miscellaneous (MISC). For meaningful extraction of information from fantasy text, the entities need to be classified into domain-specific entity categories, and the models need to be fine-tuned on a domain-relevant corpus. This work uses available lore of monsters in the D&D domain to fine-tune Trankit, a prolific NER framework that uses a pre-trained model for NER. After this training, the system can extract monster names from relevant domain documents under a novel NER tag. This work compares the accuracy of monster name identification against the zero-shot Trankit model and two FLAIR models. The fine-tuned Trankit model achieves an 87.86% F1 score, surpassing all the other considered models.
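To make the reported metric concrete, here is a minimal sketch of how an entity-level F1 score can be computed for a single domain-specific tag. It assumes BIO-style token tags and exact-span matching. The abstract does not state the tagging scheme or the exact scoring procedure used in the paper, and the tag name `MONSTER`, the helper names, and the example sentence are all hypothetical.

```python
from typing import List, Set, Tuple

# (start index, end index exclusive, label); hypothetical helper type
Span = Tuple[int, int, str]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from a BIO tag sequence.

    A stray I- tag with no preceding B- is leniently treated as a span start.
    """
    spans: Set[Span] = set()
    start, label = None, None
    # The trailing "O" is a sentinel that closes a span ending at the last token.
    for i, tag in enumerate(tags + ["O"]):
        continues = tag.startswith("I-") and label == tag[2:]
        if continues:
            continue
        if label is not None:
            spans.add((start, i, label))
        if tag.startswith("B-") or tag.startswith("I-"):
            start, label = i, tag[2:]
        else:
            start, label = None, None
    return spans


def entity_f1(gold: List[str], pred: List[str]) -> float:
    """Exact-match entity-level F1 between gold and predicted tag sequences."""
    gold_spans, pred_spans = bio_to_spans(gold), bio_to_spans(pred)
    true_pos = len(gold_spans & pred_spans)
    precision = true_pos / len(pred_spans) if pred_spans else 0.0
    recall = true_pos / len(gold_spans) if gold_spans else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical sentence: "The beholder and the gelatinous cube attacked ."
gold = ["O", "B-MONSTER", "O", "O", "B-MONSTER", "I-MONSTER", "O", "O"]
# The prediction finds "beholder" but truncates "gelatinous cube".
pred = ["O", "B-MONSTER", "O", "O", "B-MONSTER", "O", "O", "O"]
score = entity_f1(gold, pred)
```

Under exact matching, a partially recovered multi-word name such as "gelatinous cube" counts as both a missed entity and a spurious one, so boundary errors lower precision and recall together.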
