Software Mention Recognition with a Three-Stage Framework Based on BERTology Models at SOMD 2024
This work addresses the detection of software mentions in academic texts. The contribution is incremental: it builds on existing NER methods and extends them with a multi-stage approach.
The paper tackles software mention recognition in scholarly publications by proposing a three-stage framework built on BERTology models, achieving a weighted F1-score of 67.80% and ranking 3rd in the shared task.
This paper describes our systems for Sub-task I of the Software Mention Detection in Scholarly Publications shared task. We propose three approaches that leverage different pre-trained language models (BERT, SciBERT, and XLM-R) to tackle this challenge. Our best-performing system addresses the named entity recognition (NER) problem through a three-stage framework: (1) Entity Sentence Classification, which identifies sentences that contain potential software mentions; (2) Entity Extraction, which detects mentions within the sentences selected in the first stage; and (3) Entity Type Classification, which assigns each detected mention to a specific software type. Experiments on the official dataset demonstrate that our three-stage framework achieves competitive performance, surpassing both other participating teams and our alternative approaches. Our XLM-R-based framework achieves a weighted F1-score of 67.80%, placing our team 3rd in Sub-task I for the Software Mention Recognition task.
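The control flow of the three stages can be illustrated with a minimal Python sketch. It shows only how the stages might be chained, not the authors' implementation: the function `three_stage_ner`, the `Mention` record, the toy lexicon, and the type labels are all hypothetical, and the keyword-based stand-ins take the place of the fine-tuned BERT, SciBERT, or XLM-R models that the paper uses at each stage.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Span = Tuple[int, int]  # (start, end) character offsets, end exclusive


@dataclass
class Mention:
    text: str
    start: int
    end: int
    label: str


def three_stage_ner(
    sentences: List[str],
    has_entity: Callable[[str], bool],
    extract_spans: Callable[[str], List[Span]],
    classify_type: Callable[[str, Span], str],
) -> List[List[Mention]]:
    """Chain the three stages; returns one list of mentions per sentence."""
    results: List[List[Mention]] = []
    for sentence in sentences:
        mentions: List[Mention] = []
        # Stage 1: skip sentences judged to contain no software mention.
        if has_entity(sentence):
            # Stage 2: locate mention spans in the retained sentence.
            for start, end in extract_spans(sentence):
                # Stage 3: assign a software type to each detected span.
                label = classify_type(sentence, (start, end))
                mentions.append(Mention(sentence[start:end], start, end, label))
        results.append(mentions)
    return results


# Toy stand-ins (assumptions for illustration only). In the paper, each stage
# is a fine-tuned pre-trained language model, not a lexicon lookup, and these
# labels are placeholders rather than the shared task's label set.
TOY_LEXICON = {"SPSS": "application", "Python": "programming_language"}


def toy_has_entity(sentence: str) -> bool:
    return any(name in sentence for name in TOY_LEXICON)


def toy_extract_spans(sentence: str) -> List[Span]:
    spans = []
    for name in TOY_LEXICON:
        idx = sentence.find(name)  # first occurrence only, for brevity
        if idx != -1:
            spans.append((idx, idx + len(name)))
    return sorted(spans)


def toy_classify_type(sentence: str, span: Span) -> str:
    return TOY_LEXICON[sentence[span[0]:span[1]]]


sentences = [
    "We analysed the data with SPSS and Python.",
    "The cohort included 120 patients.",
]
results = three_stage_ner(
    sentences, toy_has_entity, toy_extract_spans, toy_classify_type
)
for sentence, mentions in zip(sentences, results):
    print(sentence, [(m.text, m.label) for m in mentions])
```

Passing the stages in as callables keeps the pipeline independent of any particular model, so each stage could in principle be swapped for a different encoder. One consequence of this design is that a sentence rejected in the first stage can never yield a mention later, so errors in stage 1 propagate to the final output.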