DL · IR · Jun 1, 2021

Harvesting the Public MeSH Note field

arXiv:2106.00302v1 · 1 citation
Originality: Synthesis-oriented
AI Analysis

This work addresses a specific data extraction problem for biomedical researchers or librarians working with the MeSH thesaurus, but it is incremental as it applies existing methods to a particular dataset.

The researchers analyzed the Public MeSH Note field from 2006 to 2020 to extract information about new descriptors' previous status as Supplementary Concept Records, using a semi-automated approach based on regular expressions that minimized manual effort in most cases.

In this document, we report an analysis of the Public MeSH Note field of the new descriptors introduced in the MeSH thesaurus between 2006 and 2020. The aim of this analysis was to extract information about the previous status of these new descriptors as Supplementary Concept Records. The Public MeSH Note field holds semi-structured text that is meant to be read by humans. We therefore adopted a semi-automated approach, based on regular expressions, to extract information from it. In the large majority of cases, this approach kept the manual effort needed to extract a new descriptor's previous status as a Supplementary Concept Record to a minimum. The source code for this analysis is openly available on GitHub.
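
As an illustration of the semi-automated idea described above, the following minimal Python sketch applies one regular expression to note strings and routes anything it cannot parse to a manual-review list. It is not the authors' code: the note wording ("... was indexed under ..."), the pattern `NOTE_PATTERN`, the `PreviousStatus` fields, and the functions `parse_note` and `triage` are all assumptions made for this example, and real Public MeSH Note values may follow different templates.

```python
import re
from typing import Dict, List, NamedTuple, Optional, Tuple


class PreviousStatus(NamedTuple):
    """Fields pulled out of one note (hypothetical schema)."""
    introduced: int      # year the descriptor was introduced
    previous_name: str   # name used while it was a supplementary record
    indexed_under: str   # descriptor(s) it was indexed under
    first_year: int
    last_year: int


# ASSUMPTION: notes look like
#   "<year>; <NAME> was indexed under <PARENT> <yyyy>-<yyyy>"
# This template is illustrative only, not taken from the paper.
NOTE_PATTERN = re.compile(
    r"^(?P<introduced>\d{4});\s*"
    r"(?P<name>.+?)\s+was indexed under\s+"
    r"(?P<parent>.+?)\s+"
    r"(?P<first>\d{4})-(?P<last>\d{4})\s*$",
    re.IGNORECASE,
)


def parse_note(note: str) -> Optional[PreviousStatus]:
    """Return the extracted fields, or None if the note does not match."""
    match = NOTE_PATTERN.match(note.strip())
    if match is None:
        return None
    return PreviousStatus(
        introduced=int(match.group("introduced")),
        previous_name=match.group("name"),
        indexed_under=match.group("parent"),
        first_year=int(match.group("first")),
        last_year=int(match.group("last")),
    )


def triage(
    notes: Dict[str, str],
) -> Tuple[Dict[str, PreviousStatus], List[str]]:
    """Split notes into automatically parsed ones and ones left for manual review."""
    parsed: Dict[str, PreviousStatus] = {}
    manual: List[str] = []
    for descriptor_id, note in notes.items():
        result = parse_note(note)
        if result is None:
            manual.append(descriptor_id)
        else:
            parsed[descriptor_id] = result
    return parsed, manual


if __name__ == "__main__":
    # Made-up identifiers and notes, for demonstration only.
    sample = {
        "X000001": "2016; EXAMPLE COMPOUND was indexed under EXAMPLE PARENT 2010-2015",
        "X000002": "2016",
    }
    parsed, manual = triage(sample)
    print(parsed)
    print(manual)
```

The design point is the fallback: the pattern handles the notes that follow the expected template, and every other note is listed explicitly so that the manual effort is confined to those cases.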

