CL, May 27, 2023

Complementary and Integrative Health Lexicon (CIHLex) and Entity Recognition in the Literature

Huixue Zhou, Robin Austin, Sheng-Chieh Lu, Greg Silverman, Yuqi Zhou, Halil Kilicoglu, Hua Xu, Rui Zhang

arXiv:2305.17353v2

0.51 citations

Originality: Synthesis-oriented

AI Analysis

This work addresses the underrepresentation of complementary and integrative health (CIH) terminology in standard biomedical terminologies, a gap that affects both researchers and practitioners. Its contribution is incremental, since it builds on existing NLP methods and resources.

The study constructed a Complementary and Integrative Health Lexicon (CIHLex) with 198 concepts and 1090 terms. In CIH named entity recognition, BLUEBERT achieved the highest macro average F1-score (0.90), outperforming the other models evaluated, including MetaMap, CLAMP, and GPT-3.5 Turbo.

Objective: Our study aimed to construct an exhaustive Complementary and Integrative Health (CIH) Lexicon (CIHLex) to better represent physical and psychological CIH approaches, which are often underrepresented in standard terminologies. We also intended to apply advanced Natural Language Processing (NLP) models such as Bidirectional Encoder Representations from Transformers (BERT) and GPT-3.5 Turbo to CIH named entity recognition, evaluating their performance against established models like MetaMap and CLAMP.

Materials and Methods: We constructed the CIHLex by compiling and integrating data from biomedical literature and relevant knowledge bases. The Lexicon encompasses 198 unique concepts with 1090 corresponding unique terms. We matched these concepts to the Unified Medical Language System (UMLS). Additionally, we developed BERT models and compared their performance in CIH named entity recognition to that of other models such as MetaMap, CLAMP, and GPT-3.5 Turbo.

Results: Of the 198 unique concepts in CIHLex, 62.1% could be matched to at least one term in the UMLS. Moreover, 75.7% of the mapped UMLS Concept Unique Identifiers (CUIs) were categorized as "Therapeutic or Preventive Procedure." Among the models applied to CIH named entity recognition, BLUEBERT delivered the highest macro average F1-score of 0.90, surpassing other models.

Conclusion: Our CIHLex significantly augments the representation of CIH approaches in biomedical literature. The results also demonstrate the utility of advanced NLP models, with BERT notably excelling in CIH entity recognition. These results highlight promising strategies for enhancing standardization and recognition of CIH terminology in biomedical contexts.
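For readers unfamiliar with the metric, a macro average F1-score is the unweighted mean of the per-label F1-scores, so rare labels count as much as frequent ones. The sketch below is a minimal illustration only: it scores at the token level, and the function name `macro_f1` and the `B`/`I`/`O` tags are assumptions made for the example. The abstract does not state the tagging scheme or whether the evaluation was token-level or entity-level.

```python
from collections import Counter


def macro_f1(gold, pred):
    """Unweighted mean of per-label F1 over labels seen in gold or pred.

    Illustrative token-level scoring; not the paper's evaluation code.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted label p where it was wrong
            fn[g] += 1  # missed the true label g

    scores = []
    for label in sorted(set(gold) | set(pred)):
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)

    return sum(scores) / len(scores)


# Hypothetical tags: B = first token of a CIH mention, I = inside, O = outside.
gold = ["O", "B", "I", "O"]
pred = ["O", "B", "O", "O"]
score = macro_f1(gold, pred)
```

In this toy case the missed `I` token lowers the macro average sharply even though three of four tokens are correct, which is why the macro average is a stricter summary than plain accuracy when some labels are rare.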
