CYJun 15, 2023
Opportunities and Challenges for ChatGPT and Large Language Models in Biomedicine and HealthShubo Tian, Qiao Jin, Lana Yeganova et al. · tsinghua
ChatGPT has drawn considerable attention from both the general public and domain experts with its remarkable text generation capabilities. This has subsequently led to the emergence of diverse applications in the field of biomedicine and health. In this work, we examine the diverse applications of large language models (LLMs), such as ChatGPT, in biomedicine and health. Specifically we explore the areas of biomedical information retrieval, question answering, medical text summarization, information extraction, and medical education, and investigate whether LLMs possess the transformative power to revolutionize these tasks or whether the distinct complexities of biomedical domain presents unique challenges. Following an extensive literature survey, we find that significant advances have been made in the field of text generation tasks, surpassing the previous state-of-the-art methods. For other applications, the advances have been modest. Overall, LLMs have not yet revolutionized biomedicine, but recent rapid progress indicates that such methods hold great potential to provide valuable means for accelerating discovery and improving health. We also find that the use of LLMs, like ChatGPT, in the fields of biomedicine and health entails various risks and challenges, including fabricated information in its generated responses, as well as legal and privacy concerns associated with sensitive patient data. We believe this survey can provide a comprehensive and timely overview to biomedical researchers and healthcare practitioners on the opportunities and challenges associated with using ChatGPT and other LLMs for transforming biomedicine and health.
CLMay 8, 2022Code
Assigning Species Information to Corresponding Genes by a Sequence Labeling FrameworkLing Luo, Chih-Hsuan Wei, Po-Ting Lai et al.
The automatic assignment of species information to the corresponding genes in a research article is a critically important step in the gene normalization task, whereby a gene mention is normalized and linked to a database record or identifier by a text-mining algorithm. Existing methods typically rely on heuristic rules based on gene and species co-occurrence in the article, but their accuracy is suboptimal. We therefore developed a high-performance method, using a novel deep learning-based framework, to classify whether there is a relation between a gene and a species. Instead of the traditional binary classification framework in which all possible pairs of genes and species in the same article are evaluated, we treat the problem as a sequence-labeling task such that only a fraction of the pairs needs to be considered. Our benchmarking results show that our approach obtains significantly higher performance compared to that of the rule-based baseline method for the species assignment task (from 65.8% to 81.3% in accuracy). The source code and data for species assignment are freely available at https://github.com/ncbi/SpeciesAssignment.
CLJun 19, 2023Code
BioREx: Improving Biomedical Relation Extraction by Leveraging Heterogeneous DatasetsPo-Ting Lai, Chih-Hsuan Wei, Ling Luo et al.
Biomedical relation extraction (RE) is the task of automatically identifying and characterizing relations between biomedical concepts from free text. RE is a central task in biomedical natural language processing (NLP) research and plays a critical role in many downstream applications, such as literature-based discovery and knowledge graph construction. State-of-the-art methods were used primarily to train machine learning models on individual RE datasets, such as protein-protein interaction and chemical-induced disease relation. Manual dataset annotation, however, is highly expensive and time-consuming, as it requires domain knowledge. Existing RE datasets are usually domain-specific or small, which limits the development of generalized and high-performing RE models. In this work, we present a novel framework for systematically addressing the data heterogeneity of individual datasets and combining them into a large dataset. Based on the framework and dataset, we report on BioREx, a data-centric approach for extracting relations. Our evaluation shows that BioREx achieves significantly higher performance than the benchmark system trained on the individual dataset, setting a new SOTA from 74.4% to 79.6% in F-1 measure on the recently released BioRED corpus. We further demonstrate that the combined dataset can improve performance for five different RE tasks. In addition, we show that on average BioREx compares favorably to current best-performing methods such as transfer learning and multi-task learning. Finally, we demonstrate BioREx's robustness and generalizability in two independent RE tasks not previously seen in training data: drug-drug N-ary combination and document-level gene-disease RE. The integrated dataset and optimized method have been packaged as a stand-alone tool available at https://github.com/ncbi/BioREx.
CLApr 8, 2022
BioRED: A Rich Biomedical Relation Extraction DatasetLing Luo, Po-Ting Lai, Chih-Hsuan Wei et al.
Automated relation extraction (RE) from biomedical literature is critical for many downstream text mining applications in both research and real-world settings. However, most existing benchmarking datasets for bio-medical RE only focus on relations of a single type (e.g., protein-protein interactions) at the sentence level, greatly limiting the development of RE systems in biomedicine. In this work, we first review commonly used named entity recognition (NER) and RE datasets. Then we present BioRED, a first-of-its-kind biomedical RE corpus with multiple entity types (e.g., gene/protein, disease, chemical) and relation pairs (e.g., gene-disease; chemical-chemical) at the document level, on a set of 600 PubMed abstracts. Further, we label each relation as describing either a novel finding or previously known background knowledge, enabling automated algorithms to differentiate between novel and background information. We assess the utility of BioRED by benchmarking several existing state-of-the-art methods, including BERT-based models, on the NER and RE tasks. Our results show that while existing approaches can reach high performance on the NER task (F-score of 89.3%), there is much room for improvement for the RE task, especially when extracting novel relations (F-score of 47.7%). Our experiments also demonstrate that such a rich dataset can successfully facilitate the development of more accurate, efficient, and robust RE systems for biomedicine. The BioRED dataset and annotation guideline are freely available at https://ftp.ncbi.nlm.nih.gov/pub/lu/BioRED/.
CLNov 30, 2022
AIONER: All-in-one scheme-based biomedical named entity recognition using deep learningLing Luo, Chih-Hsuan Wei, Po-Ting Lai et al.
Biomedical named entity recognition (BioNER) seeks to automatically recognize biomedical entities in natural language text, serving as a necessary foundation for downstream text mining tasks and applications such as information extraction and question answering. Manually labeling training data for the BioNER task is costly, however, due to the significant domain expertise required for accurate annotation. The resulting data scarcity causes current BioNER approaches to be prone to overfitting, to suffer from limited generalizability, and to address a single entity type at a time (e.g., gene or disease). We therefore propose a novel all-in-one (AIO) scheme that uses external data from existing annotated resources to enhance the accuracy and stability of BioNER models. We further present AIONER, a general-purpose BioNER tool based on cutting-edge deep learning and our AIO schema. We evaluate AIONER on 14 BioNER benchmark tasks and show that AIONER is effective, robust, and compares favorably to other state-of-the-art approaches such as multi-task learning. We further demonstrate the practical utility of AIONER in three independent tasks to recognize entity types not previously seen in training data, as well as the advantages of AIONER over existing methods for processing biomedical text at a large scale (e.g., the entire PubMed data).
CLJan 23, 2025Code
Enhancing Biomedical Relation Extraction with DirectionalityPo-Ting Lai, Chih-Hsuan Wei, Shubo Tian et al.
Biological relation networks contain rich information for understanding the biological mechanisms behind the relationship of entities such as genes, proteins, diseases, and chemicals. The vast growth of biomedical literature poses significant challenges updating the network knowledge. The recent Biomedical Relation Extraction Dataset (BioRED) provides valuable manual annotations, facilitating the develop-ment of machine-learning and pre-trained language model approaches for automatically identifying novel document-level (inter-sentence context) relationships. Nonetheless, its annotations lack directionality (subject/object) for the entity roles, essential for studying complex biological networks. Herein we annotate the entity roles of the relationships in the BioRED corpus and subsequently propose a novel multi-task language model with soft-prompt learning to jointly identify the relationship, novel findings, and entity roles. Our results in-clude an enriched BioRED corpus with 10,864 directionality annotations. Moreover, our proposed method outperforms existing large language models such as the state-of-the-art GPT-4 and Llama-3 on two benchmarking tasks. Our source code and dataset are available at https://github.com/ncbi-nlp/BioREDirect.
CLMay 12
MedHopQA: A Disease-Centered Multi-Hop Reasoning Benchmark and Evaluation Framework for LLM-Based Biomedical Question AnsweringRezarta Islamaj, Robert Leaman, Joey Chan et al.
Evaluating large language models (LLMs) in the biomedical domain requires benchmarks that can distinguish reasoning from pattern matching and remain discriminative as model capabilities improve. Existing biomedical question answering (QA) benchmarks are limited in this respect. Multiple-choice formats can allow models to succeed through answer elimination rather than inference, while widely circulated exam-style datasets are increasingly vulnerable to performance saturation and training data contamination. Multi-hop reasoning, defined as the ability to integrate information across multiple sources to derive an answer, is central to clinically meaningful tasks such as diagnostic support, literature-based discovery, and hypothesis generation, yet remains underrepresented in current biomedical QA benchmarks. MedHopQA is a disease-centered multi-hop reasoning benchmark consisting of 1,000 expert-curated question-answer pairs introduced as a shared task at BioCreative IX. Each question requires synthesis of information across two distinct Wikipedia articles, and answers are provided in an open-ended free-text format. Gold annotations are augmented with ontology-grounded synonym sets from MONDO, NCBI Gene, and NCBI Taxonomy to support both lexical and concept-level evaluation. MedHopQA was constructed through a structured process combining human annotation, triage, iterative verification, and LLM-as-a-judge validation. To reduce leaderboard gaming and contamination risk, the 1,000 scored questions are embedded within a publicly downloadable set of 10,000 questions, with answers withheld, on a CodaBench leaderboard. MedHopQA provides both a benchmark and a reusable framework for constructing future biomedical QA datasets that prioritize compositional reasoning, saturation resistance, and contamination resistance as core design constraints.
CLMay 10, 2023Code
Benchmarking large language models for biomedical natural language processing applications and recommendationsQingyu Chen, Yan Hu, Xueqing Peng et al.
The rapid growth of biomedical literature poses challenges for manual knowledge curation and synthesis. Biomedical Natural Language Processing (BioNLP) automates the process. While Large Language Models (LLMs) have shown promise in general domains, their effectiveness in BioNLP tasks remains unclear due to limited benchmarks and practical guidelines. We perform a systematic evaluation of four LLMs, GPT and LLaMA representatives on 12 BioNLP benchmarks across six applications. We compare their zero-shot, few-shot, and fine-tuning performance with traditional fine-tuning of BERT or BART models. We examine inconsistencies, missing information, hallucinations, and perform cost analysis. Here we show that traditional fine-tuning outperforms zero or few shot LLMs in most tasks. However, closed-source LLMs like GPT-4 excel in reasoning-related tasks such as medical question answering. Open source LLMs still require fine-tuning to close performance gaps. We find issues like missing information and hallucinations in LLM outputs. These results offer practical insights for applying LLMs in BioNLP.
IRJan 29, 2019Code
Revised JNLPBA Corpus: A Revised Version of Biomedical NER Corpus for Relation Extraction TaskMing-Siang Huang, Po-Ting Lai, Richard Tzong-Han Tsai et al.
The advancement of biomedical named entity recognition (BNER) and biomedical relation extraction (BRE) researches promotes the development of text mining in biological domains. As a cornerstone of BRE, robust BNER system is required to identify the mentioned NEs in plain texts for further relation extraction stage. However, the current BNER corpora, which play important roles in these tasks, paid less attention to achieve the criteria for BRE task. In this study, we present Revised JNLPBA corpus, the revision of JNLPBA corpus, to broaden the applicability of a NER corpus from BNER to BRE task. We preserve the original entity types including protein, DNA, RNA, cell line and cell type while all the abstracts in JNLPBA corpus are manually curated by domain experts again basis on the new annotation guideline focusing on the specific NEs instead of general terms. Simultaneously, several imperfection issues in JNLPBA are pointed out and made up in the new corpus. To compare the adaptability of different NER systems in Revised JNLPBA and JNLPBA corpora, the F1-measure was measured in three open sources NER systems including BANNER, Gimli and NERSuite. In the same circumstance, all the systems perform average 10% better in Revised JNLPBA than in JNLPBA. Moreover, the cross-validation test is carried out which we train the NER systems on JNLPBA/Revised JNLPBA corpora and access the performance in both protein-protein interaction extraction (PPIE) and biomedical event extraction (BEE) corpora to confirm that the newly refined Revised JNLPBA is a competent NER corpus in biomedical relation application. The revised JNLPBA corpus is freely available at iasl-btm.iis.sinica.edu.tw/BNER/Content/Revised_JNLPBA.zip.
AIOct 24, 2024
Demystifying Large Language Models for Medicine: A PrimerQiao Jin, Nicholas Wan, Robert Leaman et al.
Large language models (LLMs) represent a transformative class of AI tools capable of revolutionizing various aspects of healthcare by generating human-like responses across diverse contexts and adapting to novel tasks following human instructions. Their potential application spans a broad range of medical tasks, such as clinical documentation, matching patients to clinical trials, and answering medical questions. In this primer paper, we propose an actionable guideline to help healthcare professionals more efficiently utilize LLMs in their work, along with a set of best practices. This approach consists of several main phases, including formulating the task, choosing LLMs, prompt engineering, fine-tuning, and deployment. We start with the discussion of critical considerations in identifying healthcare tasks that align with the core capabilities of LLMs and selecting models based on the selected task and data, performance requirements, and model interface. We then review the strategies, such as prompt engineering and fine-tuning, to adapt standard LLMs to specialized medical tasks. Deployment considerations, including regulatory compliance, ethical guidelines, and continuous monitoring for fairness and bias, are also discussed. By providing a structured step-by-step methodology, this tutorial aims to equip healthcare professionals with the tools necessary to effectively integrate LLMs into clinical practice, ensuring that these powerful technologies are applied in a safe, reliable, and impactful manner.
CLApr 22, 2024
EnzChemRED, a rich enzyme chemistry relation extraction datasetPo-Ting Lai, Elisabeth Coudert, Lucila Aimo et al.
Expert curation is essential to capture knowledge of enzyme functions from the scientific literature in FAIR open knowledgebases but cannot keep pace with the rate of new discoveries and new publications. In this work we present EnzChemRED, for Enzyme Chemistry Relation Extraction Dataset, a new training and benchmarking dataset to support the development of Natural Language Processing (NLP) methods such as (large) language models that can assist enzyme curation. EnzChemRED consists of 1,210 expert curated PubMed abstracts in which enzymes and the chemical reactions they catalyze are annotated using identifiers from the UniProt Knowledgebase (UniProtKB) and the ontology of Chemical Entities of Biological Interest (ChEBI). We show that fine-tuning pre-trained language models with EnzChemRED can significantly boost their ability to identify mentions of proteins and chemicals in text (Named Entity Recognition, or NER) and to extract the chemical conversions in which they participate (Relation Extraction, or RE), with average F1 score of 86.30% for NER, 86.66% for RE for chemical conversion pairs, and 83.79% for RE for chemical conversion pairs and linked enzymes. We combine the best performing methods after fine-tuning using EnzChemRED to create an end-to-end pipeline for knowledge extraction from text and apply this to abstracts at PubMed scale to create a draft map of enzyme functions in literature to guide curation efforts in UniProtKB and the reaction knowledgebase Rhea. The EnzChemRED corpus is freely available at https://ftp.expasy.org/databases/rhea/nlp/.
CLJan 19, 2024
PubTator 3.0: an AI-powered Literature Resource for Unlocking Biomedical KnowledgeChih-Hsuan Wei, Alexis Allot, Po-Ting Lai et al.
PubTator 3.0 (https://www.ncbi.nlm.nih.gov/research/pubtator3/) is a biomedical literature resource using state-of-the-art AI techniques to offer semantic and relation searches for key concepts like proteins, genetic variants, diseases, and chemicals. It currently provides over one billion entity and relation annotations across approximately 36 million PubMed abstracts and 6 million full-text articles from the PMC open access subset, updated weekly. PubTator 3.0's online interface and API utilize these precomputed entity relations and synonyms to provide advanced search capabilities and enable large-scale analyses, streamlining many complex information needs. We showcase the retrieval quality of PubTator 3.0 using a series of entity pair queries, demonstrating that PubTator 3.0 retrieves a greater number of articles than either PubMed or Google Scholar, with higher precision in the top 20 results. We further show that integrating ChatGPT (GPT-4) with PubTator APIs dramatically improves the factuality and verifiability of its responses. In summary, PubTator 3.0 offers a comprehensive set of features and tools that allow researchers to navigate the ever-expanding wealth of biomedical literature, expediting research and unlocking valuable insights for scientific discovery.
CLJan 11, 2021
BERT-GT: Cross-sentence n-ary relation extraction with BERT and Graph TransformerPo-Ting Lai, Zhiyong Lu
A biomedical relation statement is commonly expressed in multiple sentences and consists of many concepts, including gene, disease, chemical, and mutation. To automatically extract information from biomedical literature, existing biomedical text-mining approaches typically formulate the problem as a cross-sentence n-ary relation-extraction task that detects relations among n entities across multiple sentences, and use either a graph neural network (GNN) with long short-term memory (LSTM) or an attention mechanism. Recently, Transformer has been shown to outperform LSTM on many natural language processing (NLP) tasks. In this work, we propose a novel architecture that combines Bidirectional Encoder Representations from Transformers with Graph Transformer (BERT-GT), through integrating a neighbor-attention mechanism into the BERT architecture. Unlike the original Transformer architecture, which utilizes the whole sentence(s) to calculate the attention of the current token, the neighbor-attention mechanism in our method calculates its attention utilizing only its neighbor tokens. Thus, each token can pay attention to its neighbor information with little noise. We show that this is critically important when the text is very long, as in cross-sentence or abstract-level relation-extraction tasks. Our benchmarking results show improvements of 5.44% and 3.89% in accuracy and F1-measure over the state-of-the-art on n-ary and chemical-protein relation datasets, suggesting BERT-GT is a robust approach that is applicable to other biomedical relation extraction tasks or datasets.
CLSep 17, 2020
PhenoTagger: A Hybrid Method for Phenotype Concept Recognition using Human Phenotype OntologyLing Luo, Shankai Yan, Po-Ting Lai et al.
Automatic phenotype concept recognition from unstructured text remains a challenging task in biomedical text mining research. Previous works that address the task typically use dictionary-based matching methods, which can achieve high precision but suffer from lower recall. Recently, machine learning-based methods have been proposed to identify biomedical concepts, which can recognize more unseen concept synonyms by automatic feature learning. However, most methods require large corpora of manually annotated data for model training, which is difficult to obtain due to the high cost of human annotation. In this paper, we propose PhenoTagger, a hybrid method that combines both dictionary and machine learning-based methods to recognize Human Phenotype Ontology (HPO) concepts in unstructured biomedical text. We first use all concepts and synonyms in HPO to construct a dictionary, which is then used to automatically build a distantly supervised training dataset for machine learning. Next, a cutting-edge deep learning model is trained to classify each candidate phrase (n-gram from input sentence) into a corresponding concept label. Finally, the dictionary and machine learning-based prediction results are combined for improved performance. Our method is validated with two HPO corpora, and the results show that PhenoTagger compares favorably to previous methods. In addition, to demonstrate the generalizability of our method, we retrained PhenoTagger using the disease ontology MEDIC for disease concept recognition to investigate the effect of training on different ontologies. Experimental results on the NCBI disease corpus show that PhenoTagger without requiring manually annotated training data achieves competitive performance as compared with state-of-the-art supervised methods.