Benchmarking Large Language Models in Biomedical Triple Extraction
This work supports researchers and practitioners in developing robust biomedical triple extraction systems by providing a new dataset and benchmarking LLMs; the contribution is incremental, as it builds on existing methods.
The paper addresses two gaps, the lack of high-quality datasets and the limited exploration of large language models (LLMs) for biomedical triple extraction. It introduces GIT, an expert-annotated dataset covering a wider range of relation types, and compares various LLMs, with specific models such as GPT-4 achieving an F1-score of 0.85.
Biomedical triple extraction systems aim to automatically extract biomedical entities and the relations between them. The application of large language models (LLMs) to triple extraction remains relatively unexplored. Furthermore, the absence of a high-quality biomedical triple extraction dataset impedes progress in developing robust triple extraction systems. In this work, we focus mainly on sentence-level biomedical triple extraction. To address these challenges, we first compare the performance of various large language models. Additionally, we present GIT, an expert-annotated biomedical triple extraction dataset that covers a wider range of relation types.
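To make the F1-score on triples concrete, the sketch below computes strict exact-match precision, recall and F1 over (head, relation, tail) triples. This is an assumption for illustration only: the text above does not state how matches are counted, and the case-folding in `normalize`, the `Triple` type, the `triple_f1` helper and the example relation names ("treats", "inhibits", "causes") are all hypothetical and not taken from GIT's relation schema or evaluation script.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """A (head entity, relation, tail entity) triple."""
    head: str
    relation: str
    tail: str


def normalize(t: Triple) -> Triple:
    # Hypothetical normalization: trim whitespace and lowercase each field
    # so that "Aspirin" and "aspirin " count as the same entity.
    return Triple(t.head.strip().lower(), t.relation.strip().lower(), t.tail.strip().lower())


def triple_f1(predicted: list[Triple], gold: list[Triple]) -> tuple[float, float, float]:
    """Strict exact-match precision, recall and F1 over sets of triples."""
    pred_set = {normalize(t) for t in predicted}
    gold_set = {normalize(t) for t in gold}
    # A predicted triple counts only if head, relation and tail all match.
    true_pos = len(pred_set & gold_set)
    precision = true_pos / len(pred_set) if pred_set else 0.0
    recall = true_pos / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Illustrative (made-up) sentence-level example.
gold = [Triple("aspirin", "treats", "headache"), Triple("aspirin", "inhibits", "COX-1")]
pred = [Triple("Aspirin", "treats", "headache"), Triple("aspirin", "causes", "headache")]
print(triple_f1(pred, gold))  # → (0.5, 0.5, 0.5)
```

Strict matching penalizes a triple whose entities are correct but whose relation label differs, which is why a wider range of relation types makes the task harder; lenient variants (for example, partial entity-span matching) would yield higher scores and are not shown here.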