CL | LG | Jun 17, 2024

Analysing zero-shot temporal relation extraction on clinical notes using temporal consistency

arXiv:2406.11486v1 | 30 citations | Has Code
Originality: Synthesis-oriented
AI Analysis

This is an incremental study for biomedical NLP researchers that highlights the limitations of LLMs on one clinical task: zero-shot temporal relation extraction.

The paper tackles zero-shot temporal relation extraction on clinical notes using large language models (LLMs). It finds that LLMs score lower in F1 than fine-tuned models and struggle with temporal consistency, and that predictions can remain inaccurate even when consistency is achieved.

This paper presents the first study of temporal relation extraction in a zero-shot setting focusing on biomedical text. We employ two types of prompts and five LLMs (GPT-3.5, Mixtral, Llama 2, Gemma, and PMC-LLaMA) to obtain responses about the temporal relations between two events. Our experiments demonstrate that LLMs struggle in the zero-shot setting, performing worse than fine-tuned specialized models in terms of F1 score, which shows that this is a challenging task for LLMs. We further contribute a novel, comprehensive temporal analysis by calculating consistency scores for each LLM. Our findings reveal that LLMs face challenges in providing responses consistent with the temporal properties of uniqueness and transitivity. Moreover, we study the relation between the temporal consistency of an LLM and its accuracy, and whether the latter can be improved by resolving temporal inconsistencies. Our analysis shows that even when temporal consistency is achieved, the predictions can remain inaccurate.
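
To make the two temporal properties concrete, the following is a minimal sketch of how uniqueness and transitivity consistency could be scored over a model's replies. It is not the paper's implementation: the label set (`BEFORE`, `AFTER`, `OVERLAP`), the function names `uniqueness_score` and `transitivity_score`, the input layouts, and the scoring formulas are all assumptions made for illustration.

```python
from itertools import permutations

# Hypothetical label set; the paper's actual relation labels may differ.
RELATIONS = ("BEFORE", "AFTER", "OVERLAP")


def uniqueness_score(answers):
    """Fraction of event pairs for which exactly one relation is affirmed.

    answers: dict mapping (event_a, event_b) -> dict of relation -> bool,
    e.g. the yes/no replies to one question per candidate relation.
    """
    if not answers:
        return 1.0
    consistent = sum(
        1
        for replies in answers.values()
        if sum(bool(replies.get(r, False)) for r in RELATIONS) == 1
    )
    return consistent / len(answers)


def transitivity_score(predictions):
    """Fraction of checkable triples that respect BEFORE-transitivity.

    predictions: dict mapping (event_a, event_b) -> one relation label.
    A triple is checkable when BEFORE(a, b) and BEFORE(b, c) are predicted
    and a prediction for (a, c) exists; it is violated when that
    prediction is not BEFORE.
    """
    events = {event for pair in predictions for event in pair}
    checked = violated = 0
    for a, b, c in permutations(events, 3):
        if (
            predictions.get((a, b)) == "BEFORE"
            and predictions.get((b, c)) == "BEFORE"
            and (a, c) in predictions
        ):
            checked += 1
            if predictions[(a, c)] != "BEFORE":
                violated += 1
    return 1.0 if checked == 0 else 1 - violated / checked


# Illustrative (made-up) replies for two event pairs.
answers = {
    ("admission", "surgery"): {"BEFORE": True, "AFTER": False, "OVERLAP": False},
    ("surgery", "discharge"): {"BEFORE": True, "AFTER": True, "OVERLAP": False},
}
# Illustrative (made-up) predictions that break transitivity.
predictions = {
    ("admission", "surgery"): "BEFORE",
    ("surgery", "discharge"): "BEFORE",
    ("admission", "discharge"): "AFTER",
}
```

In this sketch, the second pair in `answers` affirms both `BEFORE` and `AFTER`, so it counts as a uniqueness violation, and `predictions` contains one checkable triple that violates transitivity. Note that a perfect score on either measure says nothing about correctness: a model can answer `BEFORE` consistently and still be wrong, which is the gap between consistency and accuracy that the abstract points to.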

Code Implementations: 1 repo