CL, AI, IR | Apr 1, 2025

Synthesized Annotation Guidelines are Knowledge-Lite Boosters for Clinical Information Extraction

Enshuo Hsu, Martin Ugbala, Krishna Kumar Kookal, Zouaidi Kawtar, Nicholas L. Rider, Muhammad F. Walji, Kirk Roberts

arXiv:2504.02871v1 | 4.92 citations | h-index: 30

Originality: Highly original

AI Analysis

This work addresses the challenge of creating reusable and efficient annotation guidelines for clinical named entity recognition, offering a knowledge-lite solution that reduces human effort in biomedical domains.

The study tackles the labor- and knowledge-intensive construction of annotation guidelines for clinical information extraction. It proposes a self-improving method in which LLMs synthesize the guidelines with minimal human input. In zero-shot experiments, the synthesized guidelines improved strict F1 scores by up to 25.86% over the no-guideline baseline, and in most tasks they matched or exceeded human-written guidelines, by 1.15% to 4.14%.
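The strict F1 metric behind these numbers can be illustrated with a minimal sketch. It assumes "strict" means the usual exact-match criterion for clinical NER, where a predicted entity counts only if its span boundaries and its type both match a gold entity. The `Entity` tuple layout, the `strict_f1` name, and the sample spans and types are hypothetical illustrations, not the paper's evaluation code.

```python
from typing import Iterable, Set, Tuple

# Hypothetical entity representation: (start_offset, end_offset, entity_type).
Entity = Tuple[int, int, str]


def strict_f1(gold: Iterable[Entity], predicted: Iterable[Entity]) -> float:
    """Strict F1: a prediction is correct only on an exact span-and-type match."""
    gold_set: Set[Entity] = set(gold)
    pred_set: Set[Entity] = set(predicted)
    true_positives = len(gold_set & pred_set)
    if true_positives == 0:
        # Covers empty inputs and the no-overlap case without dividing by zero.
        return 0.0
    precision = true_positives / len(pred_set)
    recall = true_positives / len(gold_set)
    return 2 * precision * recall / (precision + recall)


# Illustrative data: the second prediction has the right type but ends two
# characters early, so under strict matching it counts as an error.
gold = [(0, 9, "PROBLEM"), (15, 24, "TREATMENT"), (30, 41, "TEST")]
pred = [(0, 9, "PROBLEM"), (15, 22, "TREATMENT"), (30, 41, "TEST")]
score = strict_f1(gold, pred)
print(f"strict F1 = {score:.3f}")
```

Because a boundary that is off by a few characters scores the same as a missed entity, strict F1 is sensitive to exactly the kind of span conventions that an annotation guideline spells out.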

Generative information extraction using large language models, particularly through few-shot learning, has become a popular method. Recent studies indicate that providing a detailed, human-readable guideline, similar to the annotation guidelines traditionally used for training human annotators, can significantly improve performance. However, constructing these guidelines is both labor- and knowledge-intensive. Additionally, the definitions are often tailored to meet specific needs, making them highly task-specific and often non-reusable. Handling these subtle differences requires considerable effort and attention to detail. In this study, we propose a self-improving method that harvests the knowledge summarization and text generation capacity of LLMs to synthesize annotation guidelines while requiring virtually no human input. Our zero-shot experiments on the clinical named entity recognition benchmarks, 2012 i2b2 EVENT, 2012 i2b2 TIMEX, 2014 i2b2, and 2018 n2c2, showed 25.86%, 4.36%, 0.20%, and 7.75% improvements in strict F1 scores from the no-guideline baseline. The LLM-synthesized guidelines showed equivalent or better performance compared to human-written guidelines by 1.15% to 4.14% in most tasks. In conclusion, this study proposes a novel LLM self-improving method that requires minimal knowledge and human input and is applicable to multiple biomedical domains.
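The contrast between the no-guideline baseline and guideline-augmented zero-shot prompting can be pictured with a small sketch. Everything here is an assumption for illustration: the `build_prompt` helper, the prompt wording, and the sample note are hypothetical and are not taken from the paper, and the step that synthesizes the guideline itself is deliberately left out because the abstract does not describe it in detail.

```python
from typing import List, Optional


def build_prompt(note: str, entity_types: List[str],
                 guideline: Optional[str] = None) -> str:
    """Assemble a zero-shot extraction prompt; the guideline section is optional."""
    parts = [
        "Extract all entities of the following types from the clinical note: "
        + ", ".join(entity_types) + "."
    ]
    if guideline:
        # Guideline-augmented condition: the guideline text (human-written or
        # LLM-synthesized) is placed before the note.
        parts.append("Annotation guideline:\n" + guideline)
    parts.append("Clinical note:\n" + note)
    return "\n\n".join(parts)


# Hypothetical example note and placeholder guideline text.
note = "Patient was started on aspirin after chest pain."
baseline_prompt = build_prompt(note, ["EVENT"])
guided_prompt = build_prompt(
    note, ["EVENT"], guideline="(synthesized guideline text goes here)"
)
print(guided_prompt)
```

The two prompts differ only in the guideline section, which is what lets a comparison against the no-guideline baseline isolate the guideline's contribution.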
