PICLe: Pseudo-Annotations for In-Context Learning in Low-Resource Named Entity Detection
PICLe addresses the challenge of task adaptation for low-resource named entity detection in domains such as biomedicine, where labeled data is scarce, by making effective use of noisy pseudo-annotations.
The paper tackles the sensitivity of in-context learning to demonstration quality in low-resource named entity detection. It finds that partially correct demonstrations can be as effective as fully correct ones, and it proposes a framework that, with zero human annotation, outperforms standard in-context learning on biomedical datasets.
In-context learning (ICL) enables Large Language Models (LLMs) to perform tasks using a few demonstrations, facilitating task adaptation when labeled examples are hard to obtain. However, ICL is sensitive to the choice of demonstrations, and it remains unclear which demonstration attributes enable in-context generalization. In this work, we conduct a perturbation study of in-context demonstrations for low-resource Named Entity Detection (NED). Our surprising finding is that in-context demonstrations with partially correct annotated entity mentions can be as effective for task transfer as fully correct demonstrations. Based on our findings, we propose Pseudo-annotated In-Context Learning (PICLe), a framework for in-context learning with noisy, pseudo-annotated demonstrations. PICLe leverages LLMs to annotate many demonstrations in a zero-shot first pass. We then cluster these synthetic demonstrations, sample specific sets of in-context demonstrations from each cluster, and predict entity mentions using each set independently. Finally, we use self-verification to select the final set of entity mentions. We evaluate PICLe on five biomedical NED datasets and show that, with zero human annotation, PICLe outperforms ICL in low-resource settings where limited gold examples can be used as in-context demonstrations.
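The four stages described above (zero-shot pseudo-annotation, clustering, per-cluster prediction, self-verification) can be sketched roughly as follows. This is a minimal illustration of the control flow only, not the authors' implementation. All names (`picle_sketch`, `annotate_zero_shot`, `assign_cluster`, `predict_with_demos`, `verify`) are hypothetical, the LLM calls and the clustering step are passed in as plain callables, and pooling the per-cluster predictions before filtering them is an assumption about how the final set is selected. The `toy_*` functions are stand-ins that make the sketch runnable without a model.

```python
import random
from collections import defaultdict
from typing import Callable, Dict, List, Sequence, Set, Tuple

# A demonstration: (sentence, pseudo-annotated entity mentions).
Demo = Tuple[str, List[str]]


def picle_sketch(
    unlabeled: Sequence[str],
    test_sentence: str,
    annotate_zero_shot: Callable[[str], List[str]],   # hypothetical LLM call
    assign_cluster: Callable[[str], int],             # hypothetical clustering
    predict_with_demos: Callable[[Sequence[Demo], str], List[str]],  # hypothetical LLM call
    verify: Callable[[str, str], bool],               # hypothetical self-verification
    demos_per_cluster: int = 2,
    seed: int = 0,
) -> Set[str]:
    rng = random.Random(seed)

    # 1. Zero-shot first pass: build a pool of noisy pseudo-annotated demos.
    pool: List[Demo] = [(s, annotate_zero_shot(s)) for s in unlabeled]

    # 2. Group the synthetic demonstrations by cluster id.
    clusters: Dict[int, List[Demo]] = defaultdict(list)
    for demo in pool:
        clusters[assign_cluster(demo[0])].append(demo)

    # 3. Sample one demonstration set per cluster and predict with each
    #    set independently, pooling the candidate mentions.
    candidates: Set[str] = set()
    for cluster_id in sorted(clusters):
        members = clusters[cluster_id]
        demos = rng.sample(members, min(demos_per_cluster, len(members)))
        candidates.update(predict_with_demos(demos, test_sentence))

    # 4. Self-verification: keep only candidates the verifier accepts
    #    (an assumed selection rule for this sketch).
    return {m for m in candidates if verify(test_sentence, m)}


# --- Toy stand-ins so the sketch runs without an LLM (all hypothetical) ---
TOY_LEXICON = {"aspirin", "fever", "banana"}  # "banana" plays the noisy label
TOY_UNLABELED = [
    "aspirin reduces fever",
    "banana is tasty",
    "fever is common",
    "aspirin helps",
]


def toy_annotate(sentence: str) -> List[str]:
    # Noisy annotator: tags every word found in the toy lexicon.
    return [w for w in sentence.split() if w in TOY_LEXICON]


def toy_cluster(sentence: str) -> int:
    # Placeholder for a real clustering step: parity of the word count.
    return len(sentence.split()) % 2


def toy_predict(demos: Sequence[Demo], sentence: str) -> List[str]:
    # Copies mentions seen in the demonstrations that occur in the input.
    seen = {m for _, mentions in demos for m in mentions}
    return [w for w in sentence.split() if w in seen]


def toy_verify(sentence: str, mention: str) -> bool:
    # Rejects the deliberately noisy label.
    return mention != "banana"
```

With the toy stand-ins, `picle_sketch(TOY_UNLABELED, "aspirin and banana for fever", toy_annotate, toy_cluster, toy_predict, toy_verify, demos_per_cluster=10)` keeps the two plausible mentions and drops the noisy one. Passing the model calls as callables keeps the sketch independent of any particular LLM client or embedding model.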