CL Dec 28, 2024

STAYKATE: Hybrid In-Context Example Selection Combining Representativeness Sampling and Retrieval-based Approach -- A Case Study on Science Domains

Chencheng Zhu, Kazutaka Shimada, Tomoki Taniguchi, Tomoko Ohkuma

arXiv:2412.20043v1 1.91 citations h-index: 20

Originality Highly original

AI Analysis

This work addresses efficient in-context example selection for scientific information extraction, a setting where training data is scarce and annotation is costly. It represents an incremental advance in domain-specific methods.

The paper tackles the problem of selecting in-context examples for large language models in scientific information extraction. It proposes STAYKATE, a hybrid method that combines representativeness sampling with a retrieval-based approach. On three domain-specific datasets, STAYKATE outperformed traditional supervised methods and existing selection methods, with notable improvements for challenging entity types.

Large language models (LLMs) demonstrate the ability to learn in-context, offering a potential solution for scientific information extraction, which often contends with challenges such as insufficient training data and the high cost of annotation processes. Given that the selection of in-context examples can significantly impact performance, it is crucial to design a sound method for selecting effective examples. In this paper, we propose STAYKATE, a static-dynamic hybrid selection method that combines the principles of representativeness sampling from active learning with the prevalent retrieval-based approach. The results across three domain-specific datasets indicate that STAYKATE outperforms both the traditional supervised methods and existing selection methods. The enhancement in performance is particularly pronounced for entity types that pose challenges for other methods.
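The static-dynamic idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not state the scoring function, the embedding model, or how the two sets are ordered in the prompt. The sketch assumes precomputed embeddings, uses mean cosine similarity to the rest of the pool as a stand-in for representativeness, and uses nearest-neighbour retrieval for the dynamic part. All function names (`select_static`, `select_dynamic`, `hybrid_select`) are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def select_static(pool, k):
    """Pick the k most 'representative' pool items (query-independent).

    Assumption: representativeness is approximated by the mean cosine
    similarity of an item to every other item in the pool.
    """
    def representativeness(i):
        others = [cosine(pool[i], pool[j]) for j in range(len(pool)) if j != i]
        return sum(others) / len(others) if others else 0.0

    ranked = sorted(range(len(pool)), key=representativeness, reverse=True)
    return ranked[:k]


def select_dynamic(pool, query, k, exclude=()):
    """Pick the k pool items most similar to the query (retrieval-based)."""
    excluded = set(exclude)
    candidates = [i for i in range(len(pool)) if i not in excluded]
    candidates.sort(key=lambda i: cosine(pool[i], query), reverse=True)
    return candidates[:k]


def hybrid_select(pool, query, n_static, n_dynamic):
    """Combine a fixed static set with a per-query retrieved set."""
    static = select_static(pool, n_static)
    dynamic = select_dynamic(pool, query, n_dynamic, exclude=static)
    return static + dynamic  # ordering in the prompt is an assumption


# Toy 2-D embeddings standing in for sentence embeddings.
pool = [[1.0, 0.0], [0.9, 0.1], [0.8, 0.2], [0.0, 1.0], [0.1, 0.9]]
query = [0.0, 1.0]
print(hybrid_select(pool, query, n_static=1, n_dynamic=2))
```

In this sketch the static indices are the same for every test input and can be computed once, while the dynamic indices are recomputed per query.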
