STAYKATE: Hybrid In-Context Example Selection Combining Representativeness Sampling and Retrieval-based Approach -- A Case Study on Science Domains
This work addresses the challenge of efficient in-context example selection for scientific information extraction, a setting where training data is scarce and annotation is costly. It represents an incremental advancement in domain-specific methods.
The paper tackles the problem of selecting in-context examples for large language models in scientific information extraction. It proposes STAYKATE, a hybrid method that combines representativeness sampling with retrieval-based selection and outperforms both traditional supervised methods and existing selection methods across three domain-specific datasets, with notable improvements on challenging entity types.
Large language models (LLMs) demonstrate the ability to learn in-context, offering a potential solution for scientific information extraction, which often contends with challenges such as insufficient training data and the high cost of annotation. Because the choice of in-context examples can significantly affect performance, it is crucial to design a proper method for selecting effective ones. In this paper, we propose STAYKATE, a static-dynamic hybrid selection method that combines the principles of representativeness sampling from active learning with the prevalent retrieval-based approach. Results across three domain-specific datasets indicate that STAYKATE outperforms both traditional supervised methods and existing selection methods. The performance gain is particularly pronounced for entity types that are challenging for other methods.
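To make the static-dynamic idea concrete, here is a minimal sketch in Python under several stated assumptions. It assumes every candidate sentence already has an embedding vector, and it uses mean cosine similarity to the rest of the pool as a stand-in for the representativeness criterion; the paper's actual criterion, embedding model, and split between static and dynamic examples may differ. All function names (`representative_indices`, `retrieve_indices`, `hybrid_select`) and the parameters `k_static` and `k_dynamic` are hypothetical and used only for illustration.

```python
import math
from typing import List, Sequence, Set

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def representative_indices(pool: List[Vector], k: int) -> List[int]:
    """Static part (assumed criterion): rank pool items by their mean
    similarity to all other items, a simple density proxy for
    representativeness, and keep the top k."""
    n = len(pool)
    scores = []
    for i in range(n):
        sims = [cosine(pool[i], pool[j]) for j in range(n) if j != i]
        scores.append(sum(sims) / len(sims) if sims else 0.0)
    # sorted() is stable, so ties keep their original index order.
    return sorted(range(n), key=lambda i: scores[i], reverse=True)[:k]


def retrieve_indices(pool: List[Vector], query: Vector, k: int,
                     exclude: Set[int]) -> List[int]:
    """Dynamic part: the k pool items most similar to the test input,
    skipping items already chosen by the static part."""
    candidates = [i for i in range(len(pool)) if i not in exclude]
    candidates.sort(key=lambda i: cosine(pool[i], query), reverse=True)
    return candidates[:k]


def hybrid_select(pool: List[Vector], query: Vector,
                  k_static: int, k_dynamic: int) -> List[int]:
    """Combine a fixed representative set with per-query retrieval."""
    static = representative_indices(pool, k_static)
    dynamic = retrieve_indices(pool, query, k_dynamic, set(static))
    return static + dynamic


if __name__ == "__main__":
    pool = [[1.0, 0.0], [0.9, 0.1], [0.8, 0.2], [0.0, 1.0]]
    print(hybrid_select(pool, [0.0, 1.0], k_static=1, k_dynamic=1))
```

In this sketch the static half depends only on the pool, so it could be computed once (and, in a low-resource setting, annotated once), while the retrieval half is recomputed for each test sentence. With the toy pool above, the static step picks index 2 and retrieval for the query `[0.0, 1.0]` adds index 3.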