CL | AI | Nov 6, 2023

DAIL: Data Augmentation for In-Context Learning via Self-Paraphrase

arXiv:2311.03319v1 | 11 citations | h-index: 15
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of limited annotated data for ICL in NLP, offering an incremental improvement for low-resource applications.

The paper addresses In-Context Learning's (ICL) reliance on high-quality annotated demonstrations, which are often unavailable in low-resource scenarios. It proposes DAIL, a method that has the language model paraphrase the test sample and then combines the individual predictions by majority voting. The authors report that DAIL outperforms standard ICL and other ensemble-based methods in such settings.

In-Context Learning (ICL) combined with pre-trained large language models has achieved promising results on various NLP tasks. However, ICL requires high-quality annotated demonstrations which might not be available in real-world scenarios. To overcome this limitation, we propose Data Augmentation for In-Context Learning (DAIL). DAIL leverages the intuition that large language models are more familiar with the content generated by themselves. It first utilizes the language model to generate paraphrases of the test sample and employs majority voting to determine the final result based on individual predictions. Our extensive empirical evaluation shows that DAIL outperforms the standard ICL method and other ensemble-based methods in the low-resource scenario. Additionally, we explore the use of voting consistency as a confidence score of the model when the logits of predictions are inaccessible. We believe our work will stimulate further research on ICL in low-resource settings.
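
The paraphrase-then-vote procedure described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: `dail_predict`, `paraphrase`, and `classify` are hypothetical names, the two callables stand in for calls to the same language model, and both the inclusion of the original test sample in the vote and the tie-breaking rule are assumptions. The returned agreement ratio illustrates the idea of using voting consistency as a confidence score when logits are unavailable.

```python
from collections import Counter
from typing import Callable, List, Tuple


def dail_predict(
    test_sample: str,
    paraphrase: Callable[[str, int], List[str]],  # hypothetical: returns n paraphrases
    classify: Callable[[str], str],               # hypothetical: ICL prediction for one text
    num_paraphrases: int = 4,
) -> Tuple[str, float]:
    """Return (majority label, voting consistency) for one test sample."""
    # Assumption: the original sample votes alongside its paraphrases.
    candidates = [test_sample] + paraphrase(test_sample, num_paraphrases)
    predictions = [classify(text) for text in candidates]

    # Majority vote. Counter.most_common keeps first-seen order among ties,
    # so a tie resolves toward the earliest prediction (an assumption here).
    label, count = Counter(predictions).most_common(1)[0]

    # Voting consistency: fraction of candidates that agree with the winner.
    return label, count / len(predictions)


# Toy stand-ins for the language model, used only to exercise the sketch.
def toy_paraphrase(text: str, n: int) -> List[str]:
    return [f"{text} (variant {i})" for i in range(n)]


def toy_classify(text: str) -> str:
    return "negative" if "variant 3" in text else "positive"


if __name__ == "__main__":
    print(dail_predict("great movie", toy_paraphrase, toy_classify))
```

In a real setting both callables would prompt the same model, once with a paraphrasing instruction and once with the few-shot demonstrations, so that the paraphrases are text the model produced itself.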

