CL · AI · Nov 6, 2023

DAIL: Data Augmentation for In-Context Learning via Self-Paraphrase

Dawei Li, Yaxuan Li, Dheeraj Mekala, Shuyao Li, Yulin Wang, Xueqi Wang, William Hogan, Jingbo Shang

arXiv:2311.03319v1 · 3.611 citations · h-index: 15

Originality: Incremental advance

AI Analysis

This work addresses the challenge of limited annotated data for ICL in NLP, offering an incremental improvement for low-resource applications.

The paper targets In-Context Learning (ICL) in low-resource scenarios, where high-quality annotated demonstrations are scarce. It proposes DAIL, which has the language model paraphrase each test sample and then takes a majority vote over the resulting predictions. The authors report that DAIL outperforms standard ICL and other ensemble-based methods in such settings.

In-Context Learning (ICL) combined with pre-trained large language models has achieved promising results on various NLP tasks. However, ICL requires high-quality annotated demonstrations which might not be available in real-world scenarios. To overcome this limitation, we propose Data Augmentation for In-Context Learning (DAIL). DAIL leverages the intuition that large language models are more familiar with the content generated by themselves. It first utilizes the language model to generate paraphrases of the test sample and employs majority voting to determine the final result based on individual predictions. Our extensive empirical evaluation shows that DAIL outperforms the standard ICL method and other ensemble-based methods in the low-resource scenario. Additionally, we explore the use of voting consistency as a confidence score of the model when the logits of predictions are inaccessible. We believe our work will stimulate further research on ICL in low-resource settings.
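The paraphrase-then-vote procedure described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. The names `dail_predict`, `paraphrase`, `predict`, and `num_paraphrases` are hypothetical, the two callables stand in for calls to a language model, and letting the original sample vote alongside its paraphrases is an assumption. Voting consistency is taken here as the fraction of votes that agree with the winning label.

```python
from collections import Counter
from typing import Callable, List, Tuple


def dail_predict(
    test_sample: str,
    paraphrase: Callable[[str, int], List[str]],  # hypothetical: LLM paraphraser
    predict: Callable[[str], str],                # hypothetical: ICL label predictor
    num_paraphrases: int = 4,
) -> Tuple[str, float]:
    """Return (majority label, voting consistency) for one test sample."""
    # Assumption: the original sample votes together with its paraphrases.
    candidates = [test_sample] + paraphrase(test_sample, num_paraphrases)

    # One prediction per candidate text, tallied by label.
    votes = Counter(predict(text) for text in candidates)

    # Majority vote; on a tie, Counter keeps the label encountered first.
    label, count = votes.most_common(1)[0]

    # Share of votes for the winning label, usable as a confidence score
    # when prediction logits are not accessible.
    return label, count / len(candidates)


# Toy stand-ins for the language model calls, for illustration only.
def toy_paraphrase(text: str, n: int) -> List[str]:
    return [f"{text} (paraphrase {i})" for i in range(1, n + 1)]


def toy_predict(text: str) -> str:
    return "negative" if text.endswith("(paraphrase 2)") else "positive"


label, confidence = dail_predict("A charming film.", toy_paraphrase, toy_predict)
print(label, confidence)
```

With the toy stand-ins, four of the five candidates vote for the same label, so the consistency score is 4/5. In practice both callables would wrap prompts to the same language model, which is where the self-paraphrase intuition comes in.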

