CL · Apr 27, 2024

Evaluation of Few-Shot Learning for Classification Tasks in the Polish Language

arXiv:2404.17832v1 · 2 citations · h-index: 6 · Has Code
Originality: Synthesis-oriented
AI Analysis

This work addresses few-shot learning for Polish-language classification and provides a benchmark and evaluation for researchers. The contribution is incremental, however, since it applies existing methods to a new language-specific benchmark.

The authors evaluated few-shot classification in Polish on a new benchmark of 7 tasks, comparing fine-tuning, linear probing, SetFit, and in-context learning (ICL). ICL with commercial models such as GPT-3.5 performed best, yet the best few-shot score still trailed HerBERT-large fine-tuned on the full training set by 14 percentage points.
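Linear probing, one of the compared methods, keeps the pretrained encoder frozen and trains only a linear classifier on its output embeddings. The following is a minimal pure-Python sketch of that idea, assuming the embeddings have already been extracted: the toy 2-D vectors stand in for real encoder outputs (for example HerBERT sentence vectors), and the function names, learning rate, and epoch count are illustrative choices, not the paper's configuration.

```python
import math
from typing import List, Sequence

Vector = List[float]


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def logits_for(W: List[Vector], b: Vector, x: Vector) -> List[float]:
    # One linear score per class: W[c] . x + b[c]
    return [sum(w * xi for w, xi in zip(W[c], x)) + b[c] for c in range(len(W))]


def train_linear_probe(X: List[Vector], y: List[int], num_classes: int,
                       lr: float = 0.5, epochs: int = 200):
    """Full-batch gradient descent on softmax cross-entropy.

    Only W and b are learned; the embeddings X are treated as fixed,
    which is what makes this a probe of a frozen encoder.
    """
    dim = len(X[0])
    W = [[0.0] * dim for _ in range(num_classes)]
    b = [0.0] * num_classes
    n = len(X)
    for _ in range(epochs):
        grad_W = [[0.0] * dim for _ in range(num_classes)]
        grad_b = [0.0] * num_classes
        for x, label in zip(X, y):
            probs = softmax(logits_for(W, b, x))
            for c in range(num_classes):
                # Gradient of cross-entropy w.r.t. logit c: p_c - 1[c == label]
                err = probs[c] - (1.0 if c == label else 0.0)
                grad_b[c] += err
                for j in range(dim):
                    grad_W[c][j] += err * x[j]
        for c in range(num_classes):
            b[c] -= lr * grad_b[c] / n
            for j in range(dim):
                W[c][j] -= lr * grad_W[c][j] / n
    return W, b


def predict(W: List[Vector], b: Vector, x: Vector) -> int:
    scores = logits_for(W, b, x)
    return max(range(len(scores)), key=scores.__getitem__)


# Toy stand-ins for frozen sentence embeddings (invented, 2-D for readability).
X_train = [[1.0, 0.0], [0.9, 0.2], [0.0, 1.0], [0.1, 0.8]]
y_train = [0, 0, 1, 1]
W, b = train_linear_probe(X_train, y_train, num_classes=2)
print(predict(W, b, [0.9, 0.1]), predict(W, b, [0.1, 0.9]))  # prints "0 1"
```

In a real 16-shot run, `X_train` would hold the encoder embeddings of the labelled examples, and the encoder itself would never receive gradient updates.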

We introduce a few-shot benchmark consisting of 7 different classification tasks native to the Polish language. We empirically compared fine-tuning, linear probing, SetFit, and in-context learning (ICL) in 0-shot and 16-shot settings, using various pre-trained commercial and open-source models. ICL achieves the best performance, with commercial models such as GPT-3.5 and GPT-4 scoring highest. However, a significant gap of 14 percentage points remains between our best few-shot score and the performance of HerBERT-large fine-tuned on the entire training dataset. Among the other techniques, SetFit is the second-best approach, closely followed by linear probing. Fine-tuning with a non-linear head gave the worst and least stable performance. Results for ICL indicate that continual pre-training of models such as Mistral-7b or Llama-2-13b on Polish corpora is beneficial, as confirmed by the improved performance of Bielik-7b and Trurl-13b, respectively. To further support few-shot learning experiments for Polish, we are releasing handcrafted templates for ICL.
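For ICL, a k-shot prompt is built by filling a template with k labelled demonstrations followed by the unlabelled query; with k = 0 it reduces to a zero-shot prompt. The sketch below illustrates only that mechanism. The template strings, the helper names `sample_k_shot` and `build_prompt`, and the toy data are hypothetical: they are not the handcrafted templates released with the paper, and the abstract does not say whether the 16 shots are counted per class or in total (this sketch assumes in total).

```python
import random
from typing import List, Sequence, Tuple

# HYPOTHETICAL template strings for illustration only; they are not the
# handcrafted templates released with the paper.
HEADER = "Classify the Polish text. Possible labels: {labels}.\n\n"
EXAMPLE = "Text: {text}\nLabel: {label}\n\n"
QUERY = "Text: {text}\nLabel:"

Example = Tuple[str, str]  # (text, label)


def sample_k_shot(train: Sequence[Example], k: int, seed: int = 0) -> List[Example]:
    """Draw k demonstrations uniformly at random, without replacement.

    Assumption: k counts demonstrations in total, not per class.
    """
    rng = random.Random(seed)
    return rng.sample(list(train), k)


def build_prompt(demos: Sequence[Example], query: str, labels: Sequence[str]) -> str:
    """Fill the template: header, one block per demonstration, then the query.

    An empty `demos` list yields a zero-shot prompt.
    """
    parts = [HEADER.format(labels=", ".join(labels))]
    for text, label in demos:
        parts.append(EXAMPLE.format(text=text, label=label))
    # The prompt ends on an open "Label:" slot for the model to complete.
    parts.append(QUERY.format(text=query))
    return "".join(parts)


# Toy Polish sentiment data (invented for this sketch, not from the benchmark).
LABELS = ["pozytywny", "negatywny"]
TRAIN = [
    ("Świetny film, polecam!", "pozytywny"),
    ("Strata czasu.", "negatywny"),
    ("Bardzo dobra obsługa.", "pozytywny"),
    ("Produkt zepsuł się po tygodniu.", "negatywny"),
]

few_shot_prompt = build_prompt(sample_k_shot(TRAIN, k=2, seed=0), "Nudny i za długi.", LABELS)
zero_shot_prompt = build_prompt([], "Nudny i za długi.", LABELS)
print(few_shot_prompt)
```

With a full training split, `k=16` would correspond to the 16-shot setting and `demos=[]` to the 0-shot one; the model's completion after the final `Label:` is then mapped back to one of the allowed labels.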
