CL · AI · Jul 29, 2025

Improving Task Diversity in Label Efficient Supervised Finetuning of LLMs

arXiv:2507.21482v1 · 1 citation · h-index: 8 · EMNLP
Originality: Incremental advance
AI Analysis

This work addresses the high annotation cost faced by developers finetuning LLMs for specialized applications, offering an incremental improvement over existing data selection methods.

The paper tackles label-efficient supervised finetuning of LLMs with a task-diversity-based sampling strategy that selects examples using inverse confidence weighting. The strategy achieves a 4% higher MMLU score than training on the complete dataset. It also performs at or above the level of the best existing methods while reducing annotation costs by up to 80%.

Large Language Models (LLMs) have demonstrated remarkable capabilities across diverse domains, but developing high-performing models for specialized applications often requires substantial human annotation, a process that is time-consuming, labor-intensive, and expensive. In this paper, we address the label-efficient learning problem for supervised finetuning (SFT) by leveraging task diversity as a fundamental principle for effective data selection. This is markedly different from existing methods, which are based on prompt diversity. Our approach is based on two key observations: 1) task labels for different prompts are often readily available; 2) pre-trained models have significantly varying levels of confidence across tasks. We combine these facts to devise a simple yet effective sampling strategy: we select examples across tasks using an inverse confidence weighting strategy. This produces models comparable to or better than those trained with more complex sampling procedures, while being significantly easier to implement and less computationally intensive. Notably, our experimental results demonstrate that this method can achieve better accuracy than training on the complete dataset (a 4% increase in MMLU score). Across various annotation budgets and two instruction finetuning datasets, our algorithm consistently performs at or above the level of the best existing methods, while reducing annotation costs by up to 80%.
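The abstract does not say how task confidence is measured or exactly how the annotation budget is split, so the following is only a hypothetical sketch of inverse confidence weighting across tasks. It assumes each task already has a positive confidence score from the pre-trained model, for example its average probability on a few prompts from that task. It then draws tasks stochastically with probability proportional to 1 / confidence and picks prompts uniformly within the chosen task. The function name `sample_by_inverse_confidence` and its parameters are illustrative and do not come from the paper.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def sample_by_inverse_confidence(
    pool: Sequence[Tuple[str, str]],  # (prompt, task_label) pairs awaiting annotation
    confidence: Dict[str, float],     # assumed per-task model confidence, must be > 0
    budget: int,                      # number of prompts we can afford to annotate
    seed: int = 0,
) -> List[Tuple[str, str]]:
    """Hypothetical selector: draw a task with probability proportional to
    1 / confidence, then take one not-yet-selected prompt from that task."""
    if any(c <= 0 for c in confidence.values()):
        raise ValueError("confidence scores must be positive")
    rng = random.Random(seed)

    # Group prompts by task and shuffle so pop() is a uniform draw without replacement.
    by_task: Dict[str, List[str]] = defaultdict(list)
    for prompt, task in pool:
        by_task[task].append(prompt)
    for prompts in by_task.values():
        rng.shuffle(prompts)

    selected: List[Tuple[str, str]] = []
    while len(selected) < budget:
        # Only tasks that still have unselected prompts can be drawn.
        open_tasks = [t for t, ps in by_task.items() if ps]
        if not open_tasks:
            break  # budget exceeds the pool size: every prompt is selected
        weights = [1.0 / confidence[t] for t in open_tasks]
        task = rng.choices(open_tasks, weights=weights, k=1)[0]
        selected.append((by_task[task].pop(), task))
    return selected
```

Take a task the model handles with confidence 0.2 and another with confidence 0.9. In expectation, the first receives about 4.5 times as many annotation slots (0.9 / 0.2), so the budget is spent where the pre-trained model is weakest. A deterministic proportional allocation with rounding would be an equally plausible reading of "inverse confidence weighting". The stochastic draw is used here only because it stays short.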
