LG · Nov 18, 2025

LAUD: Integrating Large Language Models with Active Learning for Unlabeled Data

arXiv:2511.14738v14.1

Originality: Incremental advance

AI Analysis

This work targets practitioners who need to fine-tune LLMs efficiently without extensive labeled data. It is an incremental advance, since it builds on existing active learning and LLM methods.

The paper tackles the shortage of labeled data for fine-tuning large language models (LLMs) by introducing LAUD, a framework that integrates LLMs with active learning on unlabeled datasets. It reports that LAUD-derived models outperform zero-shot and few-shot learning on commodity name classification tasks.

Large language models (LLMs) have shown a remarkable ability to generalize beyond their pre-training data, and fine-tuning LLMs can elevate performance to human-level and beyond. However, in real-world scenarios, a lack of labeled data often prevents practitioners from obtaining well-performing models, forcing them to rely heavily on prompt-based approaches that are often tedious, inefficient, and driven by trial and error. To alleviate this issue, we present a learning framework integrating LLMs with active learning for unlabeled datasets (LAUD). LAUD mitigates the cold-start problem by constructing an initial label set with zero-shot learning. Experimental results show that LLMs derived from LAUD outperform LLMs with zero-shot or few-shot learning on commodity name classification tasks, demonstrating the effectiveness of LAUD.
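The abstract names two ingredients: a zero-shot pass that builds the initial label set, and an active learning loop on the remaining unlabeled data. The sketch below illustrates that general pattern only and is not the authors' implementation. Everything in it is an assumption made for illustration: the keyword rule `zero_shot_label` stands in for an LLM zero-shot prompt, the toy naive Bayes class `TinyNB` stands in for a fine-tuned LLM, the least-confidence query strategy is a common default rather than something the abstract specifies, and the `oracle` callback and the sample commodity names are hypothetical.

```python
import math
from collections import Counter

# Hypothetical demo data: commodity names and their true categories.
POOL = ["red apple", "steel hammer", "brown rice",
        "claw hammer", "green apple", "steel wrench"]
TRUE_LABELS = {"red apple": "food", "steel hammer": "tool", "brown rice": "food",
               "claw hammer": "tool", "green apple": "food", "steel wrench": "tool"}
FOOD_WORDS = {"apple", "rice"}


def zero_shot_label(name):
    """Assumed stand-in for an LLM zero-shot prompt: a simple keyword rule."""
    return "food" if any(w in FOOD_WORDS for w in name.split()) else "tool"


class TinyNB:
    """Toy multinomial naive Bayes, standing in for a fine-tuned model."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.prior = Counter(labels)
        self.counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.counts[label].update(text.split())
        self.vocab = {w for c in self.counts.values() for w in c}
        return self

    def predict_proba(self, text):
        log_scores = {}
        for c in self.classes:
            # Laplace smoothing, with one extra slot for unseen words.
            total = sum(self.counts[c].values()) + len(self.vocab) + 1
            score = math.log(self.prior[c])
            for w in text.split():
                score += math.log((self.counts[c][w] + 1) / total)
            log_scores[c] = score
        top = max(log_scores.values())
        norm = sum(math.exp(s - top) for s in log_scores.values())
        return {c: math.exp(s - top) / norm for c, s in log_scores.items()}

    def predict(self, text):
        proba = self.predict_proba(text)
        return max(proba, key=proba.get)


def active_learning(pool, oracle, seed_size=2, rounds=3, batch=1):
    """Zero-shot cold start followed by least-confidence active learning."""
    # Cold start: label a small seed set with the zero-shot stand-in.
    labeled = {x: zero_shot_label(x) for x in pool[:seed_size]}
    unlabeled = [x for x in pool if x not in labeled]
    model = TinyNB().fit(list(labeled), list(labeled.values()))
    for _ in range(rounds):
        if not unlabeled:
            break
        # Query the items whose top predicted probability is lowest.
        unlabeled.sort(key=lambda x: max(model.predict_proba(x).values()))
        for x in unlabeled[:batch]:
            labeled[x] = oracle(x)
        unlabeled = unlabeled[batch:]
        # Retrain on the enlarged label set (the "fine-tuning" step here).
        model = TinyNB().fit(list(labeled), list(labeled.values()))
    return model, labeled


model, labeled = active_learning(POOL, TRUE_LABELS.get)
```

In a real setting the oracle would be whatever annotation source the practitioner trusts, and the retraining step would be an actual fine-tuning run; the loop structure is the only part this sketch is meant to convey.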

