CL | AI | May 11, 2025

PLHF: Prompt Optimization with Few-Shot Human Feedback

arXiv:2505.07886v1 | 1 citation | h-index: 2
Originality: Incremental advance
AI Analysis

This work addresses a critical bottleneck in prompt optimization for LLMs, particularly in scenarios with ambiguous quality metrics, offering a more efficient solution for developers and researchers.

The paper addresses prompt optimization for large language models when output quality is hard to assess and no clear metric exists. It introduces PLHF, a few-shot prompt optimization framework that needs only a single round of human feedback and outperforms prior output grading strategies on public and industrial datasets.

Automatic prompt optimization frameworks are developed to obtain suitable prompts for large language models (LLMs) with respect to desired output quality metrics. Although existing approaches can handle conventional tasks such as fixed-solution question answering, defining the metric becomes complicated when the output quality cannot be easily assessed by comparisons with standard golden samples. Consequently, optimizing the prompts effectively and efficiently without a clear metric becomes a critical challenge. To address the issue, we present PLHF (which stands for "P"rompt "L"earning with "H"uman "F"eedback), a few-shot prompt optimization framework inspired by the well-known RLHF technique. Different from naive strategies, PLHF employs a specific evaluator module acting as the metric to estimate the output quality. PLHF requires only a single round of human feedback to complete the entire prompt optimization process. Empirical results on both public and industrial datasets show that PLHF outperforms prior output grading strategies for LLM prompt optimizations.
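
The idea in the abstract, a learned evaluator standing in for the missing metric, can be illustrated with a rough sketch. This is not the paper's implementation: the evaluator here is a simple similarity-weighted average over human-scored samples, and `make_evaluator`, `select_prompt`, `toy_generate`, and all data are hypothetical names and values chosen for illustration only.

```python
from difflib import SequenceMatcher
from typing import Callable, List, Tuple

# One round of human feedback: (model output, human score) pairs. Toy data.
HumanSample = Tuple[str, float]


def make_evaluator(samples: List[HumanSample]) -> Callable[[str], float]:
    """Build a scorer from few-shot human feedback.

    Assumption: quality is estimated as a similarity-weighted average of the
    human scores. The abstract does not specify the evaluator's internals.
    """
    def evaluate(output: str) -> float:
        weights = [SequenceMatcher(None, output, text).ratio() for text, _ in samples]
        total = sum(weights)
        if total == 0:
            # No overlap with any labelled sample: fall back to the mean score.
            return sum(score for _, score in samples) / len(samples)
        return sum(w * s for w, (_, s) in zip(weights, samples)) / total

    return evaluate


def select_prompt(
    candidates: List[str],
    inputs: List[str],
    generate: Callable[[str, str], str],
    evaluate: Callable[[str], float],
) -> str:
    """Return the candidate prompt whose outputs get the highest mean score."""
    def mean_score(prompt: str) -> float:
        return sum(evaluate(generate(prompt, x)) for x in inputs) / len(inputs)

    return max(candidates, key=mean_score)


def toy_generate(prompt: str, x: str) -> str:
    """Hypothetical stand-in for an LLM call."""
    return f"Thanks for asking about {x}." if "polite" in prompt else f"{x}."


human_feedback = [
    ("Thanks for asking about shipping.", 5.0),
    ("shipping.", 1.0),
]
evaluator = make_evaluator(human_feedback)
best = select_prompt(
    ["Be polite.", "Be terse."],
    ["refunds", "delivery"],
    toy_generate,
    evaluator,
)
print(best)
```

In this sketch the human scores are collected once and reused for every candidate prompt, which mirrors the single-round feedback property the abstract describes. A real system would replace both `toy_generate` and the similarity heuristic with model calls.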

