A Sequential Optimal Learning Approach to Automated Prompt Engineering in Large Language Models
This work aims to reduce the manual effort of prompt design for LLM users and offers a scalable approach for applications where prompt evaluation is costly. The contribution is incremental in that it builds on existing optimal learning methods.
The paper tackles automated prompt engineering for large language models by proposing an optimal learning framework that sequentially identifies effective prompt features under a limited evaluation budget, and it reports that the method significantly outperforms benchmark strategies on instruction induction tasks.
Designing effective prompts is essential to guiding large language models (LLMs) toward desired responses. Automated prompt engineering aims to reduce reliance on manual effort by streamlining the design, refinement, and optimization of natural language prompts. This paper proposes an optimal learning framework for automated prompt engineering, designed to sequentially identify effective prompt features while efficiently allocating a limited evaluation budget. We introduce a feature-based method to express prompts, which significantly broadens the search space. Bayesian regression is employed to exploit correlations among similar prompts, accelerating the learning process. To efficiently explore the large space of prompt features in search of a high-quality prompt, we adopt the forward-looking Knowledge-Gradient (KG) policy for sequential optimal learning. The KG policy is computed efficiently by solving mixed-integer second-order cone optimization problems, making it scalable and capable of accommodating prompts characterized only through constraints. We demonstrate that our method significantly outperforms a set of benchmark strategies on instruction induction tasks. The results highlight the advantages of using the KG policy for prompt learning given a limited evaluation budget. Our framework provides a solution for deploying automated prompt engineering in a wider range of applications where prompt evaluation is costly.
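As a rough illustration of the loop the abstract describes, the sketch below pairs a Bayesian linear regression over prompt features with a Knowledge-Gradient selection rule. It is not the paper's implementation. It assumes binary prompt features, a linear-Gaussian score model with known noise variance, and a candidate set small enough to enumerate, and it approximates the KG expectation by Gauss-Hermite quadrature instead of solving the mixed-integer second-order cone problems the paper uses. The names `bayes_update`, `kg_values`, and `run_demo` are hypothetical, and the scores are simulated, not LLM evaluations.

```python
import itertools

import numpy as np
from numpy.polynomial.hermite_e import hermegauss


def bayes_update(mu, Sigma, x, y, noise_var):
    """Conjugate update of the belief theta ~ N(mu, Sigma) after observing
    a noisy score y = x @ theta + eps, with eps ~ N(0, noise_var)."""
    Sx = Sigma @ x
    denom = noise_var + x @ Sx
    mu_new = mu + Sx * (y - x @ mu) / denom
    Sigma_new = Sigma - np.outer(Sx, Sx) / denom
    return mu_new, Sigma_new


def kg_values(mu, Sigma, X, noise_var, n_nodes=64):
    """Approximate KG value of evaluating each row of X once.

    KG(x) = E[max_i (a_i + b_i(x) Z)] - max_i a_i with Z ~ N(0, 1), where
    a = X @ mu and b(x) = X @ Sigma @ x / sqrt(noise_var + x @ Sigma @ x).
    The expectation is approximated by Gauss-Hermite quadrature (assumption:
    a stand-in for the exact computation, adequate for small examples).
    """
    z, w = hermegauss(n_nodes)   # nodes and weights for weight exp(-z^2 / 2)
    w = w / w.sum()              # normalize into standard normal weights
    a = X @ mu                   # current predicted score of every prompt
    best_now = a.max()
    kg = np.empty(len(X))
    for j, x in enumerate(X):
        Sx = Sigma @ x
        sigma_tilde = Sx / np.sqrt(noise_var + x @ Sx)
        b = X @ sigma_tilde      # sensitivity of each prompt's mean to Z
        future_best = (a[:, None] + b[:, None] * z[None, :]).max(axis=0)
        kg[j] = w @ future_best - best_now
    return kg


def run_demo(budget=10, seed=0):
    """Simulated learning loop over all prompts built from 4 binary features."""
    rng = np.random.default_rng(seed)
    X = np.array(list(itertools.product([0.0, 1.0], repeat=4)))
    theta_true = rng.normal(size=4)       # hidden feature effects (simulated)
    noise_var = 0.25
    mu, Sigma = np.zeros(4), np.eye(4)    # prior belief on feature effects
    for _ in range(budget):
        j = int(np.argmax(kg_values(mu, Sigma, X, noise_var)))
        y = X[j] @ theta_true + rng.normal(scale=np.sqrt(noise_var))
        mu, Sigma = bayes_update(mu, Sigma, X[j], y, noise_var)
    best_prompt = X[int(np.argmax(X @ mu))]
    return best_prompt, mu
```

Calling `run_demo()` spends the evaluation budget one prompt at a time and returns the feature vector with the highest posterior mean score. Because the belief is placed on feature effects and not on individual prompts, each evaluation also updates the estimates for every prompt that shares features with the one tested, which is the correlation effect the abstract attributes to Bayesian regression.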