Meta-Prompt Optimization for LLM-Based Sequential Decision Making
This work addresses performance variability in LLM agents, a concern for both researchers and practitioners, by automatically optimizing their prompts. The contribution is incremental in that it builds on existing adversarial bandit methods.
The paper tackles the challenge of automatically optimizing meta-prompts for LLM-based agents in sequential decision-making tasks like Bayesian optimization and multi-armed bandits, proposing algorithms (EXPO and EXPO-ES) that significantly improve performance, as demonstrated through extensive experiments.
Large language models (LLMs) have recently been employed as agents to solve sequential decision-making tasks such as Bayesian optimization and multi-armed bandits (MAB). These works usually adopt an LLM for sequential action selection by providing it with a fixed, manually designed meta-prompt. However, numerous previous works have found that the prompt has a significant impact on the performance of the LLM, which calls for a method to automatically optimize the meta-prompt for LLM-based agents. Unfortunately, the non-stationarity of the reward observations during LLM-based sequential decision-making makes meta-prompt optimization highly challenging. To address this challenge, we draw inspiration from adversarial bandit algorithms, which are inherently capable of handling non-stationary reward observations. Building on this foundation, we propose our EXPonential-weight algorithm for prompt Optimization (EXPO) to automatically optimize the task description and meta-instruction in the meta-prompt for LLM-based agents. We also extend EXPO to additionally optimize the exemplars (i.e., the history of interactions) in the meta-prompt to further enhance performance, yielding our EXPO-ES algorithm. Extensive experiments show that our algorithms significantly improve the performance of LLM-based sequential decision-making.
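To make the adversarial-bandit ingredient concrete, the following is a minimal sketch of the classical Exp3 exponential-weight update over a finite set of arms. It is not the EXPO algorithm itself, whose details are not given in this abstract. The class name `Exp3`, the exploration rate `gamma`, and the idea of treating each arm as one candidate meta-prompt with a reward scaled to [0, 1] are illustrative assumptions.

```python
import math
import random


class Exp3:
    """Textbook Exp3 for adversarial (possibly non-stationary) rewards.

    Assumption for illustration: each arm stands for one candidate
    meta-prompt, and rewards are scaled to lie in [0, 1].
    """

    def __init__(self, n_arms, gamma=0.1, seed=0):
        self.n_arms = n_arms
        self.gamma = gamma              # exploration rate in (0, 1]
        self.weights = [1.0] * n_arms   # one exponential weight per arm
        self.rng = random.Random(seed)

    def probabilities(self):
        # Mix the normalized weights with uniform exploration.
        total = sum(self.weights)
        return [
            (1.0 - self.gamma) * w / total + self.gamma / self.n_arms
            for w in self.weights
        ]

    def select(self):
        # Sample an arm and return it with its sampling probability.
        probs = self.probabilities()
        arm = self.rng.choices(range(self.n_arms), weights=probs, k=1)[0]
        return arm, probs[arm]

    def update(self, arm, reward, prob):
        # Importance-weighted estimate keeps the reward estimate unbiased
        # even though only the chosen arm's reward is observed.
        estimate = reward / prob
        self.weights[arm] *= math.exp(self.gamma * estimate / self.n_arms)
        # Rescale so the largest weight is 1; probabilities are unchanged
        # and the weights cannot overflow over long runs.
        largest = max(self.weights)
        self.weights = [w / largest for w in self.weights]
```

In a loop, one would call `select()` to pick a candidate, run the LLM agent with that candidate to obtain a reward, and pass the reward back through `update()`. Because the sampling probabilities never fall below `gamma / n_arms`, every candidate keeps being tried occasionally, which is what lets this family of methods track rewards that drift over time.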