CL AI Nov 19, 2024

GRL-Prompt: Towards Knowledge Graph based Prompt Optimization via Reinforcement Learning

Yuze Liu, Tingjie Liu, Tiehua Zhang, Youhua Xia, Jinze Wang, Zhishu Shen, Jiong Jin, Fei Richard Yu

arXiv:2411.14479v11.01 citations, h-index: 12

Originality: Incremental advance

AI Analysis

GRL-Prompt addresses the labor-intensive challenge of prompt optimization for NLP practitioners, offering an automated approach that improves performance on tasks such as text generation.

The paper tackles manual prompt engineering for large language models by proposing GRL-Prompt, a reinforcement learning framework that uses a knowledge graph to optimize prompts automatically. Compared to state-of-the-art methods, it reports average increases of 0.10 in ROUGE-1, 0.07 in ROUGE-2, 0.07 in ROUGE-L, and 0.05 in BLEU.
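To give a sense of what a ROUGE-1 gain measures, here is a minimal sketch of ROUGE-1 as a unigram-overlap F1 score. It assumes lowercased whitespace tokenization and the F1 variant on a 0 to 1 scale; the listing does not say which ROUGE variant or toolkit the authors used, and the function name `rouge1_f1` is hypothetical.

```python
from collections import Counter


def rouge1_f1(candidate: str, reference: str) -> float:
    """Unigram-overlap F1 between two texts (illustrative, not the authors' scorer)."""
    # Assumption: lowercase + whitespace split stands in for a real tokenizer.
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Counter intersection keeps the minimum count per token (clipped matches).
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(rouge1_f1("the cat", "the cat sat on"))
```

Under these assumptions, an average increase of 0.10 would correspond to ten points on this 0 to 1 scale.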

Large language models (LLMs) have demonstrated impressive success in a wide range of natural language processing (NLP) tasks due to their extensive general knowledge of the world. Recent work has shown that the performance of LLMs depends heavily on the input prompt. However, prompt engineering is usually done manually in a trial-and-error fashion, which makes finding optimal prompts labor-intensive and challenging. To address these problems and unlock the full potential of LLMs, we propose GRL-Prompt, a novel LLM-agnostic framework for prompt optimization that automatically constructs optimal prompts via reinforcement learning (RL) in an end-to-end manner. To provide a structured action/state representation for optimizing prompts, we construct a knowledge graph (KG) that better encodes the correlation between the user query and candidate in-context examples. Furthermore, a policy network is formulated to generate the optimal action by selecting a set of in-context examples in a rewardable order to construct the prompt. Additionally, embedding-based reward shaping is used to stabilize the RL training process. Experimental results show that GRL-Prompt outperforms recent state-of-the-art methods, achieving an average increase of 0.10 in ROUGE-1, 0.07 in ROUGE-2, 0.07 in ROUGE-L, and 0.05 in BLEU.
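The core action described in the abstract, choosing a set of in-context examples in a particular order and scoring the result with an embedding-based reward, can be sketched roughly as follows. This is not the authors' implementation: it builds no knowledge graph and trains no policy network. The per-example scores are assumed to come from some policy, ordered sampling without replacement is one plausible way to turn scores into an ordering, and cosine similarity stands in for the embedding-based reward. All function names and the "Input:/Output:" prompt template are hypothetical.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def sample_ordered_examples(scores, k, rng):
    """Sample k distinct example indices in order, softmax-weighted by score."""
    remaining = list(range(len(scores)))
    order = []
    for _ in range(min(k, len(remaining))):
        # Subtract the max score for numerical stability before exponentiating.
        top = max(scores[i] for i in remaining)
        weights = [math.exp(scores[i] - top) for i in remaining]
        pick = rng.choices(remaining, weights=weights, k=1)[0]
        order.append(pick)
        remaining.remove(pick)
    return order


def build_prompt(query, examples, order):
    """Concatenate the chosen (input, output) pairs in order, then the query."""
    shots = "\n\n".join(
        f"Input: {examples[i][0]}\nOutput: {examples[i][1]}" for i in order
    )
    return f"{shots}\n\nInput: {query}\nOutput:"


def shaped_reward(output_emb, reference_emb):
    """Assumed reward: similarity between output and reference embeddings."""
    return cosine(output_emb, reference_emb)


if __name__ == "__main__":
    rng = random.Random(0)
    examples = [("2+2", "4"), ("3+5", "8"), ("7-1", "6")]
    order = sample_ordered_examples([0.2, 1.5, -0.3], k=2, rng=rng)
    print(build_prompt("9-4", examples, order))
```

In an RL loop, the reward from `shaped_reward` would be fed back to update whatever model produces the scores; that update step is omitted here.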
