CL, AI · Aug 8, 2025

Inference-Aware Prompt Optimization for Aligning Black-Box Large Language Models

Saaduddin Mahmud, Mason Nakamura, Kyle H. Wray, Shlomo Zilberstein

arXiv:2508.10030v1 · h-index: 49

Originality: Incremental advance

AI Analysis

The paper addresses a methodological gap in aligning black-box LLMs for users who need to trade off multiple objectives against inference budgets. The advance is incremental, since it builds on prior prompt optimization and inference scaling methods.

Existing prompt optimization methods ignore inference strategies such as Best-of-N Sampling, even though these strategies affect alignment in black-box LLMs. The paper introduces IAPO to jointly optimize the prompt and the inference scale, and reports effectiveness across six tasks.

Prompt optimization methods have demonstrated significant effectiveness in aligning black-box large language models (LLMs). In parallel, inference scaling strategies such as Best-of-N Sampling and Majority Voting have also proven to enhance alignment and performance by trading off computation. However, existing prompt optimization approaches are inference strategy agnostic; that is, they optimize prompts without regard to the inference strategy employed during deployment. This constitutes a significant methodological gap, as our empirical and theoretical analysis reveals a strong interdependence between these two paradigms. Moreover, we find that user preferences regarding trade-offs among multiple objectives and inference budgets substantially influence the choice of prompt and inference configuration. To address this gap, we introduce a unified novel framework named IAPO (Inference-Aware Prompt Optimization) that jointly optimizes the prompt and inference scale, while being aware of the inference budget and different task objectives. We then develop a fixed-budget training algorithm for IAPO, which we call PSST (Prompt Scaling via Sequential Trimming), and analyze finite-budget guarantees on error probability. Finally, we evaluate the effectiveness of PSST on six different tasks, including multi-objective text generation and reasoning, and demonstrate the critical role of incorporating inference-awareness when aligning black-box LLMs through prompt optimization.
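
The interdependence between prompt choice and inference scale can be illustrated with a toy calculation. The sketch below is not the paper's PSST algorithm. It assumes two hypothetical prompts ("steady" and "risky") whose reward distributions are known exactly, scores each with Best-of-N Sampling, and picks the (prompt, N) pair that maximizes expected reward minus an assumed linear per-sample cost. All names, distributions, and the cost model are illustrative assumptions.

```python
from itertools import product

# Hypothetical reward distributions (assumed for illustration only):
# each prompt maps to a list of (reward, probability) pairs.
PROMPTS = {
    "steady": [(0.6, 1.0)],              # always a mediocre reward
    "risky": [(1.0, 0.3), (0.2, 0.7)],   # occasionally excellent, usually poor
}


def expected_best_of_n(dist, n):
    """Exact expected maximum reward over n i.i.d. samples from a discrete distribution."""
    expected, cdf_prev = 0.0, 0.0
    for reward, prob in sorted(dist):  # ascending by reward
        cdf = cdf_prev + prob
        # P(max == reward) = F(reward)^n - F(previous reward)^n
        expected += reward * (cdf ** n - cdf_prev ** n)
        cdf_prev = cdf
    return expected


def best_configuration(prompts, max_n, cost_per_sample):
    """Return the (prompt, n) pair maximizing expected reward minus sampling cost."""
    def utility(config):
        name, n = config
        return expected_best_of_n(prompts[name], n) - cost_per_sample * n

    return max(product(prompts, range(1, max_n + 1)), key=utility)


if __name__ == "__main__":
    # With a single sample allowed, the low-variance prompt is preferred.
    print(best_configuration(PROMPTS, max_n=1, cost_per_sample=0.0))
    # With a larger budget and cheap samples, the high-variance prompt is preferred.
    print(best_configuration(PROMPTS, max_n=4, cost_per_sample=0.05))
```

Under these assumptions, the preferred prompt flips once more than one sample is allowed, and raising the per-sample cost flips it back. This is the kind of dependence on budget and user preference that an inference-agnostic prompt optimizer would miss.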
