How to Train Your Advisor: Steering Black-Box LLMs with Advisor Models
This paper addresses the challenge of adapting black-box AI systems for personalization and environment-specific tasks, and it presents a novel method rather than an incremental improvement.
The paper tackles the problem of customizing black-box large language models (LLMs) that cannot be modified directly. It introduces Advisor Models: lightweight policies trained with reinforcement learning to issue dynamic, per-input steering instructions. The results show that these models outperform static prompt optimizers across reasoning and personalization domains, that advisors transfer across black-box models, and that they remain robust to out-of-distribution inputs.
Foundation models are increasingly deployed as black-box services, where model weights cannot be modified and customization is limited to prompting. While static prompt optimization has shown promise, it produces a single fixed prompt that fails to adapt to different inputs, users, or environments. We introduce Advisor Models, lightweight parametric policies trained with reinforcement learning to reactively issue natural language steering instructions in-context to black-box models. The advisor is a second small model that sits between the input and the model, shaping behavior on a per-instance basis using reward signals from the environment. Across multiple domains involving reasoning and personalization, we show that Advisor Models outperform static prompt optimizers, discovering environment dynamics and improving downstream task performance. We also demonstrate the generalizability of advisors by transferring them across black-box models, as well as the framework's ability to achieve specialization while retaining robustness to out-of-distribution inputs. Viewed more broadly, Advisor Models provide a learnable interface to black-box systems where the advisor acts as a parametric, environment-specific memory. We argue that dynamic optimization of black-box models via Advisor Models is a promising direction for enabling personalization and environment-adaptable AI with frontier-level capabilities.
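The per-instance flow described in the abstract (the advisor reads the input, emits a natural language instruction, the black-box model answers with that instruction in context, and the environment returns a reward) might be sketched as below. This is a minimal illustration, not the authors' implementation: the function names (`advised_call`, `toy_advisor`, `toy_black_box`, `toy_reward`), the prompt layout, and the toy reward are all hypothetical, and the reinforcement learning update that would consume the reward is omitted.

```python
from typing import Callable, Tuple


def advised_call(
    task_input: str,
    advisor: Callable[[str], str],
    black_box: Callable[[str], str],
    reward_fn: Callable[[str, str], float],
) -> Tuple[str, str, float]:
    """Run one advised step: advice -> black-box response -> reward."""
    # The advisor sees the raw input and produces a steering instruction.
    advice = advisor(task_input)
    # The instruction is placed in context; the black-box weights are untouched.
    # (This prompt layout is an assumption made for illustration.)
    prompt = f"{task_input}\n\nAdvice: {advice}"
    response = black_box(prompt)
    # The environment scores the final response; a training loop would use
    # this scalar to update the advisor (not shown here).
    reward = reward_fn(task_input, response)
    return advice, response, reward


# Toy stand-ins so the sketch runs without any external service.
def toy_advisor(task_input: str) -> str:
    if "brief" in task_input:
        return "Answer in one short sentence."
    return "Explain step by step."


def toy_black_box(prompt: str) -> str:
    if "one short sentence" in prompt:
        return "short answer"
    return "long, detailed answer"


def toy_reward(task_input: str, response: str) -> float:
    # Reward 1.0 when the response length matches what the input asked for.
    wants_short = "brief" in task_input
    return 1.0 if wants_short == response.startswith("short") else 0.0


advice, response, reward = advised_call(
    "Give a brief summary of RL.", toy_advisor, toy_black_box, toy_reward
)
```

Only `advisor` would be a trainable policy in this picture; `black_box` is called purely through its text interface, which is what would let the same advisor be paired with a different black-box model.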