Test-Time Alignment of LLMs via Sampling-Based Optimal Control in pre-logit space
This work addresses efficient LLM alignment for users who need cost-effective deployment, though it appears incremental because it builds on existing test-time alignment and control methods.
The paper tackles the problem of aligning large language models (LLMs) at test time, which avoids the high computational cost of fine-tuning. It proposes AISP, a method that applies a Gaussian perturbation to the pre-logits and chooses the perturbation mean to maximize expected reward. The results show that AISP outperforms best-of-n sampling in reward for a given number of samples and achieves higher rewards than other reward-based test-time alignment methods.
Test-time alignment of large language models (LLMs) has attracted attention because fine-tuning LLMs is computationally expensive. In this paper, we propose a new test-time alignment method, adaptive importance sampling on pre-logits (AISP), based on sampling-based model predictive control with a stochastic control input. AISP applies a Gaussian perturbation to the pre-logits, which are the outputs of the penultimate layer, and maximizes the expected reward with respect to the mean of the perturbation. We demonstrate that the optimal mean is obtained by importance sampling with sampled rewards. AISP outperforms best-of-n sampling in terms of reward as a function of the number of samples used and achieves higher rewards than other reward-based test-time alignment methods.
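The importance-sampling update of the perturbation mean can be illustrated with a small toy sketch. It is not the paper's implementation. It assumes the exponential (softmax) reward weighting common in sampling-based model predictive control, and it replaces the LLM and reward model with a hypothetical closed-form `toy_reward` over a 3-dimensional vector. The names `update_mean`, `softmax_weights`, `toy_reward`, `run`, and the parameters `sigma` and `temperature` are all illustrative assumptions.

```python
import math
import random


def softmax_weights(rewards, temperature):
    """Self-normalized weights proportional to exp(reward / temperature)."""
    top = max(rewards)  # subtract the max for numerical stability
    exps = [math.exp((r - top) / temperature) for r in rewards]
    total = sum(exps)
    return [e / total for e in exps]


def update_mean(mean, reward_fn, n_samples=64, sigma=0.5, temperature=0.1, rng=None):
    """One iteration: sample Gaussian perturbations around `mean`,
    weight each sample by its reward, and return the weighted average."""
    rng = rng or random.Random(0)
    samples = [[m + rng.gauss(0.0, sigma) for m in mean] for _ in range(n_samples)]
    rewards = [reward_fn(v) for v in samples]
    weights = softmax_weights(rewards, temperature)
    return [sum(w * v[d] for w, v in zip(weights, samples)) for d in range(len(mean))]


def toy_reward(v, target=(1.0, -2.0, 0.5)):
    """Hypothetical stand-in for a reward model: higher when v is near `target`."""
    return -sum((a - b) ** 2 for a, b in zip(v, target))


def run(iterations=20, seed=0):
    """Repeat the update from a zero mean and return the final mean."""
    rng = random.Random(seed)
    mean = [0.0, 0.0, 0.0]
    for _ in range(iterations):
        mean = update_mean(mean, toy_reward, rng=rng)
    return mean


if __name__ == "__main__":
    final_mean = run()
    print("final mean:", final_mean)
    print("reward at final mean:", toy_reward(final_mean))
```

In the setting described above, the perturbation would be added to the pre-logits of the model and the reward would come from scoring the generated responses. The toy reward only makes the weighting-and-averaging step easy to inspect.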