Mining Intrinsic Rewards from LLM Hidden States for Efficient Best-of-N Sampling
This work addresses efficiency problems faced by researchers and practitioners who use LLMs, though it is an incremental improvement on existing reward modeling approaches.
The paper tackles the computational inefficiency of best-of-N sampling for Large Language Models by introducing SWIFT, a lightweight technique that derives rewards from the model's hidden states instead of from generated text. Experiments show that SWIFT outperforms baselines while using less than 0.005% of their parameters and requiring only a few training samples.
Enhancing the performance of Large Language Models (LLMs) with best-of-N sampling is effective and has attracted significant attention. However, it is computationally prohibitive because it relies on massive, data-hungry text-based reward models. By changing the data source from text to hidden states, we introduce SWIFT (Simple Weighted Intrinsic Feedback Technique), a novel, lightweight technique that leverages the rich information embedded in LLM hidden states to address these issues. SWIFT operates at the token level and consists only of linear layers. Extensive experiments show that SWIFT outperforms baselines with less than 0.005% of their parameters while requiring only a few samples for training, demonstrating a significant efficiency improvement. SWIFT's robust scalability, applicability to some closed-source models via logits, and ability to be combined with traditional reward models for further performance gains underscore its practical value.
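To make the idea concrete, here is a minimal sketch in plain Python of the kind of scorer the abstract describes: token-level, built only from linear maps, and reading hidden states rather than text. The abstract does not specify the architecture beyond that, so the sigmoid gate that weights each token, the normalization by total gate mass, and every name below (sequence_reward, best_of_n, the demo vectors) are assumptions made for illustration, not the authors' implementation.

```python
import math


def linear(weights, bias, x):
    """Apply one linear unit: dot(weights, x) + bias."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def sequence_reward(hidden_states, reward_w, reward_b, gate_w, gate_b):
    """Score one candidate from its per-token hidden states.

    hidden_states: list of per-token vectors (assumed already extracted
    from the LLM). Each token gets a linear reward and a sigmoid gate
    (the gate is an assumption); the sequence score is the gate-weighted
    average of the token rewards.
    """
    if not hidden_states:
        raise ValueError("candidate has no tokens")
    rewards = [linear(reward_w, reward_b, h) for h in hidden_states]
    gates = [sigmoid(linear(gate_w, gate_b, h)) for h in hidden_states]
    return sum(g * r for g, r in zip(gates, rewards)) / sum(gates)


def best_of_n(candidates, params):
    """Return (index of the highest-scoring candidate, all scores)."""
    scores = [sequence_reward(h, *params) for h in candidates]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores


# Hypothetical toy data: two candidates, two tokens each, hidden size 2.
demo_candidates = [
    [[1.0, 0.0], [3.0, 0.0]],
    [[0.0, 5.0], [1.0, 5.0]],
]
# (reward_w, reward_b, gate_w, gate_b); zero gate weights give uniform gates.
demo_params = ([1.0, 0.0], 0.0, [0.0, 0.0], 0.0)
best_index, demo_scores = best_of_n(demo_candidates, demo_params)
```

In this sketch, two linear units over a d-dimensional hidden state have 2(d + 1) parameters in total, which shows why a head of this shape can be far smaller than a text-based reward model. In practice the weights would be trained on labeled samples rather than set by hand as they are here.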