LG · AI · CL · ML · May 18, 2025

Mining Intrinsic Rewards from LLM Hidden States for Efficient Best-of-N Sampling

arXiv:2505.12225v2 · 6 citations · h-index: 8
Originality: Incremental advance
AI Analysis

This work addresses an efficiency problem faced by researchers and practitioners who use LLMs, though it is an incremental improvement on existing reward-modeling approaches.

The paper tackles the computational cost of best-of-N sampling for Large Language Models by introducing SWIFT, a lightweight technique that scores candidates from the model's hidden states rather than with text-based reward models. Experiments show that SWIFT outperforms baselines while using fewer than 0.005% of their parameters and requires only a few training samples.

Enhancing the performance of Large Language Models (LLMs) with best-of-N sampling is effective and has attracted significant attention. However, it is computationally prohibitive because it relies on massive, data-hungry text-based reward models. By changing the data source from text to hidden states, we introduce SWIFT (Simple Weighted Intrinsic Feedback Technique), a novel, lightweight technique that leverages the rich information embedded in LLM hidden states to address these issues. SWIFT operates at the token level and consists only of linear layers. Extensive experiments show that SWIFT outperforms baselines while using fewer than 0.005% of their parameters and requiring only a few training samples, a significant efficiency improvement. SWIFT's robust scalability, its applicability to some closed-source models via logits, and its ability to be combined with traditional reward models for further performance gains underscore its practical value.
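The abstract states only that SWIFT reads hidden states token by token through linear layers and produces a weighted intrinsic reward. The Python sketch below illustrates one way such a scorer could plug into best-of-N selection. The two linear heads (a per-token reward and a per-token gate), the softmax weighting over tokens, and the names `LinearTokenScorer` and `best_of_n` are hypothetical choices made for illustration, not the paper's published architecture; in practice the weights would be trained on labeled samples rather than set by hand.

```python
import math
from typing import List, Sequence

# Hypothetical sketch: SWIFT's actual layer layout and aggregation may differ.
Vector = Sequence[float]


def dot(w: Vector, h: Vector) -> float:
    """Inner product of a weight vector and one token's hidden state."""
    return sum(wi * hi for wi, hi in zip(w, h))


class LinearTokenScorer:
    """Assumed design: two linear heads applied to every token's hidden state."""

    def __init__(self, w_reward: Vector, b_reward: float,
                 w_gate: Vector, b_gate: float) -> None:
        self.w_reward, self.b_reward = w_reward, b_reward
        self.w_gate, self.b_gate = w_gate, b_gate

    def score(self, hidden_states: List[Vector]) -> float:
        """Weighted sum of per-token rewards, weights = softmax of gate logits."""
        if not hidden_states:
            raise ValueError("a candidate needs at least one token")
        rewards = [dot(self.w_reward, h) + self.b_reward for h in hidden_states]
        logits = [dot(self.w_gate, h) + self.b_gate for h in hidden_states]
        # Subtract the max logit before exponentiating for numerical stability.
        m = max(logits)
        exps = [math.exp(z - m) for z in logits]
        total = sum(exps)
        return sum((e / total) * r for e, r in zip(exps, rewards))


def best_of_n(scorer: LinearTokenScorer,
              candidates: List[List[Vector]]) -> int:
    """Return the index of the highest-scoring candidate response."""
    scores = [scorer.score(c) for c in candidates]
    return max(range(len(scores)), key=scores.__getitem__)


# Toy usage with 2-dimensional "hidden states" (made-up numbers).
# A zero gate gives uniform token weights, so each score is the mean
# of the first hidden-state component across the candidate's tokens.
scorer = LinearTokenScorer(w_reward=[1.0, 0.0], b_reward=0.0,
                           w_gate=[0.0, 0.0], b_gate=0.0)
candidates = [
    [[0.1, 5.0], [0.3, -2.0]],  # mean reward 0.2
    [[0.9, 0.0], [0.7, 1.0]],   # mean reward 0.8
    [[0.5, 0.5]],               # mean reward 0.5
]
best = best_of_n(scorer, candidates)
```

Because each head costs one dot product per token, the scorer's parameter count in this sketch grows only with the hidden size, which is in the spirit of the small parameter budget the abstract reports; the gate lets the scorer emphasize informative tokens instead of averaging them uniformly.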

