60.4 | LG | Apr 7
Sampling for Quality: Training-Free Reward-Guided LLM Decoding via Sequential Monte Carlo
Jelena Markovic-Voronov, Wenhui Zhu, Bo Long et al.
We introduce a principled probabilistic framework for reward-guided decoding in large language models, addressing the limitations of standard decoding methods that optimize token-level likelihood rather than sequence-level quality. Our method defines a reward-augmented target distribution over complete sequences by combining model transition probabilities with prefix-dependent reward potentials. Importantly, the approach is training-free: it leaves model weights unchanged and instead modifies the inference distribution via reward potentials, with all gains arising purely from inference-time sampling. To sample from this distribution, we develop Sequential Monte Carlo algorithms, including a computationally efficient prefix-only variant and a lookahead variant whose intermediate targets match the exact marginals of the full sequence distribution. The framework also integrates resample-move updates with Metropolis-Hastings rejuvenation and supports block-wise generation, subsuming common decoding strategies such as temperature sampling and power-tempered objectives. Empirical results across three 7B models show significant gains. On code generation (HumanEval), our method improves base performance by up to 54.9% and surpasses the strongest sampling baselines by 9.1%-15.3%. On mathematical reasoning (MATH500), it achieves gains of up to 8.8%. Notably, it reaches 87.8% on HumanEval and 78.4% on MATH500 with Qwen2.5-7B, consistently outperforming the reinforcement learning method GRPO.
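To make the prefix-only idea concrete, here is a minimal sketch of Sequential Monte Carlo decoding on a toy problem. It is not the authors' implementation: the three-token vocabulary, `toy_next_token_probs`, `toy_potential`, and all parameter values are hypothetical stand-ins, and the lookahead variant, Metropolis-Hastings rejuvenation, and block-wise generation described in the abstract are not shown. Each particle extends its prefix by sampling from the (toy) model, is reweighted by the ratio of reward potentials before and after the new token, and the population is resampled when the effective sample size drops.

```python
import math
import random

VOCAB = ["a", "b", "<eos>"]  # hypothetical toy vocabulary


def toy_next_token_probs(prefix):
    """Hypothetical stand-in for an LLM's next-token distribution."""
    return [0.6, 0.3, 0.1]


def toy_potential(prefix):
    """Hypothetical prefix reward potential: favors prefixes with more 'b' tokens."""
    return math.exp(1.0 * prefix.count("b"))


def effective_sample_size(weights):
    total = sum(weights)
    return total * total / sum(w * w for w in weights)


def smc_decode(num_particles=200, max_len=8, ess_threshold=0.5, seed=0):
    """Prefix-only SMC: propose from the model, weight by potential ratios."""
    rng = random.Random(seed)
    particles = [[] for _ in range(num_particles)]
    weights = [1.0] * num_particles
    for _ in range(max_len):
        for i, prefix in enumerate(particles):
            if prefix and prefix[-1] == "<eos>":
                continue  # finished sequences keep their weight
            old_potential = toy_potential(prefix)
            token = rng.choices(VOCAB, weights=toy_next_token_probs(prefix))[0]
            prefix.append(token)
            # Incremental weight: potential of new prefix over potential of old prefix.
            weights[i] *= toy_potential(prefix) / old_potential
        # Resample (multinomial) when the weights become too uneven.
        if effective_sample_size(weights) < ess_threshold * num_particles:
            idx = rng.choices(range(num_particles), weights=weights, k=num_particles)
            particles = [list(particles[j]) for j in idx]
            weights = [1.0] * num_particles
    return particles, weights
```

Calling `smc_decode()` returns weighted particles whose distribution is tilted toward high-potential sequences while the toy model itself is left unchanged, which mirrors the training-free framing above.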
IR | Aug 6, 2020 | Code
DeText: A Deep Text Ranking Framework with BERT
Weiwei Guo, Xiaowei Liu, Sida Wang et al.
Ranking is the most important component in a search system. Most search systems deal with large amounts of natural language data, hence an effective ranking system requires a deep understanding of text semantics. Recently, deep learning based natural language processing (deep NLP) models have generated promising results on ranking systems. BERT is one of the most successful models that learn contextual embedding, which has been applied to capture complex query-document relations for search ranking. However, this is generally done by exhaustively interacting each query word with each document word, which is inefficient for online serving in search product systems. In this paper, we investigate how to build an efficient BERT-based ranking model for industry use cases. The solution is further extended to a general ranking framework, DeText, that is open sourced and can be applied to various ranking productions. Offline and online experiments of DeText on three real-world search systems present significant improvement over state-of-the-art approaches.
IR | May 3, 2012
Multi-Faceted Ranking of News Articles using Post-Read Actions
Deepak Agarwal, Bee-Chung Chen, Xuanhui Wang
Personalized article recommendation is important to improve user engagement on news sites. Existing work quantifies engagement primarily through click rates. We argue that the quality of recommendations can be improved by incorporating different types of "post-read" engagement signals like sharing, commenting, printing and e-mailing article links. More specifically, we propose a multi-faceted ranking problem for recommending news articles where each facet corresponds to a ranking problem to maximize actions of a post-read action type. The key technical challenge is to estimate the rates of post-read action types while mitigating the impact of enormous data sparsity; we do so through several variations of factor models. To exploit correlations among post-read action types, we also introduce a novel variant called the locally augmented tensor (LAT) model. Through data obtained from a major news site in the US, we show that factor models significantly outperform a few baseline IR models and that the LAT model significantly outperforms several other variations of factor models. Our findings show that it is possible to incorporate post-read signals that are commonly available on online news sites to improve the quality of recommendations.
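As a rough illustration of per-facet ranking with a factor model, the sketch below scores articles separately for each post-read action type and sorts them by predicted action rate. It is not the paper's model and does not implement LAT: the logistic form, the function names, and the factor vectors are all hypothetical, and the factors are assumed to be already learned.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def predicted_rate(user_vec, item_vec, facet_bias):
    """Assumed form: action rate = sigmoid(bias + user factors . item factors)."""
    dot = sum(u * v for u, v in zip(user_vec, item_vec))
    return sigmoid(facet_bias + dot)


def rank_by_facet(user_factors, item_factors, facet_biases):
    """Return, for each facet, article ids sorted by predicted action rate.

    user_factors: facet -> user factor vector (hypothetical, pre-learned)
    item_factors: facet -> {article id: item factor vector}
    facet_biases: facet -> scalar bias
    """
    rankings = {}
    for facet, bias in facet_biases.items():
        scores = {
            item: predicted_rate(user_factors[facet], vec, bias)
            for item, vec in item_factors[facet].items()
        }
        rankings[facet] = sorted(scores, key=scores.get, reverse=True)
    return rankings


# Hypothetical factors for one user, two facets, and two articles.
user_factors = {"share": [1.0, 0.0], "comment": [0.0, 1.0]}
item_factors = {
    "share": {"A": [2.0, 0.0], "B": [-1.0, 0.0]},
    "comment": {"A": [0.0, -1.0], "B": [0.0, 2.0]},
}
facet_biases = {"share": -2.0, "comment": -3.0}
rankings = rank_by_facet(user_factors, item_factors, facet_biases)
```

With these made-up factors the same user gets a different article ordering per facet, which is the point of treating each post-read action type as its own ranking problem.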