LGSep 2, 2022
Optimistic Optimization of Gaussian Process SamplesJulia Grosse, Cheng Zhang, Philipp Hennig
Bayesian optimization is a popular formalism for global optimization, but its computational costs limit it to expensive-to-evaluate functions. A competing, computationally more efficient, global optimization framework is optimistic optimization, which exploits prior knowledge about the geometry of the search space in form of a dissimilarity function. We investigate to which degree the conceptual advantages of Bayesian Optimization can be combined with the computational efficiency of optimistic optimization. By mapping the kernel to a dissimilarity, we obtain an optimistic optimization algorithm for the Bayesian Optimization setting with a run-time of up to $\mathcal{O}(N \log N)$. As a high-level take-away we find that, when using stationary kernels on objectives of relatively low evaluation cost, optimistic optimization can be strongly preferable over Bayesian optimization, while for strongly coupled and parametric models, good implementations of Bayesian optimization can perform much better, even at low evaluation cost. We argue that there is a new research domain between geometric and probabilistic search, i.e. methods that run drastically faster than traditional Bayesian optimization, while retaining some of the crucial functionality of Bayesian optimization.
LGJul 4, 2024
Uncertainty-Guided Likelihood Tree SearchJulia Grosse, Ruotian Wu, Ahmad Rashid et al.
Tree search is a fundamental tool for planning, as many sequential decision-making problems can be framed as searching over tree-structured spaces. We propose an uncertainty-guided tree search algorithm for settings where the reward function is a log-likelihood function of the paths. Due to the combinatorial explosion of the tree size, the set of paths for which one can obtain rewards is sparse, particularly when the likelihood is obtained through expensive evaluations, such as by querying a large language model. We address this challenge by deriving an probabilistic search heuristic based on regularity assumptions for the likelihood. Unlike existing tree search methods, the proposed method can perform backtracking and trade-off exploration with exploitation, and yet does not require expensive roll-outs, or sophisticated Bayesian inference. Through extensive on-model and off-model experiments on timely, large-scale practical applications, we demonstrate that our method identifies paths with high likelihood while requiring fewer costly evaluations.
LGJun 12, 2024Code
A Critical Look At Tokenwise Reward-Guided Text GenerationAhmad Rashid, Ruotian Wu, Julia Grosse et al.
Large language models (LLMs) can be improved by aligning with human preferences through fine-tuning -- the so-called reinforcement learning from human feedback (RLHF). However, the cost of fine-tuning an LLM is prohibitive for many users. Due to their ability to bypass LLM fine-tuning, prediction-time tokenwise reward-guided text generation (RGTG) methods have recently been proposed. They use a reward model trained on full sequences to score partial sequences during decoding in a bid to steer the generation towards sequences with high rewards. However, these methods have so far been only heuristically motivated and poorly analyzed. In this work, we show that reward models trained on full sequences are not compatible with scoring partial sequences. To alleviate this, we propose to train a Bradley-Terry reward model on partial sequences explicitly, and autoregressively sample from the implied tokenwise policy during decoding. We study the properties of this reward model and the resulting policy: we show that this policy is proportional to the ratio of two distinct RLHF policies. Our simple approach outperforms previous RGTG methods and performs similarly to strong offline baselines without large-scale LLM fine-tuning. Code for our work is available at https://github.com/ahmadrash/PARGS
LGJun 16, 2021
Probabilistic DAG SearchJulia Grosse, Cheng Zhang, Philipp Hennig
Exciting contemporary machine learning problems have recently been phrased in the classic formalism of tree search -- most famously, the game of Go. Interestingly, the state-space underlying these sequential decision-making problems often posses a more general latent structure than can be captured by a tree. In this work, we develop a probabilistic framework to exploit a search space's latent structure and thereby share information across the search tree. The method is based on a combination of approximate inference in jointly Gaussian models for the explored part of the problem, and an abstraction for the unexplored part that imposes a reduction of complexity ad hoc. We empirically find our algorithm to compare favorably to existing non-probabilistic alternatives in Tic-Tac-Toe and a feature selection application.