General Bayesian Policy Learning

Masahiro Kato

arXiv:2602.23672v1

Originality: Incremental advance

AI Analysis

This work addresses decision-making under uncertainty in applications such as treatment choice and portfolio selection. It proposes a Bayesian approach to policy learning; the methodological advance is rated as incremental.

The study tackles policy learning problems such as treatment choice and portfolio selection by proposing a General Bayes framework. The framework uses a squared-loss surrogate for welfare maximization, which gives the resulting posterior over decision rules a Gaussian pseudo-likelihood interpretation, and it comes with PAC-Bayes-style theoretical guarantees.

This study proposes the General Bayes framework for policy learning. We consider decision problems in which a decision-maker chooses an action from an action set to maximize its expected welfare. Typical examples include treatment choice and portfolio selection. In such problems, the statistical target is a decision rule, and the prediction of each outcome $Y(a)$ is not necessarily of primary interest. We formulate this policy learning problem by loss-based Bayesian updating. Our main technical device is a squared-loss surrogate for welfare maximization. We show that maximizing empirical welfare over a policy class is equivalent to minimizing a scaled squared error in the outcome difference, up to a quadratic regularization controlled by a tuning parameter $ζ>0$. This rewriting yields a General Bayes posterior over decision rules that admits a Gaussian pseudo-likelihood interpretation. We clarify two Bayesian interpretations of the resulting generalized posterior, a working Gaussian view and a decision-theoretic loss-based view. As one implementation example, we introduce neural networks with tanh-squashed outputs. Finally, we provide theoretical guarantees in a PAC-Bayes style.
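The squared-loss rewriting described in the abstract can be illustrated with a small numerical sketch. It is not the paper's exact formulation: it assumes a scalar policy value $π_i ∈ [-1, 1]$ and a per-unit outcome-difference score $d_i$, and checks the elementary identity $\frac{1}{2ζ}(d_i - ζ π_i)^2 = \frac{d_i^2}{2ζ} - \left(π_i d_i - \frac{ζ}{2} π_i^2\right)$, so that minimizing the scaled squared error matches maximizing welfare minus a quadratic penalty. The simulated data, the one-parameter policy `tanh(theta * x)` (a toy stand-in for a tanh-squashed network), the uniform grid prior, and the learning rate `eta` are all assumptions made for illustration; the paper's scaling and estimator may differ.

```python
import math
import random


def regularized_welfare(policy, diff, zeta):
    """Mean of pi*d minus the quadratic penalty (zeta/2)*pi^2."""
    n = len(diff)
    return sum(p * d - 0.5 * zeta * p * p for p, d in zip(policy, diff)) / n


def scaled_squared_loss(policy, diff, zeta):
    """Scaled squared error: (1 / (2*zeta)) * mean((d - zeta*pi)^2)."""
    n = len(diff)
    return sum((d - zeta * p) ** 2 for p, d in zip(policy, diff)) / (2.0 * zeta * n)


def policy_free_constant(diff, zeta):
    """Term mean(d^2) / (2*zeta), which does not depend on the policy."""
    return sum(d * d for d in diff) / (2.0 * zeta * len(diff))


def gibbs_weights(thetas, xs, diff, zeta, eta):
    """Generalized-posterior weights on a grid under a uniform prior:
    weight(theta) is proportional to exp(-eta * n * loss(theta))."""
    n = len(diff)
    log_w = []
    for th in thetas:
        pol = [math.tanh(th * x) for x in xs]  # tanh-squashed toy policy
        log_w.append(-eta * n * scaled_squared_loss(pol, diff, zeta))
    m = max(log_w)  # subtract the max for numerical stability
    w = [math.exp(v - m) for v in log_w]
    total = sum(w)
    return [v / total for v in w]


# Simulated data (hypothetical): the outcome difference grows with x.
rng = random.Random(0)
zeta, eta = 0.7, 1.0
xs = [rng.gauss(0.0, 1.0) for _ in range(200)]
diff = [x + rng.gauss(0.0, 1.0) for x in xs]

# 1) Check the algebraic identity for an arbitrary policy in [-1, 1].
policy = [math.tanh(rng.gauss(0.0, 1.0)) for _ in xs]
lhs = scaled_squared_loss(policy, diff, zeta)
rhs = policy_free_constant(diff, zeta) - regularized_welfare(policy, diff, zeta)
identity_gap = abs(lhs - rhs)

# 2) Loss-based posterior over theta on a grid from -3 to 3.
thetas = [-3.0 + 0.25 * k for k in range(25)]
weights = gibbs_weights(thetas, xs, diff, zeta, eta)
theta_mode = thetas[weights.index(max(weights))]

print(f"identity gap: {identity_gap:.2e}")
print(f"posterior mode of theta: {theta_mode}")
```

Because the policy-free constant cancels when the weights are normalized, the same grid posterior would result from exponentiating the regularized welfare directly; the squared-error form is what makes the Gaussian pseudo-likelihood reading natural.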
