ML · Aug 16, 2023
Eliciting Risk Aversion with Inverse Reinforcement Learning via Interactive Questioning
Ziteng Cheng, Anthony Coache, Sebastian Jaimungal
We investigate a framework for robo-advisors to estimate non-expert clients' risk aversion using adaptive binary-choice questionnaires. We model risk aversion using cost functions and spectral risk measures in a static setting. We prove finite-sample identifiability and, for properly designed questions, obtain a convergence rate of $\sqrt{N}$ up to a logarithmic factor, where $N$ is the number of questions. We introduce the notion of distinguishing power and demonstrate, through simulated experiments, that designing questions by maximizing distinguishing power achieves satisfactory accuracy in learning risk aversion with fewer than 50 questions. We also provide a preliminary investigation of an infinite-horizon setting with an additional discount factor for dynamic risk aversion, establishing qualitative identifiability in this case.
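As a rough illustration of how a spectral risk measure can rank the two options in a binary-choice question, the sketch below evaluates an empirical spectral risk on sorted cost samples. It is not the paper's estimator or question design: the function names (`cvar_weights`, `spectral_risk`, `prefers_a`), the choice of CVaR as the spectrum, and the two sample lotteries are all assumptions made for this example.

```python
import numpy as np

def cvar_weights(n, alpha):
    """Weights of an empirical spectral risk measure with a CVaR spectrum.

    Assumption for this sketch: the spectrum is phi(u) = 1{u >= alpha} / (1 - alpha),
    and weight i is the integral of phi over [(i - 1) / n, i / n].
    """
    edges = np.linspace(0.0, 1.0, n + 1)
    overlap = np.clip(edges[1:], alpha, 1.0) - np.clip(edges[:-1], alpha, 1.0)
    return overlap / (1.0 - alpha)

def spectral_risk(costs, weights):
    """Weighted sum of the sorted costs (larger costs receive the later weights)."""
    return float(np.dot(np.sort(np.asarray(costs, dtype=float)), weights))

def prefers_a(costs_a, costs_b, alpha):
    """Hypothetical client answer: pick A when its spectral risk is not larger."""
    risk_a = spectral_risk(costs_a, cvar_weights(len(costs_a), alpha))
    risk_b = spectral_risk(costs_b, cvar_weights(len(costs_b), alpha))
    return risk_a <= risk_b

# Option A: a sure cost of 2. Option B: cost 0 most of the time, cost 10 rarely.
option_a = [2.0] * 10
option_b = [0.0] * 9 + [10.0]

risk_neutral_choice = prefers_a(option_a, option_b, alpha=0.0)  # compares means
risk_averse_choice = prefers_a(option_a, option_b, alpha=0.9)   # compares tails
```

With `alpha=0.0` the comparison reduces to expected cost, so the hypothetical client picks B; with `alpha=0.9` only the worst outcomes count and the client picks A. Answers of this kind are what a questionnaire would observe when trying to narrow down the client's risk aversion.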
LG · Feb 27, 2023
Distributional Method for Risk Averse Reinforcement Learning
Ziteng Cheng, Sebastian Jaimungal, Nick Martin
We introduce a distributional method for learning the optimal policy in risk averse Markov decision processes with finite state-action spaces, latent costs, and stationary dynamics. We assume sequential observations of states, actions, and costs, and assess the performance of a policy using dynamic risk measures constructed from nested Kusuoka-type conditional risk mappings. For such performance criteria, randomized policies may outperform deterministic policies; the candidate policies therefore lie in the d-dimensional simplex, where d is the cardinality of the action space. Existing risk averse reinforcement learning methods seldom consider randomized policies, and naïve extensions to the current setting suffer from the curse of dimensionality. By exploiting certain structures embedded in the corresponding dynamic programming principle, we propose a distributional learning method for seeking the optimal policy. The conditional distribution of the value function is cast into a specific type of function, chosen with the ease of risk averse optimization in mind. We use a deep neural network to approximate this function, illustrate that the proposed method avoids the curse of dimensionality in the exploration phase, and explore the method's performance over a wide range of randomly picked model parameters.
ML · Jun 13, 2024 · Code
Learning conditional distributions on continuous spaces
Cyril Bénézet, Ziteng Cheng, Sebastian Jaimungal
We investigate sample-based learning of conditional distributions on multi-dimensional unit boxes, allowing for different dimensions of the feature and target spaces. Our approach involves clustering data near varying query points in the feature space to create empirical measures in the target space. We employ two distinct clustering schemes: one based on a fixed-radius ball and the other on nearest neighbors. We establish upper bounds for the convergence rates of both methods and, from these bounds, deduce optimal configurations for the radius and the number of neighbors. We propose to incorporate the nearest neighbors method into neural network training, as our empirical analysis indicates it has better performance in practice. For efficiency, our training process utilizes approximate nearest neighbors search with random binary space partitioning. Additionally, we employ the Sinkhorn algorithm and a sparsity-enforced transport plan. Our empirical findings demonstrate that, with a suitably designed structure, the neural network has the ability to adapt to a suitable level of Lipschitz continuity locally. For reproducibility, our code is available at https://github.com/zcheng-a/LCD_kNN.
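The two clustering schemes mentioned in the abstract can be sketched as follows. This is only a minimal illustration, not the code in the linked repository: it uses exact brute-force distances rather than the approximate search with random binary space partitioning, and the function names and toy data are assumptions made for this example.

```python
import numpy as np

def knn_conditional_sample(x_data, y_data, x_query, k):
    """Targets of the k features nearest to x_query (exact, brute force).

    The returned rows, each weighted 1/k, form an empirical measure that
    approximates the conditional distribution of Y given X = x_query.
    """
    dists = np.linalg.norm(x_data - x_query, axis=1)
    nearest = np.argpartition(dists, k - 1)[:k]  # indices of the k smallest distances
    return y_data[nearest]

def ball_conditional_sample(x_data, y_data, x_query, radius):
    """Targets of all features within a fixed-radius ball around x_query.

    The result may be empty when no data point falls inside the ball.
    """
    dists = np.linalg.norm(x_data - x_query, axis=1)
    return y_data[dists <= radius]

# Toy data on unit boxes (hypothetical): 2-D features, 1-D targets.
x_data = np.array([[0.0, 0.0], [0.1, 0.1], [0.9, 0.9], [1.0, 1.0]])
y_data = np.array([[0.0], [0.1], [0.9], [1.0]])
query = np.array([0.0, 0.0])

knn_targets = knn_conditional_sample(x_data, y_data, query, k=2)
ball_targets = ball_conditional_sample(x_data, y_data, query, radius=0.2)
```

The nearest-neighbor version always returns exactly `k` targets, whereas the ball version returns a data-dependent number of targets. That difference is one practical distinction between the two schemes.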