Robo-advising: Learning Investors' Risk Preferences via Portfolio Choices
The paper addresses the challenge of personalized financial advice for retail investors, though the contribution is incremental: it applies existing reinforcement learning (RL) methods to a specific domain.
The paper tackles the problem that a robo-advisor does not know its investor's risk preferences. It introduces a reinforcement learning framework that learns these preferences from the investor's portfolio choices. The algorithm's value function converges to that of an omniscient robo-advisor over a number of periods that is polynomial in the size of the state and action spaces, and the robo-advisor may outperform a stand-alone investor.
We introduce a reinforcement learning framework for retail robo-advising. The robo-advisor does not know the investor's risk preference, but learns it over time by observing her portfolio choices in different market environments. We develop an exploration-exploitation algorithm which trades off costly solicitations of portfolio choices by the investor with autonomous trading decisions based on stale estimates of the investor's risk aversion. We show that the algorithm's value function converges to the optimal value function of an omniscient robo-advisor over a number of periods that is polynomial in the state and action space. By correcting for the investor's mistakes, the robo-advisor may outperform a stand-alone investor, regardless of the investor's opportunity cost for making portfolio decisions.
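The trade-off between soliciting the investor and trading on a stale estimate can be illustrated with a toy simulation. This is only a sketch under assumptions that are not taken from the paper: a single risky asset with a mean-variance allocation rule, a fixed true risk aversion, multiplicative investor mistakes, and a solicitation schedule that thins out over time (perfect-square periods). The names `ENVIRONMENTS`, `optimal_weight`, and `simulate` are hypothetical.

```python
import math
import random

# Hypothetical market environments: (expected excess return, volatility).
ENVIRONMENTS = [(0.04, 0.15), (0.06, 0.20), (0.02, 0.10)]


def optimal_weight(mu, sigma, gamma):
    """Mean-variance risky-asset weight for risk aversion gamma (assumed rule)."""
    return mu / (gamma * sigma ** 2)


def simulate(gamma_true=3.0, periods=400, noise=0.2, seed=0):
    """Toy explore/exploit loop, not the paper's algorithm.

    Explore: in perfect-square periods, ask the investor for her portfolio
    choice (assumed costly) and update the risk-tolerance estimate.
    Exploit: in every period, trade on the current estimate.
    """
    rng = random.Random(seed)
    tolerance_sum = 0.0  # running sum of implied risk tolerances (1 / gamma)
    n_solicited = 0
    weights = []
    gamma_hat = None

    for t in range(periods):
        mu, sigma = rng.choice(ENVIRONMENTS)

        if math.isqrt(t) ** 2 == t:  # t = 0, 1, 4, 9, ...: solicit a choice
            mistake = rng.uniform(-noise, noise)  # investor's error (assumed form)
            chosen = optimal_weight(mu, sigma, gamma_true) * (1.0 + mistake)
            # Invert the allocation rule to get the implied risk tolerance.
            tolerance_sum += chosen * sigma ** 2 / mu
            n_solicited += 1

        # Averaging implied tolerances smooths out individual mistakes.
        gamma_hat = n_solicited / tolerance_sum
        weights.append(optimal_weight(mu, sigma, gamma_hat))

    return gamma_hat, n_solicited, weights


if __name__ == "__main__":
    gamma_hat, n_solicited, _ = simulate()
    print(f"estimated risk aversion: {gamma_hat:.3f}")
    print(f"solicitations: {n_solicited} of 400 periods")
```

In this sketch the advisor solicits in only 20 of 400 periods and trades autonomously in the rest. Averaging the implied risk tolerances is one simple way to mimic the idea of correcting for the investor's mistakes; the paper's actual estimator and solicitation rule may differ.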