Split Q Learning: Reinforcement Learning with Two-Stream Rewards
This work addresses the problem of developing more nuanced AI agents for complex socioeconomic systems and behavioral modeling, though it appears incremental as it builds on standard Q-learning with a new parametric twist.
The paper tackles reinforcement learning by proposing a parametric framework that extends Q-learning to incorporate a two-stream reward processing model inspired by human decision-making, with potential applications in understanding multi-agent interactions and modeling reward processing abnormalities in neurological conditions.
Drawing inspiration from behavioral studies of human decision making, we propose here a general parametric framework for a reinforcement learning problem, which extends the standard Q-learning approach to incorporate a two-stream framework of reward processing with biases biologically associated with several neurological and psychiatric conditions, including Parkinson's and Alzheimer's diseases, attention-deficit/hyperactivity disorder (ADHD), addiction, and chronic pain. For the AI community, the development of agents that react differently to different types of rewards can enable us to understand a wide spectrum of multi-agent interactions in complex real-world socioeconomic systems. Moreover, from the behavioral modeling perspective, our parametric framework can be viewed as a first step towards a unifying computational model capturing reward processing abnormalities across multiple mental conditions and user preferences in long-term recommendation systems.
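To make the two-stream idea concrete, the following is a minimal tabular sketch of how standard Q-learning might be split into a positive and a negative reward stream. The text above does not state the update rules, so everything here is an assumption made for illustration: the class name `SplitQAgent`, the weights `w_pos` and `w_neg`, the choice to route a scalar reward by its sign, the choice to let each stream bootstrap from its own maximum, and the choice to act greedily on the sum of the two streams.

```python
from collections import defaultdict


class SplitQAgent:
    """Illustrative two-stream tabular Q-learner (assumed form, hypothetical names)."""

    def __init__(self, n_actions, alpha=0.1, gamma=0.9, w_pos=1.0, w_neg=1.0):
        self.n_actions = n_actions
        self.alpha = alpha    # learning rate, as in standard Q-learning
        self.gamma = gamma    # discount factor, as in standard Q-learning
        self.w_pos = w_pos    # assumed bias: weight on the positive reward stream
        self.w_neg = w_neg    # assumed bias: weight on the negative reward stream
        # One Q-table per stream; unseen states start at zero.
        self.q_pos = defaultdict(lambda: [0.0] * n_actions)
        self.q_neg = defaultdict(lambda: [0.0] * n_actions)

    def combined(self, state):
        """Sum the two streams into one value per action (assumed combination rule)."""
        return [p + n for p, n in zip(self.q_pos[state], self.q_neg[state])]

    def act(self, state):
        """Pick the greedy action under the combined values."""
        values = self.combined(state)
        return max(range(self.n_actions), key=values.__getitem__)

    def update(self, state, action, reward, next_state):
        """Route the reward by sign, then apply a Q-learning update to each stream."""
        r_pos = self.w_pos * max(reward, 0.0)
        r_neg = self.w_neg * min(reward, 0.0)
        for table, r in ((self.q_pos, r_pos), (self.q_neg, r_neg)):
            target = r + self.gamma * max(table[next_state])
            table[state][action] += self.alpha * (target - table[state][action])


agent = SplitQAgent(n_actions=2)
agent.update("s", 0, 1.0, "t")    # positive reward feeds only the positive stream
agent.update("s", 1, -2.0, "t")   # negative reward feeds only the negative stream
best_action = agent.act("s")

# Setting w_neg to zero gives an agent that ignores negative rewards entirely.
insensitive = SplitQAgent(n_actions=2, w_neg=0.0)
insensitive.update("s", 1, -2.0, "t")
```

With both weights equal to 1, the two tables together carry the same reward information as a single Q-table. Changing the weights is one simple way to express an agent that over- or under-reacts to one type of reward, which is the kind of bias the framework is meant to parameterize.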