ML LG Jan 8, 2025

Deep Transfer $Q$-Learning for Offline Non-Stationary Reinforcement Learning

arXiv:2501.04870v216.87 citations h-index: 2

Originality: Highly original

AI Analysis

This work addresses transfer learning in non-stationary reinforcement learning for dynamic decision-making in domains such as business and healthcare, proposing a novel method for a known bottleneck.

The paper tackles the problem of leveraging sample trajectories from diverse populations to improve reinforcement learning performance for target populations with limited data. It introduces a novel re-weighted targeting procedure and transfer deep Q-learning, with theoretical guarantees and empirical improvements on synthetic and real datasets.

In dynamic decision-making scenarios across business and healthcare, leveraging sample trajectories from diverse populations can significantly enhance reinforcement learning (RL) performance for specific target populations, especially when sample sizes are limited. While existing transfer learning methods primarily focus on linear regression settings, they lack direct applicability to reinforcement learning algorithms. This paper pioneers the study of transfer learning for dynamic decision scenarios modeled by non-stationary finite-horizon Markov decision processes, utilizing neural networks as powerful function approximators and backward inductive learning. We demonstrate that naive sample pooling strategies, effective in regression settings, fail in Markov decision processes. To address this challenge, we introduce a novel ``re-weighted targeting procedure'' to construct ``transferable RL samples'' and propose ``transfer deep $Q^*$-learning'', enabling neural network approximation with theoretical guarantees. We assume that the reward functions are transferable and deal with both situations in which the transition densities are transferable or nontransferable. Our analytical techniques for transfer learning in neural network approximation and transition density transfers have broader implications, extending to supervised transfer learning with neural networks and domain shift scenarios. Empirical experiments on both synthetic and real datasets corroborate the advantages of our method, showcasing its potential for improving decision-making through strategically constructing transferable RL samples in non-stationary reinforcement learning contexts.
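
The backward inductive learning mentioned in the abstract can be illustrated with a minimal sketch for a single population. It is not the paper's transfer deep $Q^*$-learning: per-stage sample averages over a small discrete state and action space stand in for the neural network regression, and the re-weighted targeting procedure is not included. The function name `backward_q_learning` and the `(s, a, r, s_next)` trajectory layout are assumptions made for this example.

```python
import numpy as np


def backward_q_learning(trajectories, n_states, n_actions, horizon):
    """Tabular backward-inductive Q estimation from offline trajectories.

    Hypothetical helper for illustration only.
    trajectories: list of episodes; each episode is a list of `horizon`
        tuples (s, a, r, s_next) with integer states and actions.
    Returns an array Q of shape (horizon, n_states, n_actions), one table
    per stage, since the process is non-stationary.
    """
    # Q[horizon] stays at zero: no reward is collected after the last stage.
    Q = np.zeros((horizon + 1, n_states, n_actions))

    # Sweep backward in time so that the stage t+1 estimate is already
    # available when the regression targets for stage t are built.
    for t in range(horizon - 1, -1, -1):
        sums = np.zeros((n_states, n_actions))
        counts = np.zeros((n_states, n_actions))
        for episode in trajectories:
            s, a, r, s_next = episode[t]
            # Target: immediate reward plus best estimated next-stage value.
            target = r + Q[t + 1, s_next].max()
            sums[s, a] += target
            counts[s, a] += 1
        # Average the targets for each visited (state, action) pair;
        # unvisited pairs keep the default value 0.
        visited = counts > 0
        Q[t][visited] = sums[visited] / counts[visited]

    return Q[:horizon]


if __name__ == "__main__":
    # Two toy episodes with horizon 2, two states, and two actions.
    episodes = [
        [(0, 0, 1.0, 1), (1, 1, 2.0, 0)],
        [(0, 1, 0.0, 1), (1, 0, 0.5, 0)],
    ]
    Q = backward_q_learning(episodes, n_states=2, n_actions=2, horizon=2)
    print(Q[0])  # stage-0 Q-table
```

Because each stage-`t` target depends on the estimate for stage `t + 1`, any error introduced by source data at a later stage propagates into earlier stages. This sketch does not attempt to correct for that; it only shows the backward recursion itself.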
