ML LG ME | Jan 21, 2023

Quasi-optimal Reinforcement Learning with Continuous Actions

arXiv:2301.08940v2 | 11.89 citations | h-index: 7

Originality: Highly original

AI Analysis

This work addresses safe and reliable reinforcement learning for medical treatment regimes, such as dose suggestion in diabetes. Its contribution is an incremental improvement that centers on constraining the support of the learned policy.

The paper tackles reinforcement learning in continuous action spaces, with medical dose selection as the motivating case. It develops a quasi-optimal learning algorithm that restricts the policy to near-optimal actions so that harmfully high dosages are avoided. The authors report improved effectiveness and reliability in simulated experiments and in an application to a real diabetes dataset.

Many real-world applications of reinforcement learning (RL) require making decisions in continuous action environments. In particular, determining the optimal dose level plays a vital role in developing medical treatment regimes. One challenge in adapting existing RL algorithms to medical applications, however, is that the popular infinite-support stochastic policies, e.g., the Gaussian policy, may assign dangerously high dosages and seriously harm patients. Hence, it is important to induce a policy class whose support contains only near-optimal actions, and to shrink the action-searching area for effectiveness and reliability. To achieve this, we develop a novel quasi-optimal learning algorithm, which can be easily optimized in off-policy settings with guaranteed convergence under general function approximations. Theoretically, we analyze the consistency, sample complexity, adaptability, and convergence of the proposed algorithm. We evaluate our algorithm with comprehensive simulated experiments and a real-world dose suggestion application on the Ohio Type 1 diabetes dataset.
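The contrast between an infinite-support policy and one whose support covers only near-optimal actions can be made concrete with a small sketch. The code below is not the paper's algorithm, whose construction is not given in this summary. It swaps in the well-known sparsemax projection as a stand-in for a bounded-support policy, and compares it with softmax on a hypothetical discretized dose grid. The dose values, the quadratic score function, and all variable names are assumptions made purely for illustration.

```python
import math

def softmax(scores):
    """Dense policy: every action receives strictly positive probability."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def sparsemax(scores):
    """Euclidean projection of the scores onto the probability simplex.

    Actions whose score falls below the threshold tau get exactly zero
    probability, so the policy's support is a bounded set of top actions.
    """
    z_sorted = sorted(scores, reverse=True)
    cumsum = 0.0
    tau = 0.0
    for j, z in enumerate(z_sorted, start=1):
        cumsum += z
        # The condition holds for j = 1..k and fails afterwards,
        # so the last update leaves tau at the correct threshold.
        if 1.0 + j * z > cumsum:
            tau = (cumsum - 1.0) / j
    return [max(s - tau, 0.0) for s in scores]

# Hypothetical dose grid and action-value scores peaked at a dose of 4.
doses = [0, 2, 4, 6, 8, 10]
scores = [-((d - 4) ** 2) / 16 for d in doses]

dense = softmax(scores)     # positive mass even on the dose of 10
sparse = sparsemax(scores)  # mass only on doses 2, 4, and 6

for d, p_dense, p_sparse in zip(doses, dense, sparse):
    print(f"dose={d:2d}  softmax={p_dense:.3f}  sparsemax={p_sparse:.3f}")
```

In this toy setting the softmax policy still samples the largest dose with small but nonzero probability, while the sparsemax policy assigns it exactly zero. That is the kind of risk the abstract attributes to infinite-support policies, though the paper itself works with continuous actions rather than a grid.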
