AI CY · Mar 29, 2021

pH-RL: A personalization architecture to bring reinforcement learning to health practice

Ali el Hassouni, Mark Hoogendoorn, Marketa Ciharova, Annet Kleiboer, Khadicha Amarti, Vesa Muhonen, Heleen Riper, A. E. Eiben

arXiv:2103.15908v24.51 citations · h-index: 85 · Has Code

Originality: Incremental advance

AI Analysis

This work addresses the personalization of RL for health applications, such as adherence to mental health interventions. It appears incremental: it builds on existing RL methods and focuses on deployment and integration.

The paper tackles the challenge of deploying reinforcement learning (RL) in real-life health scenarios by introducing pH-RL, a general architecture for personalization, and demonstrates its integration with a mental health mobile app. In a study with human participants, the policies consistently learn appropriate actions from only a few days of data and remain stable throughout.

While reinforcement learning (RL) has proven to be the approach of choice for tackling many complex problems, it remains challenging to develop and deploy RL agents in real-life scenarios successfully. This paper presents pH-RL (personalization in e-Health with RL), a general RL architecture for personalization that brings RL to health practice. pH-RL supports various levels of personalization in health applications as well as both online and batch learning. Furthermore, we provide a general-purpose implementation framework that can be integrated with various healthcare applications. We describe a step-by-step guideline for the successful deployment of RL policies in a mobile application. We implemented our open-source RL architecture and integrated it with the MoodBuster mobile application for mental health to provide messages that increase daily adherence to the online therapeutic modules. We then performed a comprehensive study with human participants over a sustained period. Our experimental results show that the developed policies learn to select appropriate actions consistently using only a few days' worth of data. Furthermore, we empirically demonstrate the stability of the learned policies during the study.
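To make the ideas of "levels of personalization" and "online and batch learning" concrete, here is a minimal sketch. It is not the pH-RL implementation: the tabular Q-learning agent, the class name `PersonalizedQAgent`, the two levels `"pooled"` and `"individual"`, the action names, and the state label are all assumptions chosen for illustration. The abstract does not specify the algorithm, the state space, or the message set.

```python
import random
from collections import defaultdict

# Hypothetical message types; the abstract does not list the real action set.
ACTIONS = ["no_message", "encouraging", "informative"]


class PersonalizedQAgent:
    """Tabular Q-learning with a switchable personalization level (illustrative only)."""

    def __init__(self, level="individual", alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
        if level not in ("pooled", "individual"):
            raise ValueError("level must be 'pooled' or 'individual'")
        self.level = level
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.rng = random.Random(seed)
        # One Q-table per key; each table maps state -> list of action values.
        self.q = defaultdict(lambda: defaultdict(lambda: [0.0] * len(ACTIONS)))

    def _table(self, user_id):
        # "pooled": all users share one table; "individual": one table per user.
        key = user_id if self.level == "individual" else "__all__"
        return self.q[key]

    def select_action(self, user_id, state):
        """Epsilon-greedy choice; returns an index into ACTIONS."""
        values = self._table(user_id)[state]
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(ACTIONS))
        return max(range(len(ACTIONS)), key=values.__getitem__)

    def update(self, user_id, state, action, reward, next_state):
        """Online learning: one Q-learning step from a single observed transition."""
        table = self._table(user_id)
        target = reward + self.gamma * max(table[next_state])
        table[state][action] += self.alpha * (target - table[state][action])

    def fit_batch(self, transitions, sweeps=50):
        """Batch learning: replay a logged dataset of transitions several times."""
        for _ in range(sweeps):
            for user_id, state, action, reward, next_state in transitions:
                self.update(user_id, state, action, reward, next_state)
```

In this sketch, `update` would be called once per day as new adherence feedback arrives (online), while `fit_batch` replays previously logged interactions (batch). With `level="pooled"`, experience gathered from one user also shapes the action chosen for another; with `level="individual"`, each user starts from an empty table.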

