Generative Inverse Deep Reinforcement Learning for Online Recommendation
This work addresses the challenge of designing accurate reward functions for recommender systems, a prerequisite for generating satisfactory recommendations. The contribution appears incremental, however, because it builds on existing inverse reinforcement learning techniques.
The paper tackles the problem that manually designed reward functions in deep reinforcement learning for online recommendation are often unrealistic or imprecise. It proposes InvRec, a generative inverse reinforcement learning approach that automatically extracts the reward function from user behavior, and it reports improved performance over state-of-the-art methods on the VirtualTB platform.
Deep reinforcement learning enables an agent to capture users' interests dynamically through interactions with the environment, and it has attracted great interest in recommendation research. Deep reinforcement learning uses a reward function to learn users' interests and to control the learning process. However, most reward functions are manually designed; they are either unrealistic or too imprecise to reflect the high variety, dimensionality, and non-linearity of the recommendation problem. This makes it difficult for the agent to learn an optimal policy that generates the most satisfactory recommendations. To address this issue, we propose InvRec, a novel generative inverse reinforcement learning approach for online recommendation that automatically extracts the reward function from users' behaviors. We conduct experiments on an online platform, VirtualTB, and compare InvRec with several state-of-the-art methods to demonstrate the feasibility and effectiveness of the proposed approach.
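The abstract does not spell out how InvRec extracts the reward, so the following is only a minimal sketch of one common generative (adversarial) formulation, assumed here for illustration: a discriminator D(s, a) is trained to tell logged user behavior from policy-generated behavior, and the learned reward is taken as -log(1 - D(s, a)). The function names, the logistic discriminator, and the synthetic 4-dimensional state-action features are all hypothetical and are not taken from the paper or from VirtualTB.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_discriminator(expert, policy, steps=200, lr=0.1):
    """Fit a logistic discriminator D(x) = sigmoid(w . x + b).

    Expert (logged user) state-action features are labeled 1,
    policy-generated features are labeled 0.
    """
    data = [(x, 1.0) for x in expert] + [(x, 0.0) for x in policy]
    dim = len(data[0][0])
    w, b = [0.0] * dim, 0.0
    for _ in range(steps):
        gw, gb = [0.0] * dim, 0.0
        for x, y in data:
            # Gradient of binary cross-entropy with respect to the logit.
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for i in range(dim):
                gw[i] += err * x[i]
            gb += err
        w = [wi - lr * gi / len(data) for wi, gi in zip(w, gw)]
        b -= lr * gb / len(data)
    return w, b


def learned_reward(x, w, b, eps=1e-8):
    """Reward -log(1 - D(x)): larger when x resembles expert behavior."""
    d = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    return -math.log(1.0 - d + eps)


# Synthetic stand-ins for state-action features (assumption, not real data).
random.seed(0)
expert = [[random.gauss(1.0, 0.5) for _ in range(4)] for _ in range(100)]
policy = [[random.gauss(-1.0, 0.5) for _ in range(4)] for _ in range(100)]

w, b = train_discriminator(expert, policy)
expert_reward = sum(learned_reward(x, w, b) for x in expert) / len(expert)
policy_reward = sum(learned_reward(x, w, b) for x in policy) / len(policy)
print(expert_reward, policy_reward)
```

In a full adversarial loop the policy would then be updated to maximize this learned reward and the discriminator retrained on fresh policy samples; the sketch shows only the reward-extraction step, which replaces a hand-designed reward with one inferred from behavior.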