IRAIApr 17, 2023

Causal Decision Transformer for Recommender Systems via Offline Reinforcement Learning

arXiv:2304.07920v237 citationsh-index: 61
Originality Incremental advance
AI Analysis

This work addresses data inefficiency and reward design issues in recommender systems, offering a practical solution for large-scale applications.

The paper tackles the challenge of designing reward functions in reinforcement learning-based recommender systems by proposing CDT4Rec, an offline reinforcement learning model that uses transformers to capture causal relationships from datasets, achieving improved performance on six real-world datasets and an online simulator.

Reinforcement learning-based recommender systems have recently gained popularity. However, the design of the reward function, on which the agent relies to optimize its recommendation policy, is often not straightforward. Exploring the causality underlying users' behavior can take the place of the reward function in guiding the agent to capture the dynamic interests of users. Moreover, due to the typical limitations of simulation environments (e.g., data inefficiency), most of the work cannot be broadly applied in large-scale situations. Although some works attempt to convert the offline dataset into a simulator, data inefficiency makes the learning process even slower. Because of the nature of reinforcement learning (i.e., learning by interaction), it cannot collect enough data to train during a single interaction. Furthermore, traditional reinforcement learning algorithms do not have a solid capability like supervised learning methods to learn from offline datasets directly. In this paper, we propose a new model named the causal decision transformer for recommender systems (CDT4Rec). CDT4Rec is an offline reinforcement learning system that can learn from a dataset rather than from online interaction. Moreover, CDT4Rec employs the transformer architecture, which is capable of processing large offline datasets and capturing both short-term and long-term dependencies within the data to estimate the causal relationship between action, state, and reward. To demonstrate the feasibility and superiority of our model, we have conducted experiments on six real-world offline datasets and one online simulator.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes