LG · RO · Dec 19, 2024

AdaCred: Adaptive Causal Decision Transformers with Feature Crediting

arXiv:2412.15427v1 · 4 citations · h-index: 5 · AAMAS
Originality: Incremental advance
AI Analysis

This work addresses a bottleneck in offline reinforcement learning, the reliance on long trajectory sequences, with the aim of improving efficiency and performance. It appears incremental because it builds on existing sequence modeling approaches.

The paper tackles the tendency of offline reinforcement learning models to over-rely on memorizing long-term trajectories, which impairs their ability to attribute importance to task-relevant features. The authors report that AdaCred-based policies require shorter trajectory sequences and consistently outperform conventional methods in offline RL and imitation learning environments.

Reinforcement learning (RL) can be formulated as a sequence modeling problem, where models predict future actions based on historical state-action-reward sequences. Current approaches typically require long trajectory sequences to model the environment in offline RL settings. However, these models tend to over-rely on memorizing long-term representations, which impairs their ability to effectively attribute importance to trajectories and learned representations based on task-specific relevance. In this work, we introduce AdaCred, a novel approach that represents trajectories as causal graphs built from short-term action-reward-state sequences. Our model adaptively learns a control policy by crediting and pruning low-importance representations, retaining only those most relevant for the downstream task. Our experiments demonstrate that AdaCred-based policies require shorter trajectory sequences and consistently outperform conventional methods in both offline reinforcement learning and imitation learning environments.
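
The two ideas in the abstract, conditioning on a short window of recent steps and discarding low-importance representations, can be illustrated with a toy sketch. This is not the authors' implementation: AdaCred learns its importance scores and builds causal graphs, whereas the sketch below assumes fixed, hand-supplied scores and a simple threshold. The names `short_context`, `prune_features`, `masked_context`, `importance`, and `threshold` are all hypothetical.

```python
from typing import List, Sequence, Tuple

# One trajectory step: (state features, action, reward).
Step = Tuple[List[float], int, float]


def short_context(trajectory: Sequence[Step], t: int, k: int) -> List[Step]:
    """Return at most the last k steps ending at index t (inclusive)."""
    start = max(0, t - k + 1)
    return list(trajectory[start:t + 1])


def prune_features(state: Sequence[float],
                   importance: Sequence[float],
                   threshold: float) -> List[float]:
    """Zero out state features whose importance score is below threshold.

    Assumption: the scores are given. In AdaCred they would be learned.
    """
    return [x if w >= threshold else 0.0 for x, w in zip(state, importance)]


def masked_context(trajectory: Sequence[Step], t: int, k: int,
                   importance: Sequence[float],
                   threshold: float) -> List[Step]:
    """Build a short context window with low-importance features pruned."""
    return [
        (prune_features(state, importance, threshold), action, reward)
        for state, action, reward in short_context(trajectory, t, k)
    ]


if __name__ == "__main__":
    trajectory = [
        ([1.0, 2.0, 3.0], 0, 0.0),
        ([4.0, 5.0, 6.0], 1, 1.0),
        ([7.0, 8.0, 9.0], 0, 0.5),
    ]
    # Keep the two most recent steps; drop the feature scored 0.1.
    for step in masked_context(trajectory, t=2, k=2,
                               importance=[0.9, 0.1, 0.6], threshold=0.5):
        print(step)
```

A sequence model would then consume the pruned window in place of the full trajectory history. How the importance scores are estimated is the part this sketch leaves out.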

