LG · Sep 26, 2025

Structural Information-based Hierarchical Diffusion for Offline Reinforcement Learning

arXiv:2509.21942v1 · 2 citations · h-index: 11
Originality: Incremental advance
AI Analysis

This work addresses the challenge of stable and effective policy learning in offline RL with sparse rewards, offering a novel method that enhances adaptability and generalization, though it is incremental in the context of hierarchical diffusion approaches.

The paper tackles the problem of inflexible hierarchical diffusion in offline reinforcement learning for long-horizon tasks by proposing SIHD, which adaptively constructs diffusion hierarchies based on structural information, resulting in significant performance improvements over state-of-the-art baselines.

Diffusion-based generative methods have shown promising potential for modeling trajectories from offline reinforcement learning (RL) datasets, and hierarchical diffusion has been introduced to mitigate variance accumulation and computational challenges in long-horizon planning tasks. However, existing approaches typically assume a fixed two-layer diffusion hierarchy with a single predefined temporal scale, which limits adaptability to diverse downstream tasks and reduces flexibility in decision making. In this work, we propose SIHD, a novel Structural Information-based Hierarchical Diffusion framework for effective and stable offline policy learning in long-horizon environments with sparse rewards. Specifically, we analyze structural information embedded in offline trajectories to construct the diffusion hierarchy adaptively, enabling flexible trajectory modeling across multiple temporal scales. Rather than relying on reward predictions from localized sub-trajectories, we quantify the structural information gain of each state community and use it as a conditioning signal within the corresponding diffusion layer. To reduce overreliance on offline datasets, we introduce a structural entropy regularizer that encourages exploration of underrepresented states while avoiding extrapolation errors from distributional shifts. Extensive evaluations on challenging offline RL tasks show that SIHD significantly outperforms state-of-the-art baselines in decision-making performance and demonstrates superior generalization across diverse scenarios.
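The abstract does not spell out how structural information is computed, so the following is only a minimal sketch of the standard one- and two-dimensional structural entropy from structural information theory, not SIHD's implementation. It assumes a simple undirected, unweighted state graph given as an edge list and a hypothetical `community_of` mapping from each state to its community; how the paper builds the graph from offline trajectories, and how it defines the per-community information gain, are not stated here.

```python
import math
from collections import defaultdict


def structural_entropy_1d(edges):
    """One-dimensional structural entropy of an undirected graph.

    edges: list of (u, v) pairs without self-loops (an assumption of this sketch).
    """
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    volume = sum(degree.values())  # equals 2 * number of edges
    return -sum(d / volume * math.log2(d / volume) for d in degree.values())


def structural_entropy_2d(edges, community_of):
    """Two-dimensional structural entropy under a given partition.

    community_of: dict mapping each node to a community label
    (hypothetical input; stands in for the "state communities").
    """
    degree = defaultdict(int)
    cut = defaultdict(int)  # edges with exactly one endpoint in the community
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        if community_of[u] != community_of[v]:
            cut[community_of[u]] += 1
            cut[community_of[v]] += 1

    volume = sum(degree.values())
    community_volume = defaultdict(int)
    for node, d in degree.items():
        community_volume[community_of[node]] += d

    entropy = 0.0
    # Uncertainty of locating a node inside its own community.
    for node, d in degree.items():
        entropy -= d / volume * math.log2(d / community_volume[community_of[node]])
    # Uncertainty of entering each community from outside.
    for label, vol in community_volume.items():
        entropy -= cut[label] / volume * math.log2(vol / volume)
    return entropy


# Toy state graph: two triangles joined by a single bridge edge.
toy_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
toy_partition = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}

h1 = structural_entropy_1d(toy_edges)
h2 = structural_entropy_2d(toy_edges, toy_partition)
print(f"1D entropy: {h1:.4f}, 2D entropy: {h2:.4f}, reduction: {h1 - h2:.4f}")
```

On the toy graph, a partition that matches the two triangles gives a lower two-dimensional entropy than the one-dimensional value. That reduction is one plausible reading of an "information gain" attributable to a community structure, though the paper's exact quantity may differ.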

