LG · May 16

Learning Multi-Timescale Abstractions for Hierarchical Combinatorial Planning

Vivienne Huiling Wang, Tinghuai Wang, Joni Pajarinen

arXiv:2605.17058 · 6.4

Predicted impact: top 78% in LG · last 90 days
Originality: Incremental advance

AI Analysis

For researchers working on reinforcement learning for combinatorial optimization with long horizons and stochastic dynamics, this method provides a principled way to handle variable-duration actions and resource allocation.

This work introduces a model-based hierarchical reinforcement learning framework for sequential stochastic combinatorial optimization, combining a latent-space tree-search planner with an SMDP-aware world model. According to the abstract, the method outperforms strong baselines across challenging SSCO benchmarks.

The combination of exponentially large action spaces, stochastic dynamics, and long-horizon decision-making under limited resources makes Sequential Stochastic Combinatorial Optimization (SSCO) particularly challenging for reinforcement learning. Hierarchical Reinforcement Learning (HRL) offers a natural decomposition, but it places the high-level policy in a Semi-Markov Decision Process (SMDP) where actions have variable durations, making it difficult to learn a world model that is suitable for planning. We introduce a model-based hierarchical framework for sequential stochastic combinatorial decision-making that directly addresses this issue. Our method combines a latent-space tree-search planner with an SMDP-aware world model for variable-duration decisions. A multi-timescale objective structures the latent dynamics so that transition magnitudes reflect the effective temporal scales of abstract actions, enabling efficient lookahead under adaptive temporal abstraction. We further learn a subgoal-conditioned budget policy jointly with the world model to support context-aware resource allocation. Across challenging SSCO benchmarks, our method outperforms strong baselines.
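To make the variable-duration point concrete, here is a minimal sketch of the textbook SMDP backup, in which a high-level action lasting tau primitive steps discounts the next state's value by gamma**tau rather than by gamma. This is the generic formulation, not the paper's world model or planner; the names `Option`, `smdp_return`, and `smdp_target` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Option:
    """One high-level action, recorded as the rewards of each primitive step it lasted.

    Hypothetical container for illustration; its length is the action's duration tau.
    """
    step_rewards: List[float]


def smdp_target(step_rewards: List[float], next_value: float, gamma: float) -> float:
    """One-step SMDP backup: discounted reward accumulated during the action,
    plus gamma**tau times the value of the state where the action ended."""
    tau = len(step_rewards)
    reward = sum(gamma ** k * r for k, r in enumerate(step_rewards))
    return reward + gamma ** tau * next_value


def smdp_return(options: List[Option], gamma: float) -> float:
    """Discounted return of a sequence of variable-duration high-level actions."""
    total = 0.0
    discount = 1.0  # gamma ** (primitive steps elapsed so far)
    for option in options:
        for r in option.step_rewards:
            total += discount * r
            discount *= gamma
    return total


if __name__ == "__main__":
    # A two-step action followed by a one-step action, with gamma = 0.5.
    print(smdp_return([Option([1.0, 1.0]), Option([4.0])], gamma=0.5))
    print(smdp_target([1.0, 1.0], next_value=4.0, gamma=0.5))
```

Both calls describe the same trajectory, so they yield the same number: the backup simply folds everything after the first action into `next_value`. A fixed-step world model would apply a single gamma per transition and misprice long actions, which is the difficulty the abstract attributes to planning in an SMDP.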
