LGOct 2, 2025

Black-Box Combinatorial Optimization with Order-Invariant Reinforcement Learning

arXiv:2510.01824v11 citationsh-index: 18
Originality Highly original
AI Analysis

This addresses combinatorial optimization problems where classical methods are inefficient, representing a novel method for a known bottleneck.

The paper tackles black-box combinatorial optimization by introducing an order-invariant reinforcement learning framework that uses random generation orders during training to improve sample efficiency and search-space diversity. The method consistently achieves top performance and avoids catastrophic failures across various benchmark algorithms and problem sizes.

We introduce an order-invariant reinforcement learning framework for black-box combinatorial optimization. Classical estimation-of-distribution algorithms (EDAs) often rely on learning explicit variable dependency graphs, which can be costly and fail to capture complex interactions efficiently. In contrast, we parameterize a multivariate autoregressive generative model trained without a fixed variable ordering. By sampling random generation orders during training - a form of information-preserving dropout - the model is encouraged to be invariant to variable order, promoting search-space diversity and shaping the model to focus on the most relevant variable dependencies, improving sample efficiency. We adapt Generalized Reinforcement Policy Optimization (GRPO) to this setting, providing stable policy-gradient updates from scale-invariant advantages. Across a wide range of benchmark algorithms and problem instances of varying sizes, our method frequently achieves the best performance and consistently avoids catastrophic failures.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes