LGOct 2, 2025

Black-Box Combinatorial Optimization with Order-Invariant Reinforcement Learning

Olivier Goudet, Quentin Suire, Adrien Goëffon, Frédéric Saubion, Sylvain Lamprier

arXiv:2510.01824v17.11 citationsh-index: 18

Originality Highly original

AI Analysis

This addresses combinatorial optimization problems where classical methods are inefficient, representing a novel method for a known bottleneck.

The paper tackles black-box combinatorial optimization by introducing an order-invariant reinforcement learning framework that uses random generation orders during training to improve sample efficiency and search-space diversity. The method consistently achieves top performance and avoids catastrophic failures across various benchmark algorithms and problem sizes.

We introduce an order-invariant reinforcement learning framework for black-box combinatorial optimization. Classical estimation-of-distribution algorithms (EDAs) often rely on learning explicit variable dependency graphs, which can be costly and fail to capture complex interactions efficiently. In contrast, we parameterize a multivariate autoregressive generative model trained without a fixed variable ordering. By sampling random generation orders during training - a form of information-preserving dropout - the model is encouraged to be invariant to variable order, promoting search-space diversity and shaping the model to focus on the most relevant variable dependencies, improving sample efficiency. We adapt Generalized Reinforcement Policy Optimization (GRPO) to this setting, providing stable policy-gradient updates from scale-invariant advantages. Across a wide range of benchmark algorithms and problem instances of varying sizes, our method frequently achieves the best performance and consistently avoids catastrophic failures.

View on arXiv PDF

Similar