Nonlocal Monte Carlo via Reinforcement Learning
This work addresses inefficiencies in conventional MCMC methods for combinatorial optimization near computational phase transitions, offering potential benefits for researchers and practitioners in fields like operations research and AI, though it is incremental as it builds on existing nonlocal algorithms.
The paper tackled the challenge of optimizing or sampling complex cost functions in combinatorial optimization by using deep reinforcement learning to train nonlocal transition policies for Nonequilibrium Nonlocal Monte Carlo algorithms, resulting in improved performance over standard methods on hard 4-SAT benchmarks in terms of residual energy, time-to-solution, and solution diversity.
Optimizing or sampling complex cost functions of combinatorial optimization problems is a longstanding challenge across disciplines and applications. When employing family of conventional algorithms based on Markov Chain Monte Carlo (MCMC) such as simulated annealing or parallel tempering, one assumes homogeneous (equilibrium) temperature profiles across input. This instance independent approach was shown to be ineffective for the hardest benchmarks near a computational phase transition when the so-called overlap-gap-property holds. In these regimes conventional MCMC struggles to unfreeze rigid variables, escape suboptimal basins of attraction, and sample high-quality and diverse solutions. In order to mitigate these challenges, Nonequilibrium Nonlocal Monte Carlo (NMC) algorithms were proposed that leverage inhomogeneous temperature profiles thereby accelerating exploration of the configuration space without compromising its exploitation. Here, we employ deep reinforcement learning (RL) to train the nonlocal transition policies of NMC which were previously designed phenomenologically. We demonstrate that the resulting solver can be trained solely by observing energy changes of the configuration space exploration as RL rewards and the local minimum energy landscape geometry as RL states. We further show that the trained policies improve upon the standard MCMC-based and nonlocal simulated annealing on hard uniform random and scale-free random 4-SAT benchmarks in terms of residual energy, time-to-solution, and diversity of solutions metrics.