LG, AI | Apr 6, 2024

Transform then Explore: a Simple and Effective Technique for Exploratory Combinatorial Optimization with Reinforcement Learning

Tianle Pu, Changjun Fan, Mutian Shen, Yizhou Lu, Li Zeng, Zohar Nussinov, Chao Chen, Zhong Liu

arXiv:2404.04661v1 | 2.6 | h-index: 41

Originality: Incremental advance

AI Analysis

This work addresses inadequate exploration in RL for combinatorial optimization and offers a general, easy-to-implement solution for researchers and practitioners. It is incremental, however, because it builds on existing RL frameworks.

The paper tackles the limitation of finite-horizon MDP-based RL models in combinatorial optimization, which restrict exploration at test time, by proposing a simple gauge transformation technique that enables continuous solution improvement, achieving state-of-the-art performance on the MaxCut problem.

Many complex problems encountered in both production and daily life can be conceptualized as combinatorial optimization problems (COPs) over graphs. In recent years, reinforcement learning (RL) based models have emerged as a promising direction; they treat solving COPs as a heuristic learning problem. However, current finite-horizon-MDP based RL models have inherent limitations. They cannot explore adequately to improve solutions at test time, which may be necessary given the complexity of NP-hard optimization tasks. Some recent attempts address this issue through reward design and state feature engineering, which are tedious and ad hoc. In this work, we instead propose a much simpler but more effective technique, named gauge transformation (GT). The technique originates from physics, and it is very effective in enabling RL agents to keep exploring and continuously improve their solutions at test time. Moreover, GT is very simple: it can be implemented in less than 10 lines of Python code and can be applied to a vast majority of RL models. Experimentally, we show that traditional RL models equipped with GT achieve state-of-the-art performance on the MaxCut problem. Furthermore, since GT is independent of any particular RL model, it can be seamlessly integrated into various RL frameworks, paving the way for these models to explore more effectively when solving general COPs.
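The abstract does not spell out the transformation itself, so the following is only a minimal sketch of one standard form of gauge transformation from Ising spin-glass physics, applied to MaxCut. It assumes a partition is encoded as spins in {-1, +1} and that the graph is a symmetric weight matrix; the function names (`coupling_energy`, `cut_value`, `gauge_transform`, `map_back`) are hypothetical, and whether this matches the authors' implementation is an assumption. The idea shown is that the current solution can be absorbed into the edge weights, so that an agent can restart from the all-(+1) state on a transformed graph while the coupling energy stays unchanged.

```python
import random


def coupling_energy(weights, spins):
    """Sum over i < j of w_ij * s_i * s_j; lower energy means a larger cut."""
    n = len(spins)
    return sum(weights[i][j] * spins[i] * spins[j]
               for i in range(n) for j in range(i + 1, n))


def cut_value(weights, spins):
    """MaxCut objective: total weight of edges whose endpoints differ."""
    n = len(spins)
    return sum(weights[i][j]
               for i in range(n) for j in range(i + 1, n)
               if spins[i] != spins[j])


def gauge_transform(weights, spins):
    """Absorb the current spins into the weights: w'_ij = w_ij * s_i * s_j.

    In the transformed frame the same solution is the all-(+1) state.
    (Assumed form of the transformation; not taken from the paper's code.)
    """
    n = len(spins)
    new_weights = [[weights[i][j] * spins[i] * spins[j] for j in range(n)]
                   for i in range(n)]
    return new_weights, [1] * n


def map_back(gauge, frame_spins):
    """Translate spins found in the transformed frame back to the original graph."""
    return [g * s for g, s in zip(gauge, frame_spins)]


# Small random instance (hypothetical data, for illustration only).
random.seed(0)
n = 6
weights = [[0] * n for _ in range(n)]
for i in range(n):
    for j in range(i + 1, n):
        weights[i][j] = weights[j][i] = random.choice([0, 1])
spins = [random.choice([-1, 1]) for _ in range(n)]

new_weights, new_spins = gauge_transform(weights, spins)
# The coupling energy is identical in both frames.
print(coupling_energy(weights, spins) == coupling_energy(new_weights, new_spins))
```

Because the coupling energy is preserved for every configuration, any improvement found on the transformed graph maps back, via `map_back`, to an equally improved partition of the original graph. Under this reading, a finite-horizon agent could be re-run repeatedly from its usual initial state, which is one plausible way the technique permits continued exploration at test time.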
