Solving NP-Hard Problems on Graphs with Extended AlphaGo Zero
This work addresses combinatorial optimization problems for researchers and practitioners in machine learning and operations research, offering an incremental improvement over existing methods.
The paper tackles NP-hard combinatorial optimization problems on graphs by proposing a novel learning strategy based on AlphaGo Zero, redesigned for such problems where rewards are real numbers instead of binary outcomes. In experiments on five NP-hard problems like Minimum Vertex Cover and MaxCut, the method generalizes better to various graphs than the baseline S2V-DQN and can be combined with graph neural networks for improved performance.
There have been increasing challenges to solve combinatorial optimization problems by machine learning. Khalil et al. proposed an end-to-end reinforcement learning framework, S2V-DQN, which automatically learns graph embeddings to construct solutions to a wide range of problems. To improve the generalization ability of their Q-learning method, we propose a novel learning strategy based on AlphaGo Zero which is a Go engine that achieved a superhuman level without the domain knowledge of the game. Our framework is redesigned for combinatorial problems, where the final reward might take any real number instead of a binary response, win/lose. In experiments conducted for five kinds of NP-hard problems including {\sc MinimumVertexCover} and {\sc MaxCut}, our method is shown to generalize better to various graphs than S2V-DQN. Furthermore, our method can be combined with recently-developed graph neural network (GNN) models such as the \emph{Graph Isomorphism Network}, resulting in even better performance. This experiment also gives an interesting insight into a suitable choice of GNN models for each task.