On the Difficulty of Generalizing Reinforcement Learning Framework for Combinatorial Optimization
This work addresses the generalization challenge of RL frameworks for combinatorial optimization, which is important for researchers and practitioners seeking scalable AI solutions, but it is incremental as it tests an existing approach on a new problem.
The paper investigates whether deep reinforcement learning models, which have been successful for graph-based combinatorial optimization problems like the travel salesman problem, can generalize to other classes such as the quadratic assignment problem (QAP), using security-aware phone clone allocation as a case study, and finds that existing models may not generalize effectively.
Combinatorial optimization problems (COPs) on the graph with real-life applications are canonical challenges in Computer Science. The difficulty of finding quality labels for problem instances holds back leveraging supervised learning across combinatorial problems. Reinforcement learning (RL) algorithms have recently been adopted to solve this challenge automatically. The underlying principle of this approach is to deploy a graph neural network (GNN) for encoding both the local information of the nodes and the graph-structured data in order to capture the current state of the environment. Then, it is followed by the actor to learn the problem-specific heuristics on its own and make an informed decision at each state for finally reaching a good solution. Recent studies on this subject mainly focus on a family of combinatorial problems on the graph, such as the travel salesman problem, where the proposed model aims to find an ordering of vertices that optimizes a given objective function. We use the security-aware phone clone allocation in the cloud as a classical quadratic assignment problem (QAP) to investigate whether or not deep RL-based model is generally applicable to solve other classes of such hard problems. Extensive empirical evaluation shows that existing RL-based model may not generalize to QAP.