Learning to Search via Retrospective Imitation
This work addresses the challenge of efficient combinatorial search for AI systems, offering a method that scales beyond conventional imitation learning, though its approach is incremental.
The paper tackles the problem of learning effective search policies in combinatorial spaces by introducing retrospective imitation learning, which improves a policy by learning from corrected versions of its own search traces. This lets the method scale to larger problem sizes than those covered by the initial expert demonstrations.
We study the problem of learning a good search policy for combinatorial search spaces. We propose retrospective imitation learning, in which a policy, after initial training by an expert, improves itself by learning from retrospective inspections of its own roll-outs. That is, when the policy eventually reaches a feasible solution in a combinatorial search tree after making mistakes and backtracking, it retrospectively constructs an improved search trace to the solution by removing the backtracks, and this trace is then used to further train the policy. A key feature of our approach is that it can iteratively scale up, or transfer, to larger problem sizes than those solved by the initial expert demonstrations, thus dramatically expanding its applicability beyond that of conventional imitation learning. We showcase the effectiveness of our approach on a range of tasks, including synthetic maze solving and combinatorial problems expressed as integer programs.
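The retrospective step described above can be illustrated with a minimal sketch. It assumes a depth-first roll-out over an explicit toy tree and a fixed ranking function standing in for the learned policy; the names `depth_first_rollout`, `retrospective_trace`, `TREE`, and `rank` are hypothetical and are not taken from the paper, whose search procedures and policy models are more general.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical toy search tree: each node maps to its children.
TREE: Dict[str, List[str]] = {
    "root": ["a", "b"],
    "a": ["a1", "a2"],
    "b": ["b1"],
    "a1": [], "a2": [], "b1": [],
}


def depth_first_rollout(
    root: str,
    children: Callable[[str], List[str]],
    is_solution: Callable[[str], bool],
    rank: Callable[[str], object],
) -> Tuple[List[str], Dict[str, Optional[str]], Optional[str]]:
    """Explore depth-first, visiting children in the order the policy ranks them.

    Returns the full visit order (including dead ends), the parent pointers,
    and the first feasible solution found (None if there is none).
    """
    parent: Dict[str, Optional[str]] = {root: None}
    visit_order: List[str] = []
    stack = [root]
    while stack:
        node = stack.pop()
        visit_order.append(node)
        if is_solution(node):
            return visit_order, parent, node
        ranked = sorted(children(node), key=rank)  # lowest rank is preferred
        for child in reversed(ranked):  # push so the preferred child pops first
            parent[child] = node
            stack.append(child)
    return visit_order, parent, None


def retrospective_trace(parent: Dict[str, Optional[str]], solution: str) -> List[str]:
    """Remove backtracks: keep only the root-to-solution path."""
    path: List[str] = []
    node: Optional[str] = solution
    while node is not None:
        path.append(node)
        node = parent[node]
    path.reverse()
    return path


# A deliberately poor stand-in policy: prefer children in alphabetical order.
visit_order, parent, solution = depth_first_rollout(
    "root", TREE.__getitem__, lambda n: n == "b1", rank=lambda n: n
)
assert solution is not None
trace = retrospective_trace(parent, solution)
# (state, correct next node) pairs that could serve as new imitation targets.
pairs = list(zip(trace[:-1], trace[1:]))
print(visit_order)
print(trace)
print(pairs)
```

In this toy run the roll-out wanders through the `a` subtree before finding the solution under `b`, while the retrospective trace keeps only the direct path. The resulting pairs play the role of corrected demonstrations for further training; how they are turned into a loss depends on the policy class, which this sketch leaves open.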