LG OCJan 8

Imitation Learning for Combinatorial Optimisation under Uncertainty

Prakash Gawas, Antoine Legrain, Louis-Martin Rousseau

arXiv:2601.05383v3h-index: 13

Originality Incremental advance

AI Analysis

This work addresses a gap in imitation learning for combinatorial optimization under uncertainty, providing a taxonomy and framework that are incremental but useful for researchers and practitioners in optimization and machine learning.

The paper tackles the lack of a unifying framework for expert constructions in imitation learning for combinatorial optimization under uncertainty by introducing a systematic taxonomy and a generalized Dataset Aggregation framework. Results on a dynamic physician-to-patient assignment problem show that policies from stochastic experts outperform deterministic ones, and interactive learning improves solution quality with fewer demonstrations.

Imitation learning (IL) provides a data-driven framework for approximating policies for large-scale combinatorial optimisation problems formulated as sequential decision problems (SDPs), where exact solution methods are computationally intractable. A central but underexplored aspect of IL in this context is the role of the \emph{expert} that generates training demonstrations. Existing studies employ a wide range of expert constructions, yet lack a unifying framework to characterise their modelling assumptions, computational properties, and impact on learning performance. This paper introduces a systematic taxonomy of experts for imitation learning in combinatorial optimisation under uncertainty. The literature is classified along three principal dimensions: (i) treatment of uncertainty; (ii) level of optimality, distinguishing task-optimal and approximate experts; and (iii) interaction mode with the learner, ranging from one-shot supervision to iterative, interactive schemes. We further identify additional categories capturing other relevant expert characteristics. Building on this taxonomy, we propose a generalised Dataset Aggregation (DAgger) framework that accommodates multiple expert queries, expert aggregation, and flexible interaction strategies. The proposed framework is evaluated on a dynamic physician-to-patient assignment problem with stochastic arrivals and capacity constraints. Computational experiments compare learning outcomes across expert types and interaction regimes. The results show that policies learned from stochastic experts consistently outperform those learned from deterministic or full-information experts, while interactive learning improves solution quality using fewer expert demonstrations. Aggregated deterministic experts provide an effective alternative when stochastic optimisation becomes computationally challenging.

View on arXiv PDF

Similar