Quantifying Generalisation in Imitation Learning
This provides a domain-specific tool for researchers in imitation learning to better assess generalisation, though it is incremental as it focuses on benchmarking rather than a new method.
The paper tackles the problem of limited variation in imitation learning benchmarks by introducing Labyrinth, a benchmarking environment that enables controlled testing of generalisation with distinct training and evaluation settings, resulting in a tool for reproducible experiments and more robust agents.
Imitation learning benchmarks often lack sufficient variation between training and evaluation, limiting meaningful generalisation assessment. We introduce Labyrinth, a benchmarking environment designed to test generalisation with precise control over structure, start and goal positions, and task complexity. It enables verifiably distinct training, evaluation, and test settings. Labyrinth provides a discrete, fully observable state space and known optimal actions, supporting interpretability and fine-grained evaluation. Its flexible setup allows targeted testing of generalisation factors and includes variants like partial observability, key-and-door tasks, and ice-floor hazards. By enabling controlled, reproducible experiments, Labyrinth advances the evaluation of generalisation in imitation learning and provides a valuable tool for developing more robust agents.