AI · Jan 29, 2024

Zero-shot Imitation Policy via Search in Demonstration Dataset

arXiv:2401.16398v1 · 12 citations · h-index: 27 · ICASSP
Originality: Incremental advance
AI Analysis

This paper addresses the policy adaptation problem for imitation learning agents in complex environments such as Minecraft. The contribution is incremental, since it builds on existing foundation models and search methods.

The paper targets two weaknesses of behavioral cloning: computationally expensive training and difficult policy adaptation. It proposes a search-based approach that uses the latent space of a pre-trained foundation model to index a demonstration dataset and copy behavior from similar demonstrations. In Minecraft, this approach outperforms state-of-the-art imitation learning models in both accuracy and perceptual evaluation.

Behavioral cloning uses a dataset of demonstrations to learn a policy. To avoid computationally expensive training procedures and to address the policy adaptation problem, we propose using the latent spaces of pre-trained foundation models to index a demonstration dataset, instantly retrieve similar relevant experiences, and copy behavior from these situations. The agent performs the actions from a selected similar situation until the representations of its current situation and of the selected experience diverge in the latent space. We therefore formulate our control problem as a dynamic search problem over a dataset of expert demonstrations. We test our approach on the BASALT MineRL dataset in the latent representation of a Video Pre-Training (VPT) model and compare it to state-of-the-art, Imitation Learning-based Minecraft agents. Our approach can effectively recover meaningful demonstrations and produce human-like agent behavior in a wide variety of Minecraft scenarios. Experimental results show that our search-based approach clearly outperforms learning-based models in both accuracy and perceptual evaluation.
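The search-and-copy control loop described in the abstract can be sketched in Python. This is a minimal illustration, not the authors' implementation. It assumes that every demonstration frame has already been encoded into a fixed-size latent vector. The paper uses a Video Pre-Training model for this step, but here the embeddings are plain NumPy arrays. The class name `LatentSearchPolicy`, the Euclidean distance, and the fixed divergence `threshold` are all hypothetical choices made for the example.

```python
import numpy as np


class LatentSearchPolicy:
    """Hypothetical search-based imitation policy over latent embeddings.

    Assumes the frames of each demonstration are stored contiguously, so
    index i + 1 is the frame that follows frame i within the same episode.
    """

    def __init__(self, embeddings, actions, episode_ids, threshold):
        self.embeddings = np.asarray(embeddings, dtype=float)  # shape (N, D)
        self.actions = list(actions)                            # action taken at each frame
        self.episode_ids = np.asarray(episode_ids)              # episode label per frame
        self.threshold = threshold  # assumed divergence threshold (L2 distance)
        self.cursor = None          # index of the demonstration frame being copied
        self.num_searches = 0       # number of fresh searches performed so far

    def _search(self, z):
        # Brute-force nearest neighbour: the demonstration frame closest to z.
        self.num_searches += 1
        dists = np.linalg.norm(self.embeddings - z, axis=1)
        return int(np.argmin(dists))

    def _can_follow(self, z):
        # True if the next frame of the current demonstration exists, belongs
        # to the same episode, and is still close to the agent's situation z.
        if self.cursor is None:
            return False
        nxt = self.cursor + 1
        if nxt >= len(self.actions):
            return False
        if self.episode_ids[nxt] != self.episode_ids[self.cursor]:
            return False
        return np.linalg.norm(self.embeddings[nxt] - z) <= self.threshold

    def act(self, z):
        """Return an action for the agent's current latent embedding z."""
        z = np.asarray(z, dtype=float)
        if self._can_follow(z):
            # Still close to the selected experience: keep copying it.
            self.cursor += 1
        else:
            # Representations diverged (or nothing selected yet): search again.
            self.cursor = self._search(z)
        return self.actions[self.cursor]
```

The policy searches only when the agent's situation drifts away from the copied experience. This is what makes the search dynamic rather than a per-step lookup. The brute-force scan costs O(N) per search. For a large dataset, an approximate nearest-neighbour index such as FAISS would be a natural substitute.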
