Behavioral Cloning via Search in Video PreTraining Latent Space
This work addresses the challenge of creating autonomous agents for complex environments like Minecraft, but it is incremental as it applies existing methods to a specific dataset.
The paper tackles the problem of building autonomous agents for tasks in Minecraft by using imitation learning to search for and copy expert demonstrations in a latent space, resulting in an agent that shows human-like behavior.
Our aim is to build autonomous agents that can solve tasks in environments like Minecraft. To do so, we use an imitation learning-based approach. We formulate the control problem as a search problem over a dataset of expert demonstrations, where the agent copies actions from a similar demonstration trajectory of image-action pairs. We perform a proximity search over the BASALT MineRL dataset in the latent representation of a Video PreTraining (VPT) model. The agent copies the actions from the selected expert trajectory as long as the distance between the state representations of the agent and of that trajectory does not diverge. Once it diverges, the proximity search is repeated. Our approach can effectively recover meaningful demonstration trajectories and produce human-like agent behavior in the Minecraft environment.
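The search-and-copy loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the class name `LatentSearchAgent`, the Euclidean distance, the fixed `threshold`, and the `episode_ids` bookkeeping are all assumptions made for illustration, and the latent vectors are plain NumPy arrays standing in for the output of a VPT encoder.

```python
import numpy as np


class LatentSearchAgent:
    """Toy sketch: copy expert actions while the latent distance stays small.

    All names and the divergence rule are hypothetical; the abstract does not
    specify the distance metric or the threshold.
    """

    def __init__(self, latents, actions, episode_ids, threshold):
        self.latents = np.asarray(latents, dtype=float)  # (N, D) expert latents
        self.actions = list(actions)                      # N expert actions
        self.episode_ids = np.asarray(episode_ids)        # N trajectory labels
        self.threshold = float(threshold)                 # assumed divergence bound
        self.cursor = None                                # expert frame being followed

    def _search(self, z):
        # Proximity search: index of the expert frame closest to z (L2 distance).
        dists = np.linalg.norm(self.latents - z, axis=1)
        return int(np.argmin(dists))

    def act(self, z):
        """Return the expert action to copy for the current agent latent z."""
        z = np.asarray(z, dtype=float)
        if self.cursor is None:
            self.cursor = self._search(z)
        else:
            nxt = self.cursor + 1
            # The followed trajectory ends at the dataset end or episode boundary.
            ended = (nxt >= len(self.latents)
                     or self.episode_ids[nxt] != self.episode_ids[self.cursor])
            if ended or np.linalg.norm(self.latents[nxt] - z) > self.threshold:
                self.cursor = self._search(z)  # diverged: repeat the search
            else:
                self.cursor = nxt              # still close: keep copying
        return self.actions[self.cursor]


# Two toy expert trajectories in a 2-D latent space.
agent = LatentSearchAgent(
    latents=[[0, 0], [1, 0], [2, 0], [10, 10], [11, 10]],
    actions=["a0", "a1", "a2", "b0", "b1"],
    episode_ids=[0, 0, 0, 1, 1],
    threshold=0.5,
)
trace = [agent.act(z) for z in ([0.1, 0], [1.1, 0], [10.2, 10], [11, 10])]
```

In this toy run the agent follows the first trajectory for two steps, then re-searches and switches to the second trajectory once its latent state has moved far from the next expert frame. A real system would replace the brute-force scan with whatever nearest-neighbor index suits the dataset size.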