Augmented Behavioral Cloning from Observation
This work addresses the challenge of improving imitation learning for agents in environments where only expert state observations are available, representing an incremental advancement over prior methods.
The paper tackles imitation from observation, where an agent mimics expert behavior from state sequences alone. It targets the sub-optimal solutions that limit existing methods by introducing a self-attention mechanism and a sampling strategy, and it empirically outperforms state-of-the-art methods by a large margin in four environments.
Imitation from observation is a computational technique that teaches an agent how to mimic the behavior of an expert by observing only the sequence of states from the expert demonstrations. Recent approaches learn the inverse dynamics of the environment and an imitation policy by interleaving epochs of both models while changing the demonstration data. However, such approaches often get stuck in sub-optimal solutions that are distant from the expert, limiting their imitation effectiveness. We address this problem with a novel approach that avoids bad local minima by exploring: (i) a self-attention mechanism that better captures global features of the states; and (ii) a sampling strategy that regulates the observations that are used for learning. We show empirically that our approach outperforms state-of-the-art approaches in four different environments by a large margin.
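To make the interleaved training scheme described in the abstract concrete, the following is a minimal sketch, not the paper's implementation. It assumes a hypothetical six-state chain environment and replaces the neural inverse dynamics model and policy with lookup tables. The self-attention mechanism and the sampling strategy are not reproduced here. All names (`step`, `collect`, `fit_inverse_dynamics`, `label_expert`, `clone`, `train`) are illustrative.

```python
import random
from collections import Counter, defaultdict

N_STATES = 6        # assumption: toy chain with states 0..5
ACTIONS = (0, 1)    # assumption: 0 = step left, 1 = step right


def step(state, action):
    """Toy deterministic dynamics: move one cell, clipped to the chain."""
    delta = 1 if action == 1 else -1
    return min(max(state + delta, 0), N_STATES - 1)


def collect(policy, rng, episodes=20, horizon=15, epsilon=0.3):
    """Roll out the current policy with epsilon exploration; record (s, a, s')."""
    transitions = []
    for _ in range(episodes):
        s = 0
        for _ in range(horizon):
            a = rng.choice(ACTIONS) if rng.random() < epsilon else policy[s]
            s_next = step(s, a)
            transitions.append((s, a, s_next))
            s = s_next
    return transitions


def fit_inverse_dynamics(transitions):
    """Tabular inverse dynamics: count which action produced each (s, s')."""
    counts = defaultdict(Counter)
    for s, a, s_next in transitions:
        counts[(s, s_next)][a] += 1
    return counts


def label_expert(expert_states, idm):
    """Infer an action for each expert state pair the model has already seen."""
    labelled = []
    for s, s_next in zip(expert_states, expert_states[1:]):
        seen = idm.get((s, s_next))
        if seen:  # skip pairs with no agent experience yet
            labelled.append((s, seen.most_common(1)[0][0]))
    return labelled


def clone(policy, labelled):
    """Tabular behavioral cloning: copy the inferred expert action per state."""
    updated = dict(policy)
    for s, a in labelled:
        updated[s] = a
    return updated


def train(expert_states, iterations=8, seed=0):
    """Interleave inverse dynamics fitting and policy cloning."""
    rng = random.Random(seed)
    policy = {s: rng.choice(ACTIONS) for s in range(N_STATES)}
    data = []
    for _ in range(iterations):
        data.extend(collect(policy, rng))   # agent data changes as the policy improves
        idm = fit_inverse_dynamics(data)
        policy = clone(policy, label_expert(expert_states, idm))
    return policy


def rollout(policy, steps):
    """Follow the policy greedily from state 0 and return the visited states."""
    s, visited = 0, [0]
    for _ in range(steps):
        s = step(s, policy[s])
        visited.append(s)
    return visited


expert_states = list(range(N_STATES))   # expert demonstration: states only, no actions
learned = train(expert_states)
```

In this toy setting the expert's actions are never observed: the policy recovers them only through the inverse dynamics table, which improves as the agent's own rollouts reach more of the states the expert visited.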