Imitation Learning Inputting Image Feature to Each Layer of Neural Network
The paper addresses a practical problem in robot imitation learning with multimodal data, but the contribution is incremental because it builds on existing end-to-end approaches.
The paper tackles the challenge of imitation learning with multimodal data, where low-correlation inputs like images are often ignored, especially with short sampling periods. By inputting image features into each neural network layer, the method significantly improves success rates in pick-and-place experiments.
Imitation learning enables robots to learn and replicate human behavior from training data. Recent advances in machine learning enable end-to-end learning approaches that directly process high-dimensional observation data, such as images. However, these approaches face a critical challenge when processing data from multiple modalities: they can inadvertently ignore data with a lower correlation to the desired output, especially when using short sampling periods. This paper presents a method that addresses this challenge by inputting such data into each neural network layer, which amplifies the influence of data with a relatively low correlation to the output. The proposed approach effectively incorporates diverse data sources into the learning process. In experiments on a simple pick-and-place operation with raw images and joint information as input, success rates improve significantly, even with data from short sampling periods.
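
The core idea, feeding the image feature to every layer rather than only the first, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the network type, so a plain feedforward stack is assumed here, and the image feature is assumed to be a fixed-length vector produced by some separate encoder. All names (`build_policy`, `forward`, `joint_dim`, `feat_dim`) and all sizes are hypothetical.

```python
import math
import random

def linear(x, weights, bias):
    """Dense layer: one output value per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def init_layer(rng, n_in, n_out):
    """Random weights of shape (n_out, n_in) and a zero bias."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out

def build_policy(rng, joint_dim, feat_dim, hidden_dim, out_dim, n_hidden):
    """Every layer's input width includes feat_dim (assumed design)."""
    layers = []
    n_in = joint_dim + feat_dim
    for _ in range(n_hidden):
        layers.append(init_layer(rng, n_in, hidden_dim))
        n_in = hidden_dim + feat_dim  # later layers see the image feature again
    layers.append(init_layer(rng, n_in, out_dim))
    return layers

def forward(layers, joints, image_feat):
    """Concatenate the image feature onto the input of each layer."""
    h = list(joints)
    for i, (weights, bias) in enumerate(layers):
        h = linear(h + list(image_feat), weights, bias)
        if i < len(layers) - 1:  # no nonlinearity on the output layer
            h = [math.tanh(v) for v in h]
    return h

rng = random.Random(0)
policy = build_policy(rng, joint_dim=3, feat_dim=4,
                      hidden_dim=8, out_dim=3, n_hidden=2)
action = forward(policy, [0.1, -0.2, 0.3], [0.5, 0.0, -0.5, 1.0])
```

Because the image feature is concatenated at every depth, it has a direct path to the output layer instead of having to survive all the earlier layers, where a strongly correlated input such as the previous joint state could otherwise dominate. That is one plausible reading of how the method "amplifies the influence" of the low-correlation modality.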