RO | LG | Sep 26, 2023

Learning Multimodal Attention for Manipulating Deformable Objects with Changing States

arXiv:2309.14837v2 | 8 citations | h-index: 38
Originality: Incremental advance
AI Analysis

This paper addresses the challenge robots face in handling deformable objects whose states change dynamically in real-world tasks such as cooking, though it is incremental in that it builds on existing attention and learning methods.

The paper tackles the problem of a robot autonomously cooking scrambled eggs by perceiving the egg's changing state and adjusting its stirring movements in real time. The resulting robot could cook with unknown ingredients and adapt its stirring method to the egg's state without explicit instructions.

To support humans in their daily lives, robots are required to autonomously learn, adapt to objects and environments, and perform the appropriate actions. We tackled the task of cooking scrambled eggs using real ingredients, in which the robot needs to perceive the state of the egg and adjust its stirring movement in real time while the egg is heated and its state changes continuously. In previous works, handling changing objects was found to be challenging because the sensory information is dynamic and contains both important and noisy components, and the modality that should be focused on changes over time, making it difficult to realize both perception and motion generation in real time. We propose a predictive recurrent neural network with an attention mechanism that weighs the sensor inputs according to how important and reliable each modality is, which realizes quick and efficient perception and motion generation. The model is trained by learning from demonstration, allowing the robot to acquire human-like skills. We validated the proposed technique on the robot Dry-AIREC; with our learning model, it could cook eggs with unknown ingredients. The robot changed its stirring method and direction depending on the state of the egg: at the beginning it stirs the whole pot, and after the egg starts to be heated it switches to flipping and splitting motions targeting specific areas, although we did not explicitly instruct it to do so.
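As a rough illustration of the idea in the abstract, the sketch below shows one way a recurrent model could weigh several sensor modalities with attention before predicting the next sensor and motor state. It is not the authors' architecture: the class name `ModalityAttentionRNN`, the three modality sizes, the hidden size, and the random untrained weights are all assumptions made for the example, and the abstract does not say which modalities the robot uses.

```python
import numpy as np


def softmax(x):
    # Numerically stable softmax over a 1-D array.
    e = np.exp(x - np.max(x))
    return e / e.sum()


class ModalityAttentionRNN:
    """Hypothetical sketch: attention-weighted multimodal predictive RNN cell."""

    def __init__(self, modality_dims, hidden_dim, seed=0):
        rng = np.random.default_rng(seed)
        self.modality_dims = list(modality_dims)
        in_dim = sum(self.modality_dims)
        n_mod = len(self.modality_dims)
        # Random, untrained weights; a real model would learn these
        # from demonstration data.
        self.W_att = rng.normal(0.0, 0.1, (n_mod, hidden_dim))
        self.W_in = rng.normal(0.0, 0.1, (hidden_dim, in_dim))
        self.W_rec = rng.normal(0.0, 0.1, (hidden_dim, hidden_dim))
        self.W_out = rng.normal(0.0, 0.1, (in_dim, hidden_dim))

    def step(self, inputs, h):
        # One attention weight per modality, computed from the hidden state.
        weights = softmax(self.W_att @ h)
        # Scale each modality by its weight, then concatenate.
        gated = np.concatenate([w * x for w, x in zip(weights, inputs)])
        # Plain tanh recurrent update.
        h_next = np.tanh(self.W_in @ gated + self.W_rec @ h)
        # Predict the next values of all modalities.
        prediction = self.W_out @ h_next
        return prediction, h_next, weights


# Hypothetical modality sizes (e.g. image features, tactile, joint angles).
model = ModalityAttentionRNN(modality_dims=[8, 4, 7], hidden_dim=16)
h = np.zeros(16)
inputs = [np.ones(8), np.ones(4), np.ones(7)]
pred, h, weights = model.step(inputs, h)
```

Because the attention weights depend on the hidden state, the emphasis on each modality can shift as the sequence unfolds, which is the property the abstract attributes to the proposed model. With a zero initial hidden state, the weights start out uniform.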

