RO, LG · Jul 16, 2024

Learning secondary tool affordances of human partners using iCub robot's egocentric data

Bosong Ding, Erhan Oztop, Giacomo Spigler, Murat Kirtay

arXiv:2407.11922v1 · 2.2 · h-index: 9

Originality: Synthesis-oriented

AI Analysis

This work addresses an incremental problem: improving human-robot collaboration by enabling robots to understand auxiliary tool uses. The problem is specific to robotics and human-robot interaction.

The paper tackles the problem of learning the secondary tool affordances of human partners from an iCub robot's egocentric data. ResNet models trained on tool and action prediction tasks enabled the robot to predict these affordances.

Objects, in particular tools, provide several action possibilities to the agents that can act on them; these possibilities are generally referred to as affordances. A tool is typically designed for a specific purpose, such as driving a nail in the case of a hammer, which we call its primary affordance. A tool can also be used beyond its primary purpose, in which case we associate this auxiliary use with the term secondary affordance. Previous work on affordance perception and learning has mostly focused on primary affordances. Here, we address the less explored problem of learning the secondary tool affordances of human partners. To do this, we use the iCub robot to observe human partners with three cameras while they perform actions on twenty objects using four different tools. In our experiments, human partners use tools to perform actions that do not correspond to their primary affordances. For example, the iCub robot observes a human partner using a ruler for pushing, pulling, and moving objects instead of measuring their lengths. In this setting, we constructed a dataset by taking images of objects before and after each action is executed. We then model learning secondary affordances by training three neural networks (ResNet-18, ResNet-50, and ResNet-101), each on three tasks, using raw images showing the 'initial' and 'final' positions of objects as input: (1) predicting the tool used to move an object, (2) predicting the tool used with an additional categorical input that encodes the action performed, and (3) jointly predicting both the tool used and the action performed. Our results indicate that deep learning architectures enable the iCub robot to predict secondary tool affordances, thereby paving the way for human-robot collaborative object manipulation involving complex affordances.
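The three tasks in the abstract differ only in which inputs the network receives and which labels it must predict. The sketch below illustrates that difference in plain Python. It is not the authors' code: the `Sample` record, the `encode` helper, the placeholder tool names other than the ruler, the three-action label set, and the one-hot encoding of the action are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical label sets. The abstract mentions four tools (a ruler among
# them) and pushing, pulling, and moving; the other names are placeholders.
TOOLS = ["ruler", "tool_b", "tool_c", "tool_d"]
ACTIONS = ["push", "pull", "move"]


@dataclass
class Sample:
    initial_image: str  # path to the image taken before the action
    final_image: str    # path to the image taken after the action
    tool: str
    action: str


def one_hot(index: int, size: int) -> List[float]:
    """Return a one-hot vector of length `size` with 1.0 at `index`."""
    return [1.0 if i == index else 0.0 for i in range(size)]


def encode(sample: Sample, task: int):
    """Build (images, extra_input, targets) for task 1, 2, or 3."""
    images = (sample.initial_image, sample.final_image)
    tool_id = TOOLS.index(sample.tool)
    action_id = ACTIONS.index(sample.action)
    if task == 1:  # tool prediction from the image pair only
        return images, None, (tool_id,)
    if task == 2:  # tool prediction with the action as categorical input
        return images, one_hot(action_id, len(ACTIONS)), (tool_id,)
    if task == 3:  # joint prediction of tool and action
        return images, None, (tool_id, action_id)
    raise ValueError(f"unknown task: {task}")
```

For a sample in which a ruler is used to pull an object, task 1 yields only the tool index as target, task 2 adds the action vector as an extra input, and task 3 yields both indices as targets. How the image pair and the action vector are fed into the ResNet backbones is not specified in the abstract and is left out of this sketch.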
