CV · Sep 16, 2019

Bridging Visual Perception with Contextual Semantics for Understanding Robot Manipulation Tasks

arXiv:1909.07459v20.9

Originality: Synthesis-oriented

AI Analysis

This work addresses the challenge of semantic interpretation for intelligent robots in manipulation scenarios, though it appears incremental as it builds on existing methods like Vision-Language models and ontologies.

The paper tackles the problem of enabling robots to understand manipulation tasks by generating high-level conceptual dynamic knowledge graphs from video clips. It combines Vision-Language models with ontologies to represent knowledge as E-R-E and E-A-V tuples, and demonstrates the approach in a kitchen environment case study.

Understanding manipulation scenarios allows intelligent robots to plan appropriate actions and complete a manipulation task successfully. To do so, a robot must semantically interpret manipulation knowledge by describing entities, relations, and attributes in a structured manner. In this paper, we propose an implementation framework that generates high-level conceptual dynamic knowledge graphs from video clips. A Vision-Language model and an ontology system, corresponding to visual perception and contextual semantics respectively, are combined to represent robot manipulation knowledge as Entity-Relation-Entity (E-R-E) and Entity-Attribute-Value (E-A-V) tuples. The proposed method is flexible and versatile. Using the framework, we present a case study in which a robot performs manipulation actions in a kitchen environment, bridging visual perception with contextual semantics through the generated dynamic knowledge graphs.
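
To make the two tuple types concrete, here is a minimal Python sketch of how E-R-E and E-A-V facts might be stored per video clip in a dynamic knowledge graph. It is an illustration under assumptions, not the paper's implementation: the class names (`ERE`, `EAV`, `DynamicKnowledgeGraph`), the method names, and the kitchen entities and relations are all hypothetical, and the Vision-Language model and ontology that would produce the facts are left out.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Union


@dataclass(frozen=True)
class ERE:
    """Entity-Relation-Entity tuple, e.g. (hand, hold, knife)."""
    subject: str
    relation: str
    obj: str


@dataclass(frozen=True)
class EAV:
    """Entity-Attribute-Value tuple, e.g. (knife, location, counter)."""
    entity: str
    attribute: str
    value: str


Fact = Union[ERE, EAV]


class DynamicKnowledgeGraph:
    """Hypothetical container: maps a clip index to the facts seen in that clip."""

    def __init__(self) -> None:
        self.snapshots: Dict[int, Set[Fact]] = {}

    def add(self, clip: int, fact: Fact) -> None:
        # Create the clip's fact set on first use, then record the fact.
        self.snapshots.setdefault(clip, set()).add(fact)

    def relations_of(self, clip: int, entity: str) -> List[ERE]:
        # All E-R-E facts in the clip that mention the entity on either side,
        # sorted so the result is deterministic.
        facts = self.snapshots.get(clip, set())
        hits = [f for f in facts
                if isinstance(f, ERE) and entity in (f.subject, f.obj)]
        return sorted(hits, key=lambda f: (f.subject, f.relation, f.obj))

    def attributes_of(self, clip: int, entity: str) -> Dict[str, str]:
        # All E-A-V facts about the entity in the clip, as attribute -> value.
        facts = self.snapshots.get(clip, set())
        return {f.attribute: f.value
                for f in facts if isinstance(f, EAV) and f.entity == entity}


# Illustrative facts only; these entities and relations are made up.
graph = DynamicKnowledgeGraph()
graph.add(0, ERE("hand", "hold", "knife"))
graph.add(0, EAV("knife", "location", "counter"))
graph.add(1, ERE("knife", "cut", "cucumber"))
graph.add(1, EAV("cucumber", "state", "sliced"))
```

Keeping one fact set per clip is one simple way to model the "dynamic" aspect: the graph for clip 0 and the graph for clip 1 can differ as the manipulation progresses, and a planner could compare consecutive snapshots to see what changed.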
