David Paulius

h-index49

17papers

434citations

Novelty39%

AI Score39

Ranked #104,205 of 201,326 authors (top 52%)#3,593 in RO (top 47%)

17 Papers

AINov 17, 2022

CAPE: Corrective Actions from Precondition Errors using Large Language Models

Shreyas Sundara Raman, Vanya Cohen, Ifrah Idrees et al.

Extracting commonsense knowledge from a large language model (LLM) offers a path to designing intelligent robots. Existing approaches that leverage LLMs for planning are unable to recover when an action fails and often resort to retrying failed actions, without resolving the error's underlying cause. We propose a novel approach (CAPE) that attempts to propose corrective actions to resolve precondition errors during planning. CAPE improves the quality of generated plans by leveraging few-shot reasoning from action preconditions. Our approach enables embodied agents to execute more tasks than baseline methods while ensuring semantic correctness and minimizing re-prompting. In VirtualHome, CAPE generates executable plans while improving a human-annotated plan correctness metric from 28.89% to 49.63% over SayCan. Our improvements transfer to a Boston Dynamics Spot robot initialized with a set of skills (specified in language) and associated preconditions, where CAPE improves the correctness metric of the executed task plans by 76.49% compared to SayCan. Our approach enables the robot to follow natural language commands and robustly recover from failures, which baseline approaches largely cannot resolve or address inefficiently.

ROJul 12, 2022

Long-Horizon Planning and Execution with Functional Object-Oriented Networks

David Paulius, Alejandro Agostini, Dongheui Lee

Following work on joint object-action representations, functional object-oriented networks (FOON) were introduced as a knowledge graph representation for robots. A FOON contains symbolic concepts useful to a robot's understanding of tasks and its environment for object-level planning. Prior to this work, little has been done to show how plans acquired from FOON can be executed by a robot, as the concepts in a FOON are too abstract for execution. We thereby introduce the idea of exploiting object-level knowledge as a FOON for task planning and execution. Our approach automatically transforms FOON into PDDL and leverages off-the-shelf planners, action contexts, and robot skills in a hierarchical planning pipeline to generate executable task plans. We demonstrate our entire approach on long-horizon tasks in CoppeliaSim and show how learned action contexts can be extended to never-before-seen scenarios.

ROApr 13

SkillWrapper: Generative Predicate Invention for Task-level Planning

Ziyi Yang, Benned Hedegaard, Ahmed Jaafar et al.

Generalizing from individual skill executions to solving long-horizon tasks remains a core challenge in building autonomous agents. A promising direction is learning high-level, symbolic abstractions of the low-level skills of the agents, enabling reasoning and planning independent of the low-level state space. Among possible high-level representations, object-centric skill abstraction with symbolic predicates has been proven to be efficient because of its compatibility with domain-independent planners. Recent advances in foundation models have made it possible to generate symbolic predicates that operate on raw sensory inputs, a process we call generative predicate invention, to facilitate downstream abstraction learning. However, it remains unclear which formal properties the learned representations must satisfy, and how they can be learned to guarantee these properties. In this paper, we address both questions by presenting a formal theory of generative predicate invention for skill abstraction, resulting in symbolic operators that can be used for provably sound and complete planning. Within this framework, we propose SkillWrapper, a method that leverages foundation models to actively collect robot data and learn human-interpretable, plannable representations of black-box skills, using only RGB image observations. Our extensive empirical evaluation in simulation and on real robots shows that SkillWrapper learns abstract representations that enable solving unseen, long-horizon tasks in the real world with black-box skills.

ROApr 5, 2022

Grounding of the Functional Object-Oriented Network in Industrial Tasks

Rafik Ayari, Matteo Pantano, David Paulius

In this preliminary work, we propose to design an activity recognition system that is suitable for Industrie 4.0 (I4.0) applications, especially focusing on Learning from Demonstration (LfD) in collaborative robot tasks. More precisely, we focus on the issue of data exchange between an activity recognition system and a collaborative robotic system. We propose an activity recognition system with linked data using functional object-oriented network (FOON) to facilitate industrial use cases. Initially, we drafted a FOON for our use case. Afterwards, an action is estimated by using object and hand recognition systems coupled with a recurrent neural network, which refers to FOON objects and states. Finally, the detected action is shared via a context broker using an existing linked data model, thus enabling the robotic system to interpret the action and execute it afterwards. Our initial results show that FOON can be used for an industrial use case and that we can use existing linked data models in LfD applications.

ROOct 18, 2024

Skill Generalization with Verbs

Rachel Ma, Lyndon Lam, Benjamin A. Spiegel et al.

It is imperative that robots can understand natural language commands issued by humans. Such commands typically contain verbs that signify what action should be performed on a given object and that are applicable to many objects. We propose a method for generalizing manipulation skills to novel objects using verbs. Our method learns a probabilistic classifier that determines whether a given object trajectory can be described by a specific verb. We show that this classifier accurately generalizes to novel object categories with an average accuracy of 76.69% across 13 object categories and 14 verbs. We then perform policy search over the object kinematics to find an object trajectory that maximizes classifier prediction for a given verb. Our method allows a robot to generate a trajectory for a novel object based on a verb, which can then be used as input to a motion planner. We show that our model can generate trajectories that are usable for executing five verb commands applied to novel instances of two different object categories on a real robot.

RODec 4, 2021

Functional Task Tree Generation from a Knowledge Graph to Solve Unseen Problems

Md. Sadman Sakib, David Paulius, Yu Sun

A major component for developing intelligent and autonomous robots is a suitable knowledge representation, from which a robot can acquire knowledge about its actions or world. However, unlike humans, robots cannot creatively adapt to novel scenarios, as their knowledge and environment are rigidly defined. To address the problem of producing novel and flexible task plans called task trees, we explore how we can derive plans with concepts not originally in the robot's knowledge base. Existing knowledge in the form of a knowledge graph is used as a base of reference to create task trees that are modified with new object or state combinations. To demonstrate the flexibility of our method, we randomly selected recipes from the Recipe1M+ dataset and generated their task trees. The task trees were then thoroughly checked with a visualization tool that portrays how each ingredient changes with each action to produce the desired meal. Our results indicate that the proposed method can produce task plans with high accuracy even for never-before-seen ingredient combinations.

ROJun 1, 2021

Evaluating Recipes Generated from Functional Object-Oriented Network

Md Sadman Sakib, Hailey Baez, David Paulius et al.

The functional object-oriented network (FOON) has been introduced as a knowledge representation, which takes the form of a graph, for symbolic task planning. To get a sequential plan for a manipulation task, a robot can obtain a task tree through a knowledge retrieval process from the FOON. To evaluate the quality of an acquired task tree, we compare it with a conventional form of task knowledge, such as recipes or manuals. We first automatically convert task trees to recipes, and we then compare them with the human-created recipes in the Recipe1M+ dataset via a survey. Our preliminary study finds no significant difference between the recipes in Recipe1M+ and the recipes generated from FOON task trees in terms of correctness, completeness, and clarity.

ROJun 1, 2021

A Road-map to Robot Task Execution with the Functional Object-Oriented Network

David Paulius, Alejandro Agostini, Yu Sun et al.

Following work on joint object-action representations, the functional object-oriented network (FOON) was introduced as a knowledge graph representation for robots. Taking the form of a bipartite graph, a FOON contains symbolic or high-level information that would be pertinent to a robot's understanding of its environment and tasks in a way that mirrors human understanding of actions. In this work, we outline a road-map for future development of FOON and its application in robotic systems for task planning as well as knowledge acquisition from demonstration. We propose preliminary ideas to show how a FOON can be created in a real-world scenario with a robot and human teacher in a way that can jointly augment existing knowledge in a FOON and teach a robot the skills it needs to replicate the demonstrated actions and solve a given manipulation problem.

CVDec 10, 2020

Developing Motion Code Embedding for Action Recognition in Videos

Maxat Alibayev, David Paulius, Yu Sun

In this work, we propose a motion embedding strategy known as motion codes, which is a vectorized representation of motions based on a manipulation's salient mechanical attributes. These motion codes provide a robust motion representation, and they are obtained using a hierarchy of features called the motion taxonomy. We developed and trained a deep neural network model that combines visual and semantic features to identify the features found in our motion taxonomy to embed or annotate videos with motion codes. To demonstrate the potential of motion codes as features for machine learning tasks, we integrated the extracted features from the motion embedding model into the current state-of-the-art action recognition model. The obtained model achieved higher accuracy than the baseline model for the verb classification task on egocentric videos from the EPIC-KITCHENS dataset.

ROJul 31, 2020

Estimating Motion Codes from Demonstration Videos

Maxat Alibayev, David Paulius, Yu Sun

A motion taxonomy can encode manipulations as a binary-encoded representation, which we refer to as motion codes. These motion codes innately represent a manipulation action in an embedded space that describes the motion's mechanical features, including contact and trajectory type. The key advantage of using motion codes for embedding is that motions can be more appropriately defined with robotic-relevant features, and their distances can be more reasonably measured using these motion features. In this paper, we develop a deep learning pipeline to extract motion codes from demonstration videos in an unsupervised manner so that knowledge from these videos can be properly represented and used for robots. Our evaluations show that motion codes can be extracted from demonstrations of action in the EPIC-KITCHENS dataset.

ROJul 13, 2020

A Motion Taxonomy for Manipulation Embedding

David Paulius, Nicholas Eales, Yu Sun

To represent motions from a mechanical point of view, this paper explores motion embedding using the motion taxonomy. With this taxonomy, manipulations can be described and represented as binary strings called motion codes. Motion codes capture mechanical properties, such as contact type and trajectory, that should be used to define suitable distance metrics between motions or loss functions for deep learning and reinforcement learning. Motion codes can also be used to consolidate aliases or cluster motion types that share similar properties. Using existing data sets as a reference, we discuss how motion codes can be created and assigned to actions that are commonly seen in activities of daily living based on intuition as well as real data. Motion codes are compared to vectors from pre-trained Word2Vec models, and we show that motion codes maintain distances that closely match the reality of manipulation.

ROOct 1, 2019

Manipulation Motion Taxonomy and Coding for Robots

David Paulius, Yongqiang Huang, Jason Meloncon et al.

This paper introduces a taxonomy of manipulations as seen especially in cooking for 1) grouping manipulations from the robotics point of view, 2) consolidating aliases and removing ambiguity for motion types, and 3) provide a path to transferring learned manipulations to new unlearned manipulations. Using instructional videos as a reference, we selected a list of common manipulation motions seen in cooking activities grouped into similar motions based on several trajectory and contact attributes. Manipulation codes are then developed based on the taxonomy attributes to represent the manipulation motions. The manipulation taxonomy is then used for comparing motion data in the Daily Interactive Manipulation (DIM) data set to reveal their motion similarities.

ROMay 1, 2019

Task Planning with a Weighted Functional Object-Oriented Network

David Paulius, Kelvin Sheng Pei Dong, Yu Sun

In reality, there is still much to be done for robots to be able to perform manipulation actions with full autonomy. Complicated manipulation tasks, such as cooking, may still require a person to perform some actions that are very risky for a robot to perform. On the other hand, some other actions may be very risky for a human with physical disabilities to perform. Therefore, it is necessary to balance the workload of a robot and a human based on their limitations while minimizing the effort needed from a human in a collaborative robot (cobot) set-up. This paper proposes a new version of our functional object-oriented network (FOON) that integrates weights in its functional units to reflect a robot's chance of successfully executing an action of that functional unit. The paper also presents a task planning algorithm for the weighted FOON to allocate manipulation action load to the robot and human to achieve optimal performance while minimizing human effort. Through a number of experiments, this paper shows several successful cases in which using the proposed weighted FOON and the task planning algorithm allow a robot and a human to successfully complete complicated tasks together with higher success rates than a robot doing them alone.

ROFeb 5, 2019

Functional Object-Oriented Network for Manipulation Learning

David Paulius, Yongqiang Huang, Roger Milton et al.

This paper presents a novel structured knowledge representation called the functional object-oriented network (FOON) to model the connectivity of the functional-related objects and their motions in manipulation tasks. The graphical model FOON is learned by observing object state change and human manipulations with the objects. Using a well-trained FOON, robots can decipher a task goal, seek the correct objects at the desired states on which to operate, and generate a sequence of proper manipulation motions. The paper describes FOON's structure and an approach to form a universal FOON with extracted knowledge from online instructional videos. A graph retrieval approach is presented to generate manipulation motion sequences from the FOON to achieve a desired goal, demonstrating the flexibility of FOON in creating a novel and adaptive means of solving a problem using knowledge gathered from multiple sources. The results are demonstrated in a simulated environment to illustrate the motion sequences generated from the FOON to carry out the desired tasks.

ROJul 5, 2018

A Survey of Knowledge Representation in Service Robotics

David Paulius, Yu Sun

Within the realm of service robotics, researchers have placed a great amount of effort into learning, understanding, and representing motions as manipulations for task execution by robots. The task of robot learning and problem-solving is very broad, as it integrates a variety of tasks such as object detection, activity recognition, task/motion planning, localization, knowledge representation and retrieval, and the intertwining of perception/vision and machine learning techniques. In this paper, we solely focus on knowledge representations and notably how knowledge is typically gathered, represented, and reproduced to solve problems as done by researchers in the past decades. In accordance with the definition of knowledge representations, we discuss the key distinction between such representations and useful learning models that have extensively been introduced and studied in recent years, such as machine learning, deep learning, probabilistic modelling, and semantic graphical structures. Along with an overview of such tools, we discuss the problems which have existed in robot learning and how they have been built and used as solutions, technologies or developments (if any) which have contributed to solving them. Finally, we discuss key principles that should be considered when designing an effective knowledge representation.

ROJul 5, 2018

Functional Object-Oriented Network: Construction & Expansion

David Paulius, Ahmad Babaeian Jelodar, Yu Sun

We build upon the functional object-oriented network (FOON), a structured knowledge representation which is constructed from observations of human activities and manipulations. A FOON can be used for representing object-motion affordances. Knowledge retrieval through graph search allows us to obtain novel manipulation sequences using knowledge spanning across many video sources, hence the novelty in our approach. However, we are limited to the sources collected. To further improve the performance of knowledge retrieval as a follow up to our previous work, we discuss generalizing knowledge to be applied to objects which are similar to what we have in FOON without manually annotating new sources of knowledge. We discuss two means of generalization: 1) expanding our network through the use of object similarity to create new functional units from those we already have, and 2) compressing the functional units by object categories rather than specific objects. We discuss experiments which compare the performance of our knowledge retrieval algorithm with both expansion and compression by categories.

CVJul 3, 2018

Long Activity Video Understanding using Functional Object-Oriented Network

Ahmad Babaeian Jelodar, David Paulius, Yu Sun

Video understanding is one of the most challenging topics in computer vision. In this paper, a four-stage video understanding pipeline is presented to simultaneously recognize all atomic actions and the single on-going activity in a video. This pipeline uses objects and motions from the video and a graph-based knowledge representation network as prior reference. Two deep networks are trained to identify objects and motions in each video sequence associated with an action. Low Level image features are then used to identify objects of interest in that video sequence. Confidence scores are assigned to objects of interest based on their involvement in the action and to motion classes based on results from a deep neural network that classifies the on-going action in video into motion classes. Confidence scores are computed for each candidate functional unit associated with an action using a knowledge representation network, object confidences, and motion confidences. Each action is therefore associated with a functional unit and the sequence of actions is further evaluated to identify the single on-going activity in the video. The knowledge representation used in the pipeline is called the functional object-oriented network which is a graph-based network useful for encoding knowledge about manipulation tasks. Experiments are performed on a dataset of cooking videos to test the proposed algorithm with action inference and activity classification. Experiments show that using functional object oriented network improves video understanding significantly.