Sonia Chernova

RO
h-index49
51papers
1,804citations
Novelty47%
AI Score44

51 Papers

RONov 8, 2022
StructDiffusion: Language-Guided Creation of Physically-Valid Structures using Unseen Objects

Weiyu Liu, Yilun Du, Tucker Hermans et al. · mit, nvidia

Robots operating in human environments must be able to rearrange objects into semantically-meaningful configurations, even if these objects are previously unseen. In this work, we focus on the problem of building physically-valid structures without step-by-step instructions. We propose StructDiffusion, which combines a diffusion model and an object-centric transformer to construct structures given partial-view point clouds and high-level language goals, such as "set the table". Our method can perform multiple challenging language-conditioned multi-step 3D planning tasks using one model. StructDiffusion even improves the success rate of assembling physically-valid structures out of unseen objects by on average 16% over an existing multi-modal transformer model trained on specific structures. We show experiments on held-out objects in both simulation and on real-world rearrangement tasks. Importantly, we show how integrating both a diffusion model and a collision-discriminator model allows for improved generalization over other methods when rearranging previously-unseen objects. For videos and additional results, see our website: https://structdiffusion.github.io/.

RONov 28, 2022
Proactive Robot Assistance via Spatio-Temporal Object Modeling

Maithili Patel, Sonia Chernova

Proactive robot assistance enables a robot to anticipate and provide for a user's needs without being explicitly asked. We formulate proactive assistance as the problem of the robot anticipating temporal patterns of object movements associated with everyday user routines, and proactively assisting the user by placing objects to adapt the environment to their needs. We introduce a generative graph neural network to learn a unified spatio-temporal predictive model of object dynamics from temporal sequences of object arrangements. We additionally contribute the Household Object Movements from Everyday Routines (HOMER) dataset, which tracks household objects associated with human activities of daily living across 50+ days for five simulated households. Our model outperforms the leading baseline in predicting object movement, correctly predicting locations for 11.1% more objects and wrongly predicting locations for 11.5% fewer objects used by the human user.

LGSep 21, 2023
State2Explanation: Concept-Based Explanations to Benefit Agent Learning and User Understanding

Devleena Das, Sonia Chernova, Been Kim

As more non-AI experts use complex AI systems for daily tasks, there has been an increasing effort to develop methods that produce explanations of AI decision making that are understandable by non-AI experts. Towards this effort, leveraging higher-level concepts and producing concept-based explanations have become a popular method. Most concept-based explanations have been developed for classification techniques, and we posit that the few existing methods for sequential decision making are limited in scope. In this work, we first contribute a desiderata for defining concepts in sequential decision making settings. Additionally, inspired by the Protege Effect which states explaining knowledge often reinforces one's self-learning, we explore how concept-based explanations of an RL agent's decision making can in turn improve the agent's learning rate, as well as improve end-user understanding of the agent's decision making. To this end, we contribute a unified framework, State2Explanation (S2E), that involves learning a joint embedding model between state-action pairs and concept-based explanations, and leveraging such learned model to both (1) inform reward shaping during an agent's training, and (2) provide explanations to end-users at deployment for improved task performance. Our experimental validations, in Connect 4 and Lunar Lander, demonstrate the success of S2E in providing a dual-benefit, successfully informing reward shaping and improving agent learning rate, as well as significantly improving end user task performance at deployment time.

AIApr 21, 2022
Creative Problem Solving in Artificially Intelligent Agents: A Survey and Framework

Evana Gizzi, Lakshmi Nair, Sonia Chernova et al.

Creative Problem Solving (CPS) is a sub-area within Artificial Intelligence (AI) that focuses on methods for solving off-nominal, or anomalous problems in autonomous systems. Despite many advancements in planning and learning, resolving novel problems or adapting existing knowledge to a new context, especially in cases where the environment may change in unpredictable ways post deployment, remains a limiting factor in the safe and useful integration of intelligent systems. The emergence of increasingly autonomous systems dictates the necessity for AI agents to deal with environmental uncertainty through creativity. To stimulate further research in CPS, we present a definition and a framework of CPS, which we adopt to categorize existing AI methods in this field. Our framework consists of four main components of a CPS problem, namely, 1) problem formulation, 2) knowledge representation, 3) method of knowledge manipulation, and 4) method of evaluation. We conclude our survey with open research questions, and suggested directions for the future.

AIMay 4, 2022
Explainable Knowledge Graph Embedding: Inference Reconciliation for Knowledge Inferences Supporting Robot Actions

Angel Daruna, Devleena Das, Sonia Chernova

Learned knowledge graph representations supporting robots contain a wealth of domain knowledge that drives robot behavior. However, there does not exist an inference reconciliation framework that expresses how a knowledge graph representation affects a robot's sequential decision making. We use a pedagogical approach to explain the inferences of a learned, black-box knowledge graph representation, a knowledge graph embedding. Our interpretable model, uses a decision tree classifier to locally approximate the predictions of the black-box model, and provides natural language explanations interpretable by non-experts. Results from our algorithmic evaluation affirm our model design choices, and the results of our user studies with non-experts support the need for the proposed inference reconciliation framework. Critically, results from our simulated robot evaluation indicate that our explanations enable non-experts to correct erratic robot behaviors due to nonsensical beliefs within the black-box.

ROSep 24, 2019Code
CAGE: Context-Aware Grasping Engine

Weiyu Liu, Angel Daruna, Sonia Chernova

Semantic grasping is the problem of selecting stable grasps that are functionally suitable for specific object manipulation tasks. In order for robots to effectively perform object manipulation, a broad sense of contexts, including object and task constraints, needs to be accounted for. We introduce the Context-Aware Grasping Engine, which combines a novel semantic representation of grasp contexts with a neural network structure based on the Wide & Deep model, capable of capturing complex reasoning patterns. We quantitatively validate our approach against three prior methods on a novel dataset consisting of 14,000 semantic grasps for 44 objects, 7 tasks, and 6 different object states. Our approach outperformed all baselines by statistically significant margins, producing new insights into the importance of balancing memorization and generalization of contexts for semantic grasping. We further demonstrate the effectiveness of our approach on robot experiments in which the presented model successfully achieved 31 of 32 suitable grasps. The code and data are available at: https://github.com/wliu88/rail_semantic_grasping

30.2ROMay 8
Many-to-Many Multi-Agent Pickup and Delivery

Ethan Schneider, Jingkai Chen, Tianyi Gu et al.

Multi-robot systems in automated warehouses must manage continuous streams of pickup-and-delivery tasks while ensuring efficiency and safety. Prior work on Multi-Agent Pickup-and-Delivery (MAPD) has largely focused on the one-to-one variant, where each task has a fixed pickup and delivery location. In contrast, real warehouses often present many-to-many MAPD scenarios, where items, tracked by stock keeping unit (SKU) identifiers, can be retrieved from or stored at multiple locations, resulting in an NP-hard four-dimensional assignment problem. To solve the many-to-many MAPD problem, we contribute our algorithm: Many-to-Many Multi-Agent Pickup and Delivery (M2M). We experiment with two variants of our algorithm: one that minimizes estimated task durations (M2M), and one which incorporates SKU distribution into the objective function (M2M-wSKU). Simulation results over 8-hour warehouse operations show that our method consistently matches or outperforms prior state of the art, with M2M completing up to 22,000 more tasks on average across different environments and warehouse inventory densities.

ROOct 25, 2024
Robot Behavior Personalization from Sparse User Feedback

Maithili Patel, Sonia Chernova

As service robots become more general-purpose, they will need to adapt to their users' preferences over a large set of all possible tasks that they can perform. This includes preferences regarding which actions the users prefer to delegate to robots as opposed to doing themselves. Existing personalization approaches require task-specific data for each user. To handle diversity across all household tasks and users, and nuances in user preferences across tasks, we propose to learn a task adaptation function independently, which can be used in tandem with any universal robot policy to customize robot behavior. We create Task Adaptation using Abstract Concepts (TAACo) framework. TAACo can learn to predict the user's preferred manner of assistance with any given task, by mediating reasoning through a representation composed of abstract concepts built based on user feedback. TAACo can generalize to an open set of household tasks from small amount of user feedback and explain its inferences through intuitive concepts. We evaluate our model on a dataset we collected of 5 people's preferences, and show that TAACo outperforms GPT-4 by 16% and a rule-based system by 54%, on prediction accuracy, with 40 samples of user feedback.

AIApr 5, 2025
ADAPT: Actively Discovering and Adapting to Preferences for any Task

Maithili Patel, Xavier Puig, Ruta Desai et al.

Assistive agents should be able to perform under-specified long-horizon tasks while respecting user preferences. We introduce Actively Discovering and Adapting to Preferences for any Task (ADAPT) -- a benchmark designed to evaluate agents' ability to adhere to user preferences across various household tasks through active questioning. Next, we propose Reflection-DPO, a novel training approach for adapting large language models (LLMs) to the task of active questioning. Reflection-DPO finetunes a 'student' LLM to follow the actions of a privileged 'teacher' LLM, and optionally ask a question to gather necessary information to better predict the teacher action. We find that prior approaches that use state-of-the-art LLMs fail to sufficiently follow user preferences in ADAPT due to insufficient questioning and poor adherence to elicited preferences. In contrast, Reflection-DPO achieves a higher rate of satisfying user preferences, outperforming a zero-shot chain-of-thought baseline by 6.1% on unseen users.

AIApr 28, 2025
Proceedings of 1st Workshop on Advancing Artificial Intelligence through Theory of Mind

Mouad Abrini, Omri Abend, Dina Acklin et al. · cambridge

This volume includes a selection of papers presented at the Workshop on Advancing Artificial Intelligence through Theory of Mind held at AAAI 2025 in Philadelphia US on 3rd March 2025. The purpose of this volume is to provide an open access and curated anthology for the ToM and AI research community.

HCFeb 11, 2025
DISCOVER: Data-driven Identification of Sub-activities via Clustering and Visualization for Enhanced Activity Recognition in Smart Homes

Alexander Karpekov, Sonia Chernova, Thomas Plötz

Human Activity Recognition (HAR) using ambient sensors has great potential for practical applications, particularly in elder care and independent living. However, deploying HAR systems in real-world settings remains challenging due to the high cost of labeled data, the need for pre-segmented sensor streams, and the lack of flexibility in activity granularity. To address these limitations, we introduce DISCOVER, a method designed to discover fine-grained human sub-activities from unlabeled sensor data without relying on pre-segmentation. DISCOVER combines unsupervised feature extraction and clustering with a user-friendly visualization tool to streamline the labeling process. DISCOVER enables domain experts to efficiently annotate only a minimal set of representative cluster centroids, reducing the annotation workload to a small number of samples (0.05% of our dataset). We demonstrate DISCOVER's effectiveness through a re-annotation exercise on widely used HAR datasets, showing that it uncovers finer-grained activities and produces more nuanced annotations than traditional coarse labels. DISCOVER represents a step toward practical, deployable HAR systems that adapt to diverse real environments.

AIJan 11, 2022
Subgoal-Based Explanations for Unreliable Intelligent Decision Support Systems

Devleena Das, Been Kim, Sonia Chernova

Intelligent decision support (IDS) systems leverage artificial intelligence techniques to generate recommendations that guide human users through the decision making phases of a task. However, a key challenge is that IDS systems are not perfect, and in complex real-world scenarios may produce incorrect output or fail to work altogether. The field of explainable AI planning (XAIP) has sought to develop techniques that make the decision making of sequential decision making AI systems more explainable to end-users. Critically, prior work in applying XAIP techniques to IDS systems has assumed that the plan being proposed by the planner is always optimal, and therefore the action or plan being recommended as decision support to the user is always correct. In this work, we examine novice user interactions with a non-robust IDS system -- one that occasionally recommends the wrong action, and one that may become unavailable after users have become accustomed to its guidance. We introduce a novel explanation type, subgoal-based explanations, for planning-based IDS systems, that supplements traditional IDS output with information about the subgoal toward which the recommended action would contribute. We demonstrate that subgoal-based explanations lead to improved user task performance, improve user ability to distinguish optimal and suboptimal IDS recommendations, are preferred by users, and enable more robust user performance in the case of IDS failure

CVJan 6, 2022
Incremental Object Grounding Using Scene Graphs

John Seon Keun Yi, Yoonwoo Kim, Sonia Chernova

Object grounding tasks aim to locate the target object in an image through verbal communications. Understanding human command is an important process needed for effective human-robot communication. However, this is challenging because human commands can be ambiguous and erroneous. This paper aims to disambiguate the human's referring expressions by allowing the agent to ask relevant questions based on semantic data obtained from scene graphs. We test if our agent can use relations between objects from a scene graph to ask semantically relevant questions that can disambiguate the original user command. In this paper, we present Incremental Grounding using Scene Graphs (IGSG), a disambiguation model that uses semantic data from an image scene graph and linguistic structures from a language scene graph to ground objects based on human command. Compared to the baseline, IGSG shows promising results in complex real-world scenes where there are multiple identical target objects. IGSG can effectively disambiguate ambiguous or wrong referring expressions by asking disambiguating questions back to the user.

ROAug 8, 2021
Semantic-Based Explainable AI: Leveraging Semantic Scene Graphs and Pairwise Ranking to Explain Robot Failures

Devleena Das, Sonia Chernova

When interacting in unstructured human environments, occasional robot failures are inevitable. When such failures occur, everyday people, rather than trained technicians, will be the first to respond. Existing natural language explanations hand-annotate contextual information from an environment to help everyday people understand robot failures. However, this methodology lacks generalizability and scalability. In our work, we introduce a more generalizable semantic explanation framework. Our framework autonomously captures the semantic information in a scene to produce semantically descriptive explanations for everyday users. To generate failure-focused explanations that are semantically grounded, we leverages both semantic scene graphs to extract spatial relations and object attributes from an environment, as well as pairwise ranking. Our results show that these semantically descriptive explanations significantly improve everyday users' ability to both identify failures and provide assistance for recovery than the existing state-of-the-art context-based explanations.

ROAug 5, 2021
An Interleaved Approach to Trait-Based Task Allocation and Scheduling

Glen Neville, Andrew Messing, Harish Ravichandar et al.

To realize effective heterogeneous multi-robot teams, researchers must leverage individual robots' relative strengths and coordinate their individual behaviors. Specifically, heterogeneous multi-robot systems must answer three important questions: \textit{who} (task allocation), \textit{when} (scheduling), and \textit{how} (motion planning). While specific variants of each of these problems are known to be NP-Hard, their interdependence only exacerbates the challenges involved in solving them together. In this paper, we present a novel framework that interleaves task allocation, scheduling, and motion planning. We introduce a search-based approach for trait-based time-extended task allocation named Incremental Task Allocation Graph Search (ITAGS). In contrast to approaches that solve the three problems in sequence, ITAGS's interleaved approach enables efficient search for allocations while simultaneously satisfying scheduling constraints and accounting for the time taken to execute motion plans. To enable effective interleaving, we develop a convex combination of two search heuristics that optimizes the satisfaction of task requirements as well as the makespan of the associated schedule. We demonstrate the efficacy of ITAGS using detailed ablation studies and comparisons against two state-of-the-art algorithms in a simulated emergency response domain.

ROAug 1, 2021
Desperate Times Call for Desperate Measures: Towards Risk-Adaptive Task Allocation

Max Rudolph, Sonia Chernova, Harish Ravichandar

Multi-robot task allocation (MRTA) problems involve optimizing the allocation of robots to tasks. MRTA problems are known to be challenging when tasks require multiple robots and the team is composed of heterogeneous robots. These challenges are further exacerbated when we need to account for uncertainties encountered in the real-world. In this work, we address coalition formation in heterogeneous multi-robot teams with uncertain capabilities. We specifically focus on tasks that require coalitions to collectively satisfy certain minimum requirements. Existing approaches to uncertainty-aware task allocation either maximize expected pay-off (risk-neutral approaches) or improve worst-case or near-worst-case outcomes (risk-averse approaches). Within the context of our problem, we demonstrate the inherent limitations of unilaterally ignoring or avoiding risk and show that these approaches can in fact reduce the probability of satisfying task requirements. Inspired by models that explain foraging behaviors in animals, we develop a risk-adaptive approach to task allocation. Our approach adaptively switches between risk-averse and risk-seeking behavior in order to maximize the probability of satisfying task requirements. Comprehensive numerical experiments conclusively demonstrate that our risk-adaptive approach outperforms risk-neutral and risk-averse approaches. We also demonstrate the effectiveness of our approach using a simulated multi-robot emergency response scenario.

LGMay 20, 2021
Explainable Activity Recognition for Smart Home Systems

Devleena Das, Yasutaka Nishimura, Rajan P. Vivek et al.

Smart home environments are designed to provide services that help improve the quality of life for the occupant via a variety of sensors and actuators installed throughout the space. Many automated actions taken by a smart home are governed by the output of an underlying activity recognition system. However, activity recognition systems may not be perfectly accurate and therefore inconsistencies in smart home operations can lead users reliant on smart home predictions to wonder "why did the smart home do that?" In this work, we build on insights from Explainable Artificial Intelligence (XAI) techniques and introduce an explainable activity recognition framework in which we leverage leading XAI methods to generate natural language explanations that explain what about an activity led to the given classification. Within the context of remote caregiver monitoring, we perform a two-step evaluation: (a) utilize ML experts to assess the sensibility of explanations, and (b) recruit non-experts in two user remote caregiver monitoring scenarios, synchronous and asynchronous, to assess the effectiveness of explanations generated via our framework. Our results show that the XAI approach, SHAP, has a 92% success rate in generating sensible explanations. Moreover, in 83% of sampled scenarios users preferred natural language explanations over a simple activity label, underscoring the need for explainable activity recognition systems. Finally, we show that explanations generated by some XAI methods can lead users to lose confidence in the accuracy of the underlying activity recognition model. We make a recommendation regarding which existing XAI method leads to the best performance in the domain of smart home automation, and discuss a range of topics for future work to further improve explainable activity recognition.

ROMay 10, 2021
Towards Robust One-shot Task Execution using Knowledge Graph Embeddings

Angel Daruna, Lakshmi Nair, Weiyu Liu et al.

Requiring multiple demonstrations of a task plan presents a burden to end-users of robots. However, robustly executing tasks plans from a single end-user demonstration is an ongoing challenge in robotics. We address the problem of one-shot task execution, in which a robot must generalize a single demonstration or prototypical example of a task plan to a new execution environment. Our approach integrates task plans with domain knowledge to infer task plan constituents for new execution environments. Our experimental evaluations show that our knowledge representation makes more relevant generalizations that result in significantly higher success rates over tested baselines. We validated the approach on a physical platform, which resulted in the successful generalization of initial task plans to 38 of 50 execution environments with errors resulting from autonomous robot operation included.

ROJan 14, 2021
Continual Learning of Knowledge Graph Embeddings

Angel Daruna, Mehul Gupta, Mohan Sridharan et al.

In recent years, there has been a resurgence in methods that use distributed (neural) representations to represent and reason about semantic knowledge for robotics applications. However, while robots often observe previously unknown concepts, these representations typically assume that all concepts are known a priori, and incorporating new information requires all concepts to be learned afresh. Our work relaxes this limiting assumption of existing representations and tackles the incremental knowledge graph embedding problem by leveraging the principles of a range of continual learning methods. Through an experimental evaluation with several knowledge graphs and embedding representations, we provide insights about trade-offs for practitioners to match a semantics-driven robotics applications to a suitable continual knowledge graph embedding method.

AIJan 5, 2021
Explainable AI for Robot Failures: Generating Explanations that Improve User Assistance in Fault Recovery

Devleena Das, Siddhartha Banerjee, Sonia Chernova

With the growing capabilities of intelligent systems, the integration of robots in our everyday life is increasing. However, when interacting in such complex human environments, the occasional failure of robotic systems is inevitable. The field of explainable AI has sought to make complex-decision making systems more interpretable but most existing techniques target domain experts. On the contrary, in many failure cases, robots will require recovery assistance from non-expert users. In this work, we introduce a new type of explanation, that explains the cause of an unexpected failure during an agent's plan execution to non-experts. In order for error explanations to be meaningful, we investigate what types of information within a set of hand-scripted explanations are most helpful to non-experts for failure and solution identification. Additionally, we investigate how such explanations can be autonomously generated, extending an existing encoder-decoder model, and generalized across environments. We investigate such questions in the context of a robot performing a pick-and-place manipulation task in the home environment. Our results show that explanations capturing the context of a failure and history of past actions, are the most effective for failure and solution identification among non-experts. Furthermore, through a second user evaluation, we verify that our model-generated explanations can generalize to an unseen office environment, and are just as effective as the hand-scripted explanations.

RODec 24, 2020
Towards Coordinated Robot Motions: End-to-End Learning of Motion Policies on Transform Trees

M. Asif Rana, Anqi Li, Dieter Fox et al.

Generating robot motion that fulfills multiple tasks simultaneously is challenging due to the geometric constraints imposed by the robot. In this paper, we propose to solve multi-task problems through learning structured policies from human demonstrations. Our structured policy is inspired by RMPflow, a framework for combining subtask policies on different spaces. The policy structure provides the user an interface to 1) specifying the spaces that are directly relevant to the completion of the tasks, and 2) designing policies for certain tasks that do not need to be learned. We derive an end-to-end learning objective function that is suitable for the multi-task problem, emphasizing the deviation of motions on task spaces. Furthermore, the motion generated from the learned policy class is guaranteed to be stable. We validate the effectiveness of our proposed learning framework through qualitative and quantitative evaluations on three robotic tasks on a 7-DOF Rethink Sawyer robot.

RONov 24, 2020
Bi-directional Domain Adaptation for Sim2Real Transfer of Embodied Navigation Agents

Joanne Truong, Sonia Chernova, Dhruv Batra

Deep reinforcement learning models are notoriously data hungry, yet real-world data is expensive and time consuming to obtain. The solution that many have turned to is to use simulation for training before deploying the robot in a real environment. Simulation offers the ability to train large numbers of robots in parallel, and offers an abundance of data. However, no simulation is perfect, and robots trained solely in simulation fail to generalize to the real-world, resulting in a "sim-vs-real gap". How can we overcome the trade-off between the abundance of less accurate, artificial data from simulators and the scarcity of reliable, real-world data? In this paper, we propose Bi-directional Domain Adaptation (BDA), a novel approach to bridge the sim-vs-real gap in both directions -- real2sim to bridge the visual domain gap, and sim2real to bridge the dynamics domain gap. We demonstrate the benefits of BDA on the task of PointGoal Navigation. BDA with only 5k real-world (state, action, next-state) samples matches the performance of a policy fine-tuned with ~600k samples, resulting in a speed-up of ~120x.

RONov 24, 2020
Learning Navigation Skills for Legged Robots with Learned Robot Embeddings

Joanne Truong, Denis Yarats, Tianyu Li et al.

Recent work has shown results on learning navigation policies for idealized cylinder agents in simulation and transferring them to real wheeled robots. Deploying such navigation policies on legged robots can be challenging due to their complex dynamics, and the large dynamical difference between cylinder agents and legged systems. In this work, we learn hierarchical navigation policies that account for the low-level dynamics of legged robots, such as maximum speed, slipping, contacts, and learn to successfully navigate cluttered indoor environments. To enable transfer of policies learned in simulation to new legged robots and hardware, we learn dynamics-aware navigation policies across multiple robots with robot-specific embeddings. The learned embedding is optimized on new robots, while the rest of the policy is kept fixed, allowing for quick adaptation. We train our policies across three legged robots in simulation - 2 quadrupeds (A1, AlienGo) and a hexapod (Daisy). At test time, we study the performance of our learned policy on two new legged robots in simulation (Laikago, 4-legged Daisy), and one real-world quadrupedal robot (A1). Our experiments show that our learned policy can sample-efficiently generalize to previously unseen robots, and enable sim-to-real transfer of navigation policies for legged robots.

AINov 18, 2020
Explainable AI for System Failures: Generating Explanations that Improve Human Assistance in Fault Recovery

Devleena Das, Siddhartha Banerjee, Sonia Chernova

With the growing capabilities of intelligent systems, the integration of artificial intelligence (AI) and robots in everyday life is increasing. However, when interacting in such complex human environments, the failure of intelligent systems, such as robots, can be inevitable, requiring recovery assistance from users. In this work, we develop automated, natural language explanations for failures encountered during an AI agents' plan execution. These explanations are developed with a focus of helping non-expert users understand different point of failures to better provide recovery assistance. Specifically, we introduce a context-based information type for explanations that can both help non-expert users understand the underlying cause of a system failure, and select proper failure recoveries. Additionally, we extend an existing sequence-to-sequence methodology to automatically generate our context-based explanations. By doing so, we are able develop a model that can generalize context-based explanations over both different failure types and failure scenarios.

RONov 12, 2020
Same Object, Different Grasps: Data and Semantic Knowledge for Task-Oriented Grasping

Adithyavairavan Murali, Weiyu Liu, Kenneth Marino et al.

Despite the enormous progress and generalization in robotic grasping in recent years, existing methods have yet to scale and generalize task-oriented grasping to the same extent. This is largely due to the scale of the datasets both in terms of the number of objects and tasks studied. We address these concerns with the TaskGrasp dataset which is more diverse both in terms of objects and tasks, and an order of magnitude larger than previous datasets. The dataset contains 250K task-oriented grasps for 56 tasks and 191 objects along with their RGB-D information. We take advantage of this new breadth and diversity in the data and present the GCNGrasp framework which uses the semantic knowledge of objects and tasks encoded in a knowledge graph to generalize to new object instances, classes and even new tasks. Our framework shows a significant improvement of around 12% on held-out settings compared to baseline methods which do not use semantics. We demonstrate that our dataset and model are applicable for the real world by executing task-oriented grasps on a real robot on unknown objects. Code, data and supplementary video could be found at https://sites.google.com/view/taskgrasp

AINov 3, 2020
Rearrangement: A Challenge for Embodied AI

Dhruv Batra, Angel X. Chang, Sonia Chernova et al.

We describe a framework for research and evaluation in Embodied AI. Our proposal is based on a canonical task: Rearrangement. A standard task can focus the development of new techniques and serve as a source of trained models that can be transferred to other settings. In the rearrangement task, the goal is to bring a given physical environment into a specified state. The goal state can be specified by object poses, by images, by a description in language, or by letting the agent experience the environment in the goal state. We characterize rearrangement scenarios along different axes and describe metrics for benchmarking rearrangement performance. To facilitate research and exploration, we present experimental testbeds of rearrangement scenarios in four different simulation environments. We anticipate that other datasets will be released and new simulation platforms will be built to support training of rearrangement agents and their deployment on physical systems.

ROAug 24, 2020
Feature Guided Search for Creative Problem Solving Through Tool Construction

Lakshmi Nair, Sonia Chernova

Robots in the real world should be able to adapt to unforeseen circumstances. Particularly in the context of tool use, robots may not have access to the tools they need for completing a task. In this paper, we focus on the problem of tool construction in the context of task planning. We seek to enable robots to construct replacements for missing tools using available objects, in order to complete the given task. We introduce the Feature Guided Search (FGS) algorithm that enables the application of existing heuristic search approaches in the context of task planning, to perform tool construction efficiently. FGS accounts for physical attributes of objects (e.g., shape, material) during the search for a valid task plan. Our results demonstrate that FGS significantly reduces the search effort over standard heuristic search approaches by approximately 93% for tool construction.

ROAug 24, 2020
Tool Macgyvering: A Novel Framework for Combining Tool Substitution and Construction

Lakshmi Nair, Nithin Shrivatsav, Sonia Chernova

Macgyvering refers to solving problems inventively by using whatever objects are available at hand. Tool Macgyvering is a subset of macgyvering tasks involving a missing tool that is either substituted (tool substitution) or constructed (tool construction), from available objects. In this paper, we introduce a novel Tool Macgyvering framework that combines tool substitution and construction using arbitration that decides between the two options to output a final macgyvering solution. Our tool construction approach reasons about the shape, material, and different ways of attaching objects to construct a desired tool. We further develop value functions that enable the robot to effectively arbitrate between substitution and construction. Our results show that our tool construction approach is able to successfully construct working tools with an accuracy of 96.67%, and our arbitration strategy successfully chooses between substitution and construction with an accuracy of 83.33%.

ROJun 5, 2020
Anticipatory Human-Robot Collaboration via Multi-Objective Trajectory Optimization

Abhinav Jain, Daphne Chen, Dhruva Bansal et al.

We address the problem of adapting robot trajectories to improve safety, comfort, and efficiency in human-robot collaborative tasks. To this end, we propose CoMOTO, a trajectory optimization framework that utilizes stochastic motion prediction models to anticipate the human's motion and adapt the robot's joint trajectory accordingly. We design a multi-objective cost function that simultaneously optimizes for i) separation distance, ii) visibility of the end-effector, iii) legibility, iv) efficiency, and v) smoothness. We evaluate CoMOTO against three existing methods for robot trajectory generation when in close proximity to humans. Our experimental results indicate that our approach consistently outperforms existing methods over a combined set of safety, comfort, and efficiency metrics.

ROMay 29, 2020
Human-Centric Active Perception for Autonomous Observation

David Kent, Sonia Chernova

As robot autonomy improves, robots are increasingly being considered in the role of autonomous observation systems -- free-flying cameras capable of actively tracking human activity within some predefined area of interest. In this work, we formulate the autonomous observation problem through multi-objective optimization, presenting a novel Semi-MDP formulation of the autonomous human observation problem that maximizes observation rewards while accounting for both human- and robot-centric costs. We demonstrate that the problem can be solved with both scalarization-based Multi-Objective MDP methods and Constrained MDP methods, and discuss the relative benefits of each approach. We validate our work on activity tracking using a NASA Astrobee robot operating within a simulated International Space Station environment.

ROApr 2, 2020
Multimodal Material Classification for Robots using Spectroscopy and High Resolution Texture Imaging

Zackory Erickson, Eliot Xing, Bharat Srirangam et al.

Material recognition can help inform robots about how to properly interact with and manipulate real-world objects. In this paper, we present a multimodal sensing technique, leveraging near-infrared spectroscopy and close-range high resolution texture imaging, that enables robots to estimate the materials of household objects. We release a dataset of high resolution texture images and spectral measurements collected from a mobile manipulator that interacted with 144 household objects. We then present a neural network architecture that learns a compact multimodal representation of spectral measurements and texture images. When generalizing material classification to new objects, we show that this multimodal representation enables a robot to recognize materials with greater performance as compared to prior state-of-the-art approaches. Finally, we present how a robot can combine this high resolution local sensing with images from the robot's head-mounted camera to achieve accurate material classification over a scene of objects on a table.

HCFeb 11, 2020
Leveraging Rationales to Improve Human Task Performance

Devleena Das, Sonia Chernova

Machine learning (ML) systems across many application areas are increasingly demonstrating performance that is beyond that of humans. In response to the proliferation of such models, the field of Explainable AI (XAI) has sought to develop techniques that enhance the transparency and interpretability of machine learning methods. In this work, we consider a question not previously explored within the XAI and ML communities: Given a computational system whose performance exceeds that of its human user, can explainable AI capabilities be leveraged to improve the performance of the human? We study this question in the context of the game of Chess, for which computational game engines that surpass the performance of the average player are widely available. We introduce the Rationale-Generating Algorithm, an automated technique for generating rationales for utility-based computational methods, which we evaluate with a multi-day user study against two baselines. The results show that our approach produces rationales that lead to statistically significant improvement in human task performance, demonstrating that rationales automatically generated from an AI's internal task model can be used not only to explain what the system is doing, but also to instruct the user and ultimately improve their task performance.

ROJan 28, 2020
Taking Recoveries to Task: Recovery-Driven Development for Recipe-based Robot Tasks

Siddhartha Banerjee, Angel Daruna, David Kent et al.

Robot task execution when situated in real-world environments is fragile. As such, robot architectures must rely on robust error recovery, adding non-trivial complexity to highly-complex robot systems. To handle this complexity in development, we introduce Recovery-Driven Development (RDD), an iterative task scripting process that facilitates rapid task and recovery development by leveraging hierarchical specification, separation of nominal task and recovery development, and situated testing. We validate our approach with our challenge-winning mobile manipulator software architecture developed using RDD for the FetchIt! Challenge at the IEEE 2019 International Conference on Robotics and Automation. We attribute the success of our system to the level of robustness achieved using RDD, and conclude with lessons learned for developing such systems.

CVDec 13, 2019
Sim2Real Predictivity: Does Evaluation in Simulation Predict Real-World Performance?

Abhishek Kadian, Joanne Truong, Aaron Gokaslan et al.

Does progress in simulation translate to progress on robots? If one method outperforms another in simulation, how likely is that trend to hold in reality on a robot? We examine this question for embodied PointGoal navigation, developing engineering tools and a research paradigm for evaluating a simulator by its sim2real predictivity. First, we develop Habitat-PyRobot Bridge (HaPy), a library for seamless execution of identical code on simulated agents and robots, transferring simulation-trained agents to a LoCoBot platform with a one-line code change. Second, we investigate the sim2real predictivity of Habitat-Sim for PointGoal navigation. We 3D-scan a physical lab space to create a virtualized replica, and run parallel tests of 9 different models in reality and simulation. We present a new metric called Sim-vs-Real Correlation Coefficient (SRCC) to quantify predictivity. We find that SRCC for Habitat as used for the CVPR19 challenge is low (0.18 for the success metric), suggesting that performance differences in this simulator-based challenge do not persist after physical deployment. This gap is largely due to AI agents learning to exploit simulator imperfections, abusing collision dynamics to 'slide' along walls, leading to shortcuts through otherwise non-navigable space. Naturally, such exploits do not work in the real world. Our experiments show that it is possible to tune simulation parameters to improve sim2real predictivity (e.g. improving $SRCC_{Succ}$ from 0.18 to 0.844), increasing confidence that in-simulation comparisons will translate to deployed systems in reality.

RONov 11, 2019
Tool Substitution with Shape and Material Reasoning Using Dual Neural Networks

Nithin Shrivatsav, Lakshmi Nair, Sonia Chernova

This paper explores the problem of tool substitution, namely, identifying substitute tools for performing a task from a given set of candidate tools. We introduce a novel approach to tool substitution, that unlike prior work in the area, combines both shape and material reasoning to effectively identify substitute tools. Our approach combines the use of visual and spectral reasoning using dual neural networks. It takes as input, the desired action to be performed, and outputs a ranking of the available candidate tools based on their suitability for performing the action. Our results on a test set of 30 real-world objects show that our approach is able to effectively match shape and material similarities, with improved tool substitution performance when combining both.

RONov 7, 2019
Benchmark for Skill Learning from Demonstration: Impact of User Experience, Task Complexity, and Start Configuration on Performance

M. Asif Rana, Daphne Chen, S. Reza Ahmadzadeh et al.

In this work, we contribute a large-scale study benchmarking the performance of multiple motion-based learning from demonstration approaches. Given the number and diversity of existing methods, it is critical that comprehensive empirical studies be performed comparing the relative strengths of these learning techniques. In particular, we evaluate four different approaches based on properties an end user may desire for real-world tasks. To perform this evaluation, we collected data from nine participants, across four different manipulation tasks with varying starting conditions. The resulting demonstrations were used to train 180 task models and evaluated on 720 task reproductions on a physical robot. Our results detail how i) complexity of the task, ii) the expertise of the human demonstrator, and iii) the starting configuration of the robot affect task performance. The collected dataset of demonstrations, robot executions, and evaluations are being made publicly available. Research insights and guidelines are also provided to guide future research and deployment choices about these approaches.

LGJul 1, 2019
Active Learning within Constrained Environments through Imitation of an Expert Questioner

Kalesha Bullard, Yannick Schroecker, Sonia Chernova

Active learning agents typically employ a query selection algorithm which solely considers the agent's learning objectives. However, this may be insufficient in more realistic human domains. This work uses imitation learning to enable an agent in a constrained environment to concurrently reason about both its internal learning goals and environmental constraints externally imposed, all within its objective function. Experiments are conducted on a concept learning task to test generalization of the proposed algorithm to different environmental conditions and analyze how time and resource constraints impact efficacy of solving the learning problem. Our findings show the environmentally-aware learning agent is able to statistically outperform all other active learners explored under most of the constrained conditions. A key implication is adaptation for active learning agents to more realistic human environments, where constraints are often externally imposed on the learner.

LGMay 29, 2019
Leveraging Semantics for Incremental Learning in Multi-Relational Embeddings

Angel Daruna, Weiyu Liu, Zsolt Kira et al.

Service robots benefit from encoding information in semantically meaningful ways to enable more robust task execution. Prior work has shown multi-relational embeddings can encode semantic knowledge graphs to promote generalizability and scalability, but only within a batched learning paradigm. We present Incremental Semantic Initialization (ISI), an incremental learning approach that enables novel semantic concepts to be initialized in the embedding in relation to previously learned embeddings of semantically similar concepts. We evaluate ISI on mined AI2Thor and MatterPort3D datasets; our experiments show that on average ISI improves immediate query performance by 41.4%. Additionally, ISI methods on average reduced the number of epochs required to approach model convergence by 78.2%.

AIMay 26, 2019
Path Ranking with Attention to Type Hierarchies

Weiyu Liu, Angel Daruna, Zsolt Kira et al.

The objective of the knowledge base completion problem is to infer missing information from existing facts in a knowledge base. Prior work has demonstrated the effectiveness of path-ranking based methods, which solve the problem by discovering observable patterns in knowledge graphs, consisting of nodes representing entities and edges representing relations. However, these patterns either lack accuracy because they rely solely on relations or cannot easily generalize due to the direct use of specific entity information. We introduce Attentive Path Ranking, a novel path pattern representation that leverages type hierarchies of entities to both avoid ambiguity and maintain generalization. Then, we present an end-to-end trained attention-based RNN model to discover the new path patterns from data. Experiments conducted on benchmark knowledge base completion datasets WN18RR and FB15k-237 demonstrate that the proposed model outperforms existing methods on the fact prediction task by statistically significant margins of 26% and 10%, respectively. Furthermore, quantitative and qualitative analyses show that the path patterns balance between generalization and discrimination.

ROMar 27, 2019
Skill Acquisition via Automated Multi-Coordinate Cost Balancing

Harish Ravichandar, S. Reza Ahmadzadeh, M. Asif Rana et al.

We propose a learning framework, named Multi-Coordinate Cost Balancing (MCCB), to address the problem of acquiring point-to-point movement skills from demonstrations. MCCB encodes demonstrations simultaneously in multiple differential coordinates that specify local geometric properties. MCCB generates reproductions by solving a convex optimization problem with a multi-coordinate cost function and linear constraints on the reproductions, such as initial, target, and via points. Further, since the relative importance of each coordinate system in the cost function might be unknown for a given skill, MCCB learns optimal weighting factors that balance the cost function. We demonstrate the effectiveness of MCCB via detailed experiments conducted on one handwriting dataset and three complex skill datasets.

ROMar 12, 2019
STRATA: A Unified Framework for Task Assignments in Large Teams of Heterogeneous Agents

Harish Ravichandar, Kenneth Shaw, Sonia Chernova

Large teams of heterogeneous agents have the potential to solve complex multi-task problems that are intractable for a single agent working independently. However, solving complex multi-task problems requires leveraging the relative strengths of the different kinds of agents in the team. We present Stochastic TRAit-based Task Assignment (STRATA), a unified framework that models large teams of heterogeneous agents and performs effective task assignments. Specifically, given information on which traits (capabilities) are required for various tasks, STRATA computes the assignments of agents to tasks such that the trait requirements are achieved. Inspired by prior work in robot swarms and biodiversity, we categorize agents into different species (groups) based on their traits. We model each trait as a continuous variable and differentiate between traits that can and cannot be aggregated from different agents. STRATA is capable of reasoning about both species-level and agent-level variability in traits. Further, we define measures of diversity for any given team based on the team's continuous-space trait model. We illustrate the necessity and effectiveness of STRATA using detailed experiments based in simulation and in a capture-the-flag game environment.

ROMar 1, 2019
RoboCSE: Robot Common Sense Embedding

Angel Daruna, Weiyu Liu, Zsolt Kira et al.

Autonomous service robots require computational frameworks that allow them to generalize knowledge to new situations in a manner that models uncertainty while scaling to real-world problem sizes. The Robot Common Sense Embedding (RoboCSE) showcases a class of computational frameworks, multi-relational embeddings, that have not been leveraged in robotics to model semantic knowledge. We validate RoboCSE on a realistic home environment simulator (AI2Thor) to measure how well it generalizes learned knowledge about object affordances, locations, and materials. Our experiments show that RoboCSE can perform prediction better than a baseline that uses pre-trained embeddings, such as Word2Vec, achieving statistically significant improvements while using orders of magnitude less memory than our Bayesian Logic Network baseline. In addition, we show that predictions made by RoboCSE are robust to significant reductions in data available for training as well as domain transfer to MatterPort3D, achieving statistically significant improvements over a baseline that memorizes training data.

ROFeb 10, 2019
Tool Macgyvering: Tool Construction Using Geometric Reasoning

Lakshmi Nair, Jonathan Balloch, Sonia Chernova

MacGyvering is defined as creating or repairing something in an inventive or improvised way by utilizing objects that are available at hand. In this paper, we explore a subset of Macgyvering problems involving tool construction, i.e., creating tools from parts available in the environment. We formalize the overall problem domain of tool Macgyvering, introducing three levels of complexity for tool construction and substitution problems, and presenting a novel computational framework aimed at solving one level of the tool Macgyvering problem, specifically contributing a novel algorithm for tool construction based on geometric reasoning. We validate our approach by constructing three tools using a 7-DOF robot arm.

CVSep 11, 2018
Unbiasing Semantic Segmentation For Robot Perception using Synthetic Data Feature Transfer

Jonathan C Balloch, Varun Agrawal, Irfan Essa et al.

Robot perception systems need to perform reliable image segmentation in real-time on noisy, raw perception data. State-of-the-art segmentation approaches use large CNN models and carefully constructed datasets; however, these models focus on accuracy at the cost of real-time inference. Furthermore, the standard semantic segmentation datasets are not large enough for training CNNs without augmentation and are not representative of noisy, uncurated robot perception data. We propose improving the performance of real-time segmentation frameworks on robot perception data by transferring features learned from synthetic segmentation data. We show that pretraining real-time segmentation architectures with synthetic segmentation data instead of ImageNet improves fine-tuning performance by reducing the bias learned in pretraining and closing the \textit{transfer gap} as a result. Our experiments show that our real-time robot perception models pretrained on synthetic data outperform those pretrained on ImageNet for every scale of fine-tuning data examined. Moreover, the degree to which synthetic pretraining outperforms ImageNet pretraining increases as the availability of robot data decreases, making our approach attractive for robotics domains where dataset collection is hard and/or expensive.

ROAug 1, 2018
Learning Generalizable Robot Skills from Demonstrations in Cluttered Environments

Muhammad Asif Rana, Mustafa Mukadam, Seyed Reza Ahmadzadeh et al.

Learning from Demonstration (LfD) is a popular approach to endowing robots with skills without having to program them by hand. Typically, LfD relies on human demonstrations in clutter-free environments. This prevents the demonstrations from being affected by irrelevant objects, whose influence can obfuscate the true intention of the human or the constraints of the desired skill. However, it is unrealistic to assume that the robot's environment can always be restructured to remove clutter when capturing human demonstrations. To contend with this problem, we develop an importance weighted batch and incremental skill learning approach, building on a recent inference-based technique for skill representation and reproduction. Our approach reduces unwanted environmental influences on the learned skill, while still capturing the salient human behavior. We provide both batch and incremental versions of our approach and validate our algorithms on a 7-DOF JACO2 manipulator with reaching and placing skills.

ROMay 10, 2018
Classification of Household Materials via Spectroscopy

Zackory Erickson, Nathan Luskey, Sonia Chernova et al.

Recognizing an object's material can inform a robot on the object's fragility or appropriate use. To estimate an object's material during manipulation, many prior works have explored the use of haptic sensing. In this paper, we explore a technique for robots to estimate the materials of objects using spectroscopy. We demonstrate that spectrometers provide several benefits for material recognition, including fast response times and accurate measurements with low noise. Furthermore, spectrometers do not require direct contact with an object. To explore this, we collected a dataset of spectral measurements from two commercially available spectrometers during which a robotic platform interacted with 50 flat material objects, and we show that a neural network model can accurately analyze these measurements. Due to the similarity between consecutive spectral measurements, our model achieved a material classification accuracy of 94.6% when given only one spectral sample per object. Similar to prior works with haptic sensors, we found that generalizing material recognition to new objects posed a greater challenge, for which we achieved an accuracy of 79.1% via leave-one-object-out cross-validation. Finally, we demonstrate how a PR2 robot can leverage spectrometers to estimate the materials of everyday objects found in the home. From this work, we find that spectroscopy poses a promising approach for material classification during robotic manipulation.

AIApr 26, 2018
Action Categorization for Computationally Improved Task Learning and Planning

Lakshmi Nair, Sonia Chernova

This paper explores the problem of task learning and planning, contributing the Action-Category Representation (ACR) to improve computational performance of both Planning and Reinforcement Learning (RL). ACR is an algorithm-agnostic, abstract data representation that maps objects to action categories (groups of actions), inspired by the psychological concept of action codes. We validate our approach in StarCraft and Lightworld domains; our results demonstrate several benefits of ACR relating to improved computational performance of planning and RL, by reducing the action space for the agent.

ROApr 17, 2018
Effects of Interruptibility-Aware Robot Behavior

Siddhartha Banerjee, Andrew Silva, Karen Feigh et al.

As robots become increasingly prevalent in human environments, there will inevitably be times when a robot needs to interrupt a human to initiate an interaction. Our work introduces the first interruptibility-aware mobile robot system, and evaluates the effects of interruptibility-awareness on human task performance, robot task performance, and on human interpretation of the robot's social aptitude. Our results show that our robot is effective at predicting interruptibility at high accuracy, allowing it to interrupt at more appropriate times. Results of a large-scale user study show that while participants are able to maintain task performance even in the presence of interruptions, interruptibility-awareness improves the robot's task performance and improves participant social perception of the robot.

ROJul 10, 2017
Semi-Supervised Haptic Material Recognition for Robots using Generative Adversarial Networks

Zackory Erickson, Sonia Chernova, Charles C. Kemp

Material recognition enables robots to incorporate knowledge of material properties into their interactions with everyday objects. For example, material recognition opens up opportunities for clearer communication with a robot, such as "bring me the metal coffee mug", and recognizing plastic versus metal is crucial when using a microwave or oven. However, collecting labeled training data with a robot is often more difficult than unlabeled data. We present a semi-supervised learning approach for material recognition that uses generative adversarial networks (GANs) with haptic features such as force, temperature, and vibration. Our approach achieves state-of-the-art results and enables a robot to estimate the material class of household objects with ~90% accuracy when 92% of the training data are unlabeled. We explore how well this approach can recognize the material of new objects and we discuss challenges facing generalization. To motivate learning from unlabeled training data, we also compare results against several common supervised learning classifiers. In addition, we have released the dataset used for this work which consists of time-series haptic measurements from a robot that conducted thousands of interactions with 72 household objects.

AIJul 1, 2016
Situated Structure Learning of a Bayesian Logic Network for Commonsense Reasoning

Haley Garrison, Sonia Chernova

This paper details the implementation of an algorithm for automatically generating a high-level knowledge network to perform commonsense reasoning, specifically with the application of robotic task repair. The network is represented using a Bayesian Logic Network (BLN) (Jain, Waldherr, and Beetz 2009), which combines a set of directed relations between abstract concepts, including IsA, AtLocation, HasProperty, and UsedFor, with a corresponding probability distribution that models the uncertainty inherent in these relations. Inference over this network enables reasoning over the abstract concepts in order to perform appropriate object substitution or to locate missing objects in the robot's environment. The structure of the network is generated by combining information from two existing knowledge sources: ConceptNet (Speer and Havasi 2012), and WordNet (Miller 1995). This is done in a "situated" manner by only including information relevant a given context. Results show that the generated network is able to accurately predict object categories, locations, properties, and affordances in three different household scenarios.