Julie A. Shah

RO
h-index16
18papers
1,413citations
Novelty44%
AI Score38

18 Papers

ROSep 12, 2024
Adaptive Language-Guided Abstraction from Contrastive Explanations

Andi Peng, Belinda Z. Li, Ilia Sucholutsky et al.

Many approaches to robot learning begin by inferring a reward function from a set of human demonstrations. To learn a good reward, it is necessary to determine which features of the environment are relevant before determining how these features should be used to compute reward. End-to-end methods for joint feature and reward learning (e.g., using deep networks or program synthesis techniques) often yield brittle reward functions that are sensitive to spurious state features. By contrast, humans can often generalizably learn from a small number of demonstrations by incorporating strong priors about what features of a demonstration are likely meaningful for a task of interest. How do we build robots that leverage this kind of background knowledge when learning from new demonstrations? This paper describes a method named ALGAE (Adaptive Language-Guided Abstraction from [Contrastive] Explanations) which alternates between using language models to iteratively identify human-meaningful features needed to explain demonstrated behavior, then standard inverse reinforcement learning techniques to assign weights to these features. Experiments across a variety of both simulated and real-world robot environments show that ALGAE learns generalizable reward functions defined on interpretable features using only small numbers of demonstrations. Importantly, ALGAE can recognize when features are missing, then extract and define those features without any human input -- making it possible to quickly and efficiently acquire rich representations of user behavior.

LGSep 9, 2024
Enhancing Preference-based Linear Bandits via Human Response Time

Shen Li, Yuyang Zhang, Zhaolin Ren et al.

Interactive preference learning systems infer human preferences by presenting queries as pairs of options and collecting binary choices. Although binary choices are simple and widely used, they provide limited information about preference strength. To address this, we leverage human response times, which are inversely related to preference strength, as an additional signal. We propose a computationally efficient method that combines choices and response times to estimate human utility functions, grounded in the EZ diffusion model from psychology. Theoretical and empirical analyses show that for queries with strong preferences, response times complement choices by providing extra information about preference strength, leading to significantly improved utility estimation. We incorporate this estimator into preference-based linear bandits for fixed-budget best-arm identification. Simulations on three real-world datasets demonstrate that using response times significantly accelerates preference learning compared to choice-only approaches. Additional materials, such as code, slides, and talk video, are available at https://shenlirobot.github.io/pages/NeurIPS24.html

ROFeb 28, 2024
Learning with Language-Guided State Abstractions

Andi Peng, Ilia Sucholutsky, Belinda Z. Li et al.

We describe a framework for using natural language to design state abstractions for imitation learning. Generalizable policy learning in high-dimensional observation spaces is facilitated by well-designed state representations, which can surface important features of an environment and hide irrelevant ones. These state representations are typically manually specified, or derived from other labor-intensive labeling procedures. Our method, LGA (language-guided abstraction), uses a combination of natural language supervision and background knowledge from language models (LMs) to automatically build state representations tailored to unseen tasks. In LGA, a user first provides a (possibly incomplete) description of a target task in natural language; next, a pre-trained LM translates this task description into a state abstraction function that masks out irrelevant features; finally, an imitation policy is trained using a small number of demonstrations and LGA-generated abstract states. Experiments on simulated robotic tasks show that LGA yields state abstractions similar to those designed by humans, but in a fraction of the time, and that these abstractions improve generalization and robustness in the presence of spurious correlations and ambiguous specifications. We illustrate the utility of the learned abstractions on mobile manipulation tasks with a Spot robot.

ROFeb 5, 2024
Preference-Conditioned Language-Guided Abstraction

Andi Peng, Andreea Bobu, Belinda Z. Li et al.

Learning from demonstrations is a common way for users to teach robots, but it is prone to spurious feature correlations. Recent work constructs state abstractions, i.e. visual representations containing task-relevant features, from language as a way to perform more generalizable learning. However, these abstractions also depend on a user's preference for what matters in a task, which may be hard to describe or infeasible to exhaustively specify using language alone. How do we construct abstractions to capture these latent preferences? We observe that how humans behave reveals how they see the world. Our key insight is that changes in human behavior inform us that there are differences in preferences for how humans see the world, i.e. their state abstractions. In this work, we propose using language models (LMs) to query for those preferences directly given knowledge that a change in behavior has occurred. In our framework, we use the LM in two ways: first, given a text description of the task and knowledge of behavioral change between states, we query the LM for possible hidden preferences; second, given the most likely preference, we query the LM to construct the state abstraction. In this framework, the LM is also able to ask the human directly when uncertain about its own estimate. We demonstrate our framework's ability to construct effective preference-conditioned abstractions in simulated experiments, a user study, as well as on a real Spot robot performing mobile manipulation tasks.

AISep 14, 2025
Teaching LLMs to Plan: Logical Chain-of-Thought Instruction Tuning for Symbolic Planning

Pulkit Verma, Ngoc La, Anthony Favier et al.

Large language models (LLMs) have demonstrated impressive capabilities across diverse tasks, yet their ability to perform structured symbolic planning remains limited, particularly in domains requiring formal representations like the Planning Domain Definition Language (PDDL). In this paper, we present a novel instruction tuning framework, PDDL-Instruct, designed to enhance LLMs' symbolic planning capabilities through logical chain-of-thought reasoning. Our approach focuses on teaching models to rigorously reason about action applicability, state transitions, and plan validity using explicit logical inference steps. By developing instruction prompts that guide models through the precise logical reasoning required to determine when actions can be applied in a given state, we enable LLMs to self-correct their planning processes through structured reflection. The framework systematically builds verification skills by decomposing the planning process into explicit reasoning chains about precondition satisfaction, effect application, and invariant preservation. Experimental results on multiple planning domains show that our chain-of-thought reasoning based instruction-tuned models are significantly better at planning, achieving planning accuracy of up to 94% on standard benchmarks, representing a 66% absolute improvement over baseline models. This work bridges the gap between the general reasoning capabilities of LLMs and the logical precision required for automated planning, offering a promising direction for developing better AI planning systems.

AIMay 28, 2025
HDDLGym: A Tool for Studying Multi-Agent Hierarchical Problems Defined in HDDL with OpenAI Gym

Ngoc La, Ruaridh Mon-Williams, Julie A. Shah

In recent years, reinforcement learning (RL) methods have been widely tested using tools like OpenAI Gym, though many tasks in these environments could also benefit from hierarchical planning. However, there is a lack of a tool that enables seamless integration of hierarchical planning with RL. Hierarchical Domain Definition Language (HDDL), used in classical planning, introduces a structured approach well-suited for model-based RL to address this gap. To bridge this integration, we introduce HDDLGym, a Python-based tool that automatically generates OpenAI Gym environments from HDDL domains and problems. HDDLGym serves as a link between RL and hierarchical planning, supporting multi-agent scenarios and enabling collaborative planning among agents. This paper provides an overview of HDDLGym's design and implementation, highlighting the challenges and design choices involved in integrating HDDL with the Gym interface, and applying RL policies to support hierarchical planning. We also provide detailed instructions and demonstrations for using the HDDLGym framework, including how to work with existing HDDL domains and problems from International Planning Competitions, exemplified by the Transport domain. Additionally, we offer guidance on creating new HDDL domains for multi-agent scenarios and demonstrate the practical use of HDDLGym in the Overcooked domain. By leveraging the advantages of HDDL and Gym, HDDLGym aims to be a valuable tool for studying RL in hierarchical planning, particularly in multi-agent contexts.

RONov 27, 2024
Explainable deep learning improves human mental models of self-driving cars

Eoin M. Kenny, Akshay Dharmavaram, Sang Uk Lee et al.

Self-driving cars increasingly rely on deep neural networks to achieve human-like driving. However, the opacity of such black-box motion planners makes it challenging for the human behind the wheel to accurately anticipate when they will fail, with potentially catastrophic consequences. Here, we introduce concept-wrapper network (i.e., CW-Net), a method for explaining the behavior of black-box motion planners by grounding their reasoning in human-interpretable concepts. We deploy CW-Net on a real self-driving car and show that the resulting explanations refine the human driver's mental model of the car, allowing them to better predict its behavior and adjust their own behavior accordingly. Unlike previous work using toy domains or simulations, our study presents the first real-world demonstration of how to build authentic autonomous vehicles (AVs) that give interpretable, causally faithful explanations for their decisions, without sacrificing performance. We anticipate our method could be applied to other safety-critical systems with a human in the loop, such as autonomous drones and robotic surgeons. Overall, our study suggests a pathway to explainability for autonomous agents as a whole, which can help make them more transparent, their deployment safer, and their usage more ethical.

LGDec 12, 2024
Regulation of Language Models With Interpretability Will Likely Result In A Performance Trade-Off

Eoin M. Kenny, Julie A. Shah

Regulation is increasingly cited as the most important and pressing concern in machine learning. However, it is currently unknown how to implement this, and perhaps more importantly, how it would effect model performance alongside human collaboration if actually realized. In this paper, we attempt to answer these questions by building a regulatable large-language model (LLM), and then quantifying how the additional constraints involved affect (1) model performance, alongside (2) human collaboration. Our empirical results reveal that it is possible to force an LLM to use human-defined features in a transparent way, but a "regulation performance trade-off" previously not considered reveals itself in the form of a 7.34% classification performance drop. Surprisingly however, we show that despite this, such systems actually improve human task performance speed and appropriate confidence in a realistic deployment setting compared to no AI assistance, thus paving a way for fair, regulatable AI, which benefits users.

SYOct 18, 2021
Set-based State Estimation with Probabilistic Consistency Guarantee under Epistemic Uncertainty

Shen Li, Theodoros Stouraitis, Michael Gienger et al.

Consistent state estimation is challenging, especially under the epistemic uncertainties arising from learned (nonlinear) dynamic and observation models. In this work, we propose a set-based estimation algorithm, named Gaussian Process-Zonotopic Kalman Filter (GP-ZKF), that produces zonotopic state estimates while respecting both the epistemic uncertainties in the learned models and aleatoric uncertainties. Our method guarantees probabilistic consistency, in the sense that the true states are bounded by sets (zonotopes) across all time steps, with high probability. We formally relate GP-ZKF with the corresponding stochastic approach, GP-EKF, in the case of learned (nonlinear) models. In particular, when linearization errors and aleatoric uncertainties are omitted and epistemic uncertainties are simplified, GP-ZKF reduces to GP-EKF. We empirically demonstrate our method's efficacy in both a simulated pendulum domain and a real-world robot-assisted dressing domain, where GP-ZKF produced more consistent and less conservative set-based estimates than all baseline stochastic methods.

ROMar 26, 2021
Reactive Task and Motion Planning under Temporal Logic Specifications

Shen Li, Daehyung Park, Yoonchang Sung et al.

We present a task-and-motion planning (TAMP) algorithm robust against a human operator's cooperative or adversarial interventions. Interventions often invalidate the current plan and require replanning on the fly. Replanning can be computationally expensive and often interrupts seamless task execution. We introduce a dynamically reconfigurable planning methodology with behavior tree-based control strategies toward reactive TAMP, which takes the advantage of previous plans and incremental graph search during temporal logic-based reactive synthesis. Our algorithm also shows efficient recovery functionalities that minimize the number of replanning steps. Finally, our algorithm produces a robust, efficient, and complete TAMP solution. Our experimental results show the algorithm results in superior manipulation performance in both simulated and real-world tasks.

AIFeb 17, 2021
Towards an AI Coach to Infer Team Mental Model Alignment in Healthcare

Sangwon Seo, Lauren R. Kennedy-Metz, Marco A. Zenati et al.

Shared mental models are critical to team success; however, in practice, team members may have misaligned models due to a variety of factors. In safety-critical domains (e.g., aviation, healthcare), lack of shared mental models can lead to preventable errors and harm. Towards the goal of mitigating such preventable errors, here, we present a Bayesian approach to infer misalignment in team members' mental models during complex healthcare task execution. As an exemplary application, we demonstrate our approach using two simulated team-based scenarios, derived from actual teamwork in cardiac surgery. In these simulated experiments, our approach inferred model misalignment with over 75% recall, thereby providing a building block for enabling computer-assisted interventions to augment human cognition in the operating room and improve teamwork.

RONov 22, 2020
Experimental Assessment of Human-Robot Teaming for Multi-Step Remote Manipulation with Expert Operators

Claudia Pérez-D'Arpino, Rebecca P. Khurshid, Julie A. Shah

Remote robot manipulation with human control enables applications where safety and environmental constraints are adverse to humans (e.g. underwater, space robotics and disaster response) or the complexity of the task demands human-level cognition and dexterity (e.g. robotic surgery and manufacturing). These systems typically use direct teleoperation at the motion level, and are usually limited to low-DOF arms and 2D perception. Improving dexterity and situational awareness demands new interaction and planning workflows. We explore the use of human-robot teaming through teleautonomy with assisted planning for remote control of a dual-arm dexterous robot for multi-step manipulation tasks, and conduct a within-subjects experimental assessment (n=12 expert users) to compare it with other methods, resulting in the following four conditions: (A) Direct teleoperation with imitation controller + 2D perception, (B) Condition A + 3D perception, (C) Teleautonomy interface teleoperation + 2D & 3D perception, (D) Condition C + assisted planning. The results indicate that this approach (D) achieves task times comparable with direct teleoperation (A,B) while improving a number of other objective and subjective metrics, including re-grasps, collisions, and TLX workload metrics. When compared to a similar interface but removing the assisted planning (C), D reduces the task time and removes a significant interaction with the level of expertise of the operator, resulting in a performance equalizer across users.

ROOct 27, 2020
The State of Industrial Robotics: Emerging Technologies, Challenges, and Key Research Directions

Lindsay Sanneman, Christopher Fourie, Julie A. Shah

Robotics and related technologies are central to the ongoing digitization and advancement of manufacturing. In recent years, a variety of strategic initiatives around the world including "Industry 4.0", introduced in Germany in 2011 have aimed to improve and connect manufacturing technologies in order to optimize production processes. In this work, we study the changing technological landscape of robotics and "internet-of-things" (IoT)-based connective technologies over the last 7-10 years in the wake of Industry 4.0. We interviewed key players within the European robotics ecosystem, including robotics manufacturers and integrators, original equipment manufacturers (OEMs), and applied industrial research institutions and synthesize our findings in this paper. We first detail the state-of-the-art robotics and IoT technologies we observed and that the companies discussed during our interviews. We then describe the processes the companies follow when deciding whether and how to integrate new technologies, the challenges they face when integrating these technologies, and some immediate future technological avenues they are exploring in robotics and IoT. Finally, based on our findings, we highlight key research directions for the robotics community that can enable improved capabilities in the context of manufacturing.

ROMay 12, 2020
Trust Considerations for Explainable Robots: A Human Factors Perspective

Lindsay Sanneman, Julie A. Shah

Recent advances in artificial intelligence (AI) and robotics have drawn attention to the need for AI systems and robots to be understandable to human users. The explainable AI (XAI) and explainable robots literature aims to enhance human understanding and human-robot team performance by providing users with necessary information about AI and robot behavior. Simultaneously, the human factors literature has long addressed important considerations that contribute to human performance, including human trust in autonomous systems. In this paper, drawing from the human factors literature, we discuss three important trust-related considerations for the design of explainable robot systems: the bases of trust, trust calibration, and trust specificity. We further detail existing and potential metrics for assessing trust in robotic systems based on explanations provided by explainable robots.

CLSep 13, 2019
Learning Household Task Knowledge from WikiHow Descriptions

Yilun Zhou, Julie A. Shah, Steven Schockaert

Commonsense procedural knowledge is important for AI agents and robots that operate in a human environment. While previous attempts at constructing procedural knowledge are mostly rule- and template-based, recent advances in deep learning provide the possibility of acquiring such knowledge directly from natural language sources. As a first step in this direction, we propose a model to learn embeddings for tasks, as well as the individual steps that need to be taken to solve them, based on WikiHow articles. We learn these embeddings such that they are predictive of both step relevance and step ordering. We also experiment with the use of integer programming for inferring consistent global step orderings from noisy pairwise predictions.

MASep 11, 2019
On Memory Mechanism in Multi-Agent Reinforcement Learning

Yilun Zhou, Derrik E. Asher, Nicholas R. Waytowich et al.

Multi-agent reinforcement learning (MARL) extends (single-agent) reinforcement learning (RL) by introducing additional agents and (potentially) partial observability of the environment. Consequently, algorithms for solving MARL problems incorporate various extensions beyond traditional RL methods, such as a learned communication protocol between cooperative agents that enables exchange of private information or adaptive modeling of opponents in competitive settings. One popular algorithmic construct is a memory mechanism such that an agent's decisions can depend not only upon the current state but also upon the history of observed states and actions. In this paper, we study how a memory mechanism can be useful in environments with different properties, such as observability, internality and presence of a communication channel. Using both prior work and new experiments, we show that a memory mechanism is helpful when learning agents need to model other agents and/or when communication is constrained in some way; however we must to be cautious of agents achieving effective memoryfulness through other means.

CLFeb 21, 2019
Predicting ConceptNet Path Quality Using Crowdsourced Assessments of Naturalness

Yilun Zhou, Steven Schockaert, Julie A. Shah

In many applications, it is important to characterize the way in which two concepts are semantically related. Knowledge graphs such as ConceptNet provide a rich source of information for such characterizations by encoding relations between concepts as edges in a graph. When two concepts are not directly connected by an edge, their relationship can still be described in terms of the paths that connect them. Unfortunately, many of these paths are uninformative and noisy, which means that the success of applications that use such path features crucially relies on their ability to select high-quality paths. In existing applications, this path selection process is based on relatively simple heuristics. In this paper we instead propose to learn to predict path quality from crowdsourced human assessments. Since we are interested in a generic task-independent notion of quality, we simply ask human participants to rank paths according to their subjective assessment of the paths' naturalness, without attempting to define naturalness or steering the participants towards particular indicators of quality. We show that a neural network model trained on these assessments is able to predict human judgments on unseen paths with near optimal performance. Most notably, we find that the resulting path selection method is substantially better than the current heuristic approaches at identifying meaningful paths.

ROOct 21, 2018
Pose consensus based on dual quaternion algebra with application to decentralized formation control of mobile manipulators

Heitor J. Savino, Luciano C. A. Pimenta, Julie A. Shah et al.

This paper presents a solution based on dual quaternion algebra to the general problem of pose (i.e., position and orientation) consensus for systems composed of multiple rigid-bodies. The dual quaternion algebra is used to model the agents' poses and also in the distributed control laws, making the proposed technique easily applicable to time-varying formation control of general robotic systems. The proposed pose consensus protocol has guaranteed convergence when the interaction among the agents is represented by directed graphs with directed spanning trees, which is a more general result when compared to the literature on formation control. In order to illustrate the proposed pose consensus protocol and its extension to the problem of formation control, we present a numerical simulation with a large number of free-flying agents and also an application of cooperative manipulation by using real mobile manipulators.