Matthias Scheutz

h-index28

36papers

3,192citations

Novelty40%

AI Score54

Ranked #27,851 of 201,018 authors (top 14%)#652 in RO (top 9%)

36 Papers

AIJun 24, 2022

RAPid-Learn: A Framework for Learning to Recover for Handling Novelties in Open-World Environments

Shivam Goel, Yash Shukla, Vasanth Sarathy et al.

We propose RAPid-Learn: Learning to Recover and Plan Again, a hybrid planning and learning method, to tackle the problem of adapting to sudden and unexpected changes in an agent's environment (i.e., novelties). RAPid-Learn is designed to formulate and solve modifications to a task's Markov Decision Process (MDPs) on-the-fly and is capable of exploiting domain knowledge to learn any new dynamics caused by the environmental changes. It is capable of exploiting the domain knowledge to learn action executors which can be further used to resolve execution impasses, leading to a successful plan execution. This novelty information is reflected in its updated domain model. We demonstrate its efficacy by introducing a wide variety of novelties in a gridworld environment inspired by Minecraft, and compare our algorithm with transfer learning baselines from the literature. Our method is (1) effective even in the presence of multiple novelties, (2) more sample efficient than transfer learning RL baselines, and (3) robust to incomplete model information, as opposed to pure symbolic planning approaches.

AIFeb 28, 2023

Methods and Mechanisms for Interactive Novelty Handling in Adversarial Environments

Tung Thai, Ming Shen, Mayank Garg et al. · amazon-science

Learning to detect, characterize and accommodate novelties is a challenge that agents operating in open-world domains need to address to be able to guarantee satisfactory task performance. Certain novelties (e.g., changes in environment dynamics) can interfere with the performance or prevent agents from accomplishing task goals altogether. In this paper, we introduce general methods and architectural mechanisms for detecting and characterizing different types of novelties, and for building an appropriate adaptive model to accommodate them utilizing logical representations and reasoning methods. We demonstrate the effectiveness of the proposed methods in evaluations performed by a third party in the adversarial multi-agent board game Monopoly. The results show high novelty detection and accommodation rates across a variety of novelty types, including changes to the rules of the game, as well as changes to the agent's action capabilities.

LGAug 1, 2022

Joint covariate-alignment and concept-alignment: a framework for domain generalization

Thuan Nguyen, Boyang Lyu, Prakash Ishwar et al.

In this paper, we propose a novel domain generalization (DG) framework based on a new upper bound to the risk on the unseen domain. Particularly, our framework proposes to jointly minimize both the covariate-shift as well as the concept-shift between the seen domains for a better performance on the unseen domain. While the proposed approach can be implemented via an arbitrary combination of covariate-alignment and concept-alignment modules, in this work we use well-established approaches for distributional alignment namely, Maximum Mean Discrepancy (MMD) and covariance Alignment (CORAL), and use an Invariant Risk Minimization (IRM)-based approach for concept alignment. Our numerical results show that the proposed methods perform as well as or better than the state-of-the-art for domain generalization on several data sets.

LGApr 2, 2023

A principled approach to model validation in domain generalization

Boyang Lyu, Thuan Nguyen, Matthias Scheutz et al.

Domain generalization aims to learn a model with good generalization ability, that is, the learned model should not only perform well on several seen domains but also on unseen domains with different data distributions. State-of-the-art domain generalization methods typically train a representation function followed by a classifier jointly to minimize both the classification risk and the domain discrepancy. However, when it comes to model selection, most of these methods rely on traditional validation routines that select models solely based on the lowest classification risk on the validation set. In this paper, we theoretically demonstrate a trade-off between minimizing classification risk and mitigating domain discrepancy, i.e., it is impossible to achieve the minimum of these two objectives simultaneously. Motivated by this theoretical result, we propose a novel model selection method suggesting that the validation process should account for both the classification risk and the domain discrepancy. We validate the effectiveness of the proposed method by numerical results on several domain generalization datasets.

LGOct 26, 2022

Trade-off between reconstruction loss and feature alignment for domain generalization

Thuan Nguyen, Boyang Lyu, Prakash Ishwar et al.

Domain generalization (DG) is a branch of transfer learning that aims to train the learning models on several seen domains and subsequently apply these pre-trained models to other unseen (unknown but related) domains. To deal with challenging settings in DG where both data and label of the unseen domain are not available at training time, the most common approach is to design the classifiers based on the domain-invariant representation features, i.e., the latent representations that are unchanged and transferable between domains. Contrary to popular belief, we show that designing classifiers based on invariant representation features alone is necessary but insufficient in DG. Our analysis indicates the necessity of imposing a constraint on the reconstruction loss induced by representation functions to preserve most of the relevant information about the label in the latent space. More importantly, we point out the trade-off between minimizing the reconstruction loss and achieving domain alignment in DG. Our theoretical results motivate a new DG framework that jointly optimizes the reconstruction loss and the domain discrepancy. Both theoretical and numerical results are provided to justify our approach.

ROMar 11

Novelty Adaptation Through Hybrid Large Language Model (LLM)-Symbolic Planning and LLM-guided Reinforcement Learning

Hong Lu, Pierrick Lorang, Timothy R. Duggan et al.

In dynamic open-world environments, autonomous agents often encounter novelties that hinder their ability to find plans to achieve their goals. Specifically, traditional symbolic planners fail to generate plans when the robot's planning domain lacks the operators that enable it to interact appropriately with novel objects in the environment. We propose a neuro-symbolic architecture that integrates symbolic planning, reinforcement learning, and a large language model (LLM) to learn how to handle novel objects. In particular, we leverage the common sense reasoning capability of the LLM to identify missing operators, generate plans with the symbolic AI planner, and write reward functions to guide the reinforcement learning agent in learning control policies for newly identified operators. Our method outperforms the state-of-the-art methods in operator discovery as well as operator learning in continuous robotic domains.

CVJun 23, 2022

NovelCraft: A Dataset for Novelty Detection and Discovery in Open Worlds

Patrick Feeney, Sarah Schneider, Panagiotis Lymperopoulos et al.

In order for artificial agents to successfully perform tasks in changing environments, they must be able to both detect and adapt to novelty. However, visual novelty detection research often only evaluates on repurposed datasets such as CIFAR-10 originally intended for object classification, where images focus on one distinct, well-centered object. New benchmarks are needed to represent the challenges of navigating the complex scenes of an open world. Our new NovelCraft dataset contains multimodal episodic data of the images and symbolic world-states seen by an agent completing a pogo stick assembly task within a modified Minecraft environment. In some episodes, we insert novel objects of varying size within the complex 3D scene that may impact gameplay. Our visual novelty detection benchmark finds that methods that rank best on popular area-under-the-curve metrics may be outperformed by simpler alternatives when controlling false positives matters most. Further multimodal novelty detection experiments suggest that methods that fuse both visual and symbolic information can improve time until detection as well as overall discrimination. Finally, our evaluation of recent generalized category discovery methods suggests that adapting to new imbalanced categories in complex scenes remains an exciting open problem.

CLFeb 3

Where Norms and References Collide: Evaluating LLMs on Normative Reasoning

Mitchell Abrams, Kaveh Eskandari Miandoab, Felix Gervits et al.

Embodied agents, such as robots, will need to interact in situated environments where successful communication often depends on reasoning over social norms: shared expectations that constrain what actions are appropriate in context. A key capability in such settings is norm-based reference resolution (NBRR), where interpreting referential expressions requires inferring implicit normative expectations grounded in physical and social context. Yet it remains unclear whether Large Language Models (LLMs) can support this kind of reasoning. In this work, we introduce SNIC (Situated Norms in Context), a human-validated diagnostic testbed designed to probe how well state-of-the-art LLMs can extract and utilize normative principles relevant to NBRR. SNIC emphasizes physically grounded norms that arise in everyday tasks such as cleaning, tidying, and serving. Across a range of controlled evaluations, we find that even the strongest LLMs struggle to consistently identify and apply social norms, particularly when norms are implicit, underspecified, or in conflict. These findings reveal a blind spot in current LLMs and highlight a key challenge for deploying language-based systems in socially situated, embodied settings.

CLNov 6, 2025

IntelliProof: An Argumentation Network-based Conversational Helper for Organized Reflection

Kaveh Eskandari Miandoab, Katharine Kowalyshyn, Kabir Pamnani et al.

We present IntelliProof, an interactive system for analyzing argumentative essays through LLMs. IntelliProof structures an essay as an argumentation graph, where claims are represented as nodes, supporting evidence is attached as node properties, and edges encode supporting or attacking relations. Unlike existing automated essay scoring systems, IntelliProof emphasizes the user experience: each relation is initially classified and scored by an LLM, then visualized for enhanced understanding. The system provides justifications for classifications and produces quantitative measures for essay coherence. It enables rapid exploration of argumentative quality while retaining human oversight. In addition, IntelliProof provides a set of tools for a better understanding of an argumentative essay and its corresponding graph in natural language, bridging the gap between the structural semantics of argumentative essays and the user's understanding of a given text.

ROApr 4

Build on Priors: Vision--Language--Guided Neuro-Symbolic Imitation Learning for Data-Efficient Real-World Robot Manipulation

Pierrick Lorang, Johannes Huemer, Timothy Duggan et al.

Enabling robots to learn long-horizon manipulation tasks from a handful of demonstrations remains a central challenge in robotics. Existing neuro-symbolic approaches often rely on hand-crafted symbolic abstractions, semantically labeled trajectories or large demonstration datasets, limiting their scalability and real-world applicability. We present a scalable neuro-symbolic framework that autonomously constructs symbolic planning domains and data-efficient control policies from as few as one to thirty unannotated skill demonstrations, without requiring manual domain engineering. Our method segments demonstrations into skills and employs a Vision-Language Model (VLM) to classify skills and identify equivalent high-level states, enabling automatic construction of a state-transition graph. This graph is processed by an Answer Set Programming solver to synthesize a PDDL planning domain, which an oracle function exploits to isolate the minimal, task-relevant and target relative observation and action spaces for each skill policy. Policies are learned at the control reference level rather than at the raw actuator signal level, yielding a smoother and less noisy learning target. Known controllers can be leveraged for real-world data augmentation by projecting a single demonstration onto other objects in the scene, simultaneously enriching the graph construction process and the dataset for imitation learning. We validate our framework primarily on a real industrial forklift across statistically rigorous manipulation trials, and demonstrate cross-platform generality on a Kinova Gen3 robotic arm across two standard benchmarks. Our results show that grounding control learning, VLM-driven abstraction, and automated planning synthesis into a unified pipeline constitutes a practical path toward scalable, data-efficient, expert-free and interpretable neuro-symbolic robotics.

AIMay 4

Are you with me? A Framework for Detecting Mental Model Discrepancies in Task-Based Team Dialogues

Katharine Kowalyshyn, Matthias Scheutz

Humans typically use natural language to update teammates on task states. Since not all updates are communicated, discrepancies arise between the team members' mental models that negatively affect overall team performance. How can we categorize such discrepancies? Do misalignments detected in team dialogue predict future mental model misalignments? Traditional shared mental model (SMM) assessment methods rely on retrospective expert coding that cannot capture real-time coordination dynamics. We propose a framework to identify and categorize four types of mental model discrepancies: unsupported beliefs, false beliefs, belief contradictions, and omissions, all of which can naturally emerge in team dialogues. Using dialogues from twenty dyad teams performing collaborative object identification tasks across four sequential levels, we demonstrate that these discrepancy patterns contain predictive signals. Averaging historical discrepancy counts achieves meaningful prediction accuracy using uniform weighting as an exploratory baseline, with differential predictability across discrepancy types.

ROFeb 6, 2025

Probing a Vision-Language-Action Model for Symbolic States and Integration into a Cognitive Architecture

Hong Lu, Hengxu Li, Prithviraj Singh Shahani et al.

Vision-language-action (VLA) models hold promise as generalist robotics solutions by translating visual and linguistic inputs into robot actions, yet they lack reliability due to their black-box nature and sensitivity to environmental changes. In contrast, cognitive architectures (CA) excel in symbolic reasoning and state monitoring but are constrained by rigid predefined execution. This work bridges these approaches by probing OpenVLA's hidden layers to uncover symbolic representations of object properties, relations, and action states, enabling integration with a CA for enhanced interpretability and robustness. Through experiments on LIBERO-spatial pick-and-place tasks, we analyze the encoding of symbolic states across different layers of OpenVLA's Llama backbone. Our probing results show consistently high accuracies (> 0.90) for both object and action states across most layers, though contrary to our hypotheses, we did not observe the expected pattern of object states being encoded earlier than action states. We demonstrate an integrated DIARC-OpenVLA system that leverages these symbolic representations for real-time state monitoring, laying the foundation for more interpretable and reliable robotic manipulation.

AIJan 7, 2024

NovelGym: A Flexible Ecosystem for Hybrid Planning and Learning Agents Designed for Open Worlds

Shivam Goel, Yichen Wei, Panagiotis Lymperopoulos et al.

As AI agents leave the lab and venture into the real world as autonomous vehicles, delivery robots, and cooking robots, it is increasingly necessary to design and comprehensively evaluate algorithms that tackle the ``open-world''. To this end, we introduce NovelGym, a flexible and adaptable ecosystem designed to simulate gridworld environments, serving as a robust platform for benchmarking reinforcement learning (RL) and hybrid planning and learning agents in open-world contexts. The modular architecture of NovelGym facilitates rapid creation and modification of task environments, including multi-agent scenarios, with multiple environment transformations, thus providing a dynamic testbed for researchers to develop open-world AI agents.

ROMar 17, 2025

FLEX: A Framework for Learning Robot-Agnostic Force-based Skills Involving Sustained Contact Object Manipulation

Shijie Fang, Wenchang Gao, Shivam Goel et al.

Learning to manipulate objects efficiently, particularly those involving sustained contact (e.g., pushing, sliding) and articulated parts (e.g., drawers, doors), presents significant challenges. Traditional methods, such as robot-centric reinforcement learning (RL), imitation learning, and hybrid techniques, require massive training and often struggle to generalize across different objects and robot platforms. We propose a novel framework for learning object-centric manipulation policies in force space, decoupling the robot from the object. By directly applying forces to selected regions of the object, our method simplifies the action space, reduces unnecessary exploration, and decreases simulation overhead. This approach, trained in simulation on a small set of representative objects, captures object dynamics -- such as joint configurations -- allowing policies to generalize effectively to new, unseen objects. Decoupling these policies from robot-specific dynamics enables direct transfer to different robotic platforms (e.g., Kinova, Panda, UR5) without retraining. Our evaluations demonstrate that the method significantly outperforms baselines, achieving over an order of magnitude improvement in training efficiency compared to other state-of-the-art methods. Additionally, operating in force space enhances policy transferability across diverse robot platforms and object types. We further showcase the applicability of our method in a real-world robotic setting. For supplementary materials and videos, please visit: https://tufts-ai-robotics-group.github.io/FLEX/

ROMar 6, 2025

Curiosity-Driven Imagination: Discovering Plan Operators and Learning Associated Policies for Open-World Adaptation

Pierrick Lorang, Hong Lu, Matthias Scheutz

Adapting quickly to dynamic, uncertain environments-often called "open worlds"-remains a major challenge in robotics. Traditional Task and Motion Planning (TAMP) approaches struggle to cope with unforeseen changes, are data-inefficient when adapting, and do not leverage world models during learning. We address this issue with a hybrid planning and learning system that integrates two models: a low level neural network based model that learns stochastic transitions and drives exploration via an Intrinsic Curiosity Module (ICM), and a high level symbolic planning model that captures abstract transitions using operators, enabling the agent to plan in an "imaginary" space and generate reward machines. Our evaluation in a robotic manipulation domain with sequential novelty injections demonstrates that our approach converges faster and outperforms state-of-the-art hybrid methods.

CLSep 2, 2025

LLMs and their Limited Theory of Mind: Evaluating Mental State Annotations in Situated Dialogue

Katharine Kowalyshyn, Matthias Scheutz

What if large language models could not only infer human mindsets but also expose every blind spot in team dialogue such as discrepancies in the team members' joint understanding? We present a novel, two-step framework that leverages large language models (LLMs) both as human-style annotators of team dialogues to track the team's shared mental models (SMMs) and as automated discrepancy detectors among individuals' mental states. In the first step, an LLM generates annotations by identifying SMM elements within task-oriented dialogues from the Cooperative Remote Search Task (CReST) corpus. Then, a secondary LLM compares these LLM-derived annotations and human annotations against gold-standard labels to detect and characterize divergences. We define an SMM coherence evaluation framework for this use case and apply it to six CReST dialogues, ultimately producing: (1) a dataset of human and LLM annotations; (2) a reproducible evaluation framework for SMM coherence; and (3) an empirical assessment of LLM-based discrepancy detection. Our results reveal that, although LLMs exhibit apparent coherence on straightforward natural-language annotation tasks, they systematically err in scenarios requiring spatial reasoning or disambiguation of prosodic cues.

CYAug 14, 2025

Are AI Machines Making Humans Obsolete?

Matthias Scheutz

This chapter starts with a sketch of how we got to "generative AI" (GenAI) and a brief summary of the various impacts it had so far. It then discusses some of the opportunities of GenAI, followed by the challenges and dangers, including dystopian outcomes resulting from using uncontrolled machine learning and our failures to understand the results. It concludes with some suggestions for how to control GenAI and address its dangers.

LGJan 25, 2022

Conditional entropy minimization principle for learning domain invariant representation features

Thuan Nguyen, Boyang Lyu, Prakash Ishwar et al.

Invariance-principle-based methods such as Invariant Risk Minimization (IRM), have recently emerged as promising approaches for Domain Generalization (DG). Despite promising theory, such approaches fail in common classification tasks due to the mixing of true invariant features and spurious invariant features. To address this, we propose a framework based on the conditional entropy minimization (CEM) principle to filter-out the spurious invariant features leading to a new algorithm with a better generalization capability. We show that our proposed approach is closely related to the well-known Information Bottleneck (IB) framework and prove that under certain assumptions, entropy minimization can exactly recover the true invariant features. Our approach provides competitive classification accuracy compared to recent theoretically-principled state-of-the-art alternatives across several DG datasets.

CLOct 12, 2021

Decision-Theoretic Question Generation for Situated Reference Resolution: An Empirical Study and Computational Model

Felix Gervits, Gordon Briggs, Antonio Roque et al.

Dialogue agents that interact with humans in situated environments need to manage referential ambiguity across multiple modalities and ask for help as needed. However, it is not clear what kinds of questions such agents should ask nor how the answers to such questions can be used to resolve ambiguity. To address this, we analyzed dialogue data from an interactive study in which participants controlled a virtual robot tasked with organizing a set of tools while engaging in dialogue with a live, remote experimenter. We discovered a number of novel results, including the distribution of question types used to resolve ambiguity and the influence of dialogue-level factors on the reference resolution process. Based on these empirical findings we: (1) developed a computational model for clarification requests using a decision network with an entropy-based utility assignment method that operates across modalities, (2) evaluated the model, showing that it outperforms a slot-filling baseline in environments of varying ambiguity, and (3) interpreted the results to offer insight into the ways that agents can ask questions to facilitate situated reference resolution.

LGSep 4, 2021

Barycentric-alignment and reconstruction loss minimization for domain generalization

Boyang Lyu, Thuan Nguyen, Prakash Ishwar et al.

This paper advances the theory and practice of Domain Generalization (DG) in machine learning. We consider the typical DG setting where the hypothesis is composed of a representation mapping followed by a labeling function. Within this setting, the majority of popular DG methods aim to jointly learn the representation and the labeling functions by minimizing a well-known upper bound for the classification risk in the unseen domain. In practice, however, methods based on this theoretical upper bound ignore a term that cannot be directly optimized due to its dual dependence on both the representation mapping and the unknown optimal labeling function in the unseen domain. To bridge this gap between theory and practice, we introduce a new upper bound that is free of terms having such dual dependence, resulting in a fully optimizable risk upper bound for the unseen domain. Our derivation leverages classical and recent transport inequalities that link optimal transport metrics with information-theoretic measures. Compared to previous bounds, our bound introduces two new terms: (i) the Wasserstein-2 barycenter term that aligns distributions between domains, and (ii) the reconstruction loss term that assesses the quality of representation in reconstructing the original data. Based on this new upper bound, we propose a novel DG algorithm named Wasserstein Barycenter Auto-Encoder (WBAE) that simultaneously minimizes the classification loss, the barycenter loss, and the reconstruction loss. Numerical results demonstrate that the proposed method outperforms current state-of-the-art DG algorithms on several datasets.

AIJul 9, 2021

Integrating Planning, Execution and Monitoring in the presence of Open World Novelties: Case Study of an Open World Monopoly Solver

Sriram Gopalakrishnan, Utkarsh Soni, Tung Thai et al.

The game of monopoly is an adversarial multi-agent domain where there is no fixed goal other than to be the last player solvent, There are useful subgoals like monopolizing sets of properties, and developing them. There is also a lot of randomness from dice rolls, card-draws, and adversaries' strategies. This unpredictability is made worse when unknown novelties are added during gameplay. Given these challenges, Monopoly was one of the test beds chosen for the DARPA-SAILON program which aims to create agents that can detect and accommodate novelties. To handle the game complexities, we developed an agent that eschews complete plans, and adapts it's policy online as the game evolves. In the most recent independent evaluation in the SAILON program, our agent was the best performing agent on most measures. We herein present our approach and results.

CLJun 11, 2021

How Should Agents Ask Questions For Situated Learning? An Annotated Dialogue Corpus

Felix Gervits, Antonio Roque, Gordon Briggs et al.

Intelligent agents that are confronted with novel concepts in situated environments will need to ask their human teammates questions to learn about the physical world. To better understand this problem, we need data about asking questions in situated task-based interactions. To this end, we present the Human-Robot Dialogue Learning (HuRDL) Corpus - a novel dialogue corpus collected in an online interactive virtual environment in which human participants play the role of a robot performing a collaborative tool-organization task. We describe the corpus data and a corresponding annotation scheme to offer insight into the form and content of questions that humans ask to facilitate learning in a situated environment. We provide the corpus as an empirically-grounded resource for improving question generation in situated intelligent agents.

HCApr 23, 2021

Can You Trust Your Trust Measure?

Meia Chita-Tegmark, Theresa Law, Nicholas Rabb et al.

Trust in human-robot interactions (HRI) is measured in two main ways: through subjective questionnaires and through behavioral tasks. To optimize measurements of trust through questionnaires, the field of HRI faces two challenges: the development of standardized measures that apply to a variety of robots with different capabilities, and the exploration of social and relational dimensions of trust in robots (e.g., benevolence). In this paper we look at how different trust questionnaires fare given these challenges that pull in different directions (being general vs. being exploratory) by studying whether people think the items in these questionnaires are applicable to different kinds of robots and interactions. In Study 1 we show that after being presented with a robot (non-humanoid) and an interaction scenario (fire evacuation), participants rated multiple questionnaire items such as "This robot is principled" as "Non-applicable to robots in general" or "Non-applicable to this robot". In Study 2 we show that the frequency of these ratings change (indeed, even for items rated as N/A to robots in general) when a new scenario is presented (game playing with a humanoid robot). Finally, while overall trust scores remained robust to N/A ratings, our results revealed potential fallacies in the way these scores are commonly interpreted. We conclude with recommendations for the development, use and results-reporting of trust questionnaires for future studies, as well as theoretical implications for the field of HRI.

ROApr 7, 2021

Robot Development and Path Planning for Indoor Ultraviolet Light Disinfection

Jonathan Conroy, Christopher Thierauf, Parker Rule et al.

Regular irradiation of indoor environments with ultraviolet C (UVC) light has become a regular task for many indoor settings as a result of COVID-19, but current robotic systems attempting to automate it suffer from high costs and inefficient irradiation. In this paper, we propose a purpose-made inexpensive robotic platform with off-the-shelf components and standard navigation software that, with a novel algorithm for finding optimal irradiation locations, addresses both shortcomings to offer affordable and efficient solutions for UVC irradiation. We demonstrate in simulations the efficacy of the algorithm and show a prototypical run of the autonomous integrated robotic system in an indoor environment. In our sample instances, our proposed algorithm reduces the time needed by roughly 30\% while it increases the coverage by a factor of 35\% (when compared to the best possible placement of a static light).

AIDec 24, 2020

SPOTTER: Extending Symbolic Planning Operators through Targeted Reinforcement Learning

Vasanth Sarathy, Daniel Kasenberg, Shivam Goel et al.

Symbolic planning models allow decision-making agents to sequence actions in arbitrary ways to achieve a variety of goals in dynamic domains. However, they are typically handcrafted and tend to require precise formulations that are not robust to human error. Reinforcement learning (RL) approaches do not require such models, and instead learn domain dynamics by exploring the environment and collecting rewards. However, RL approaches tend to require millions of episodes of experience and often learn policies that are not easily transferable to other tasks. In this paper, we address one aspect of the open problem of integrating these approaches: how can decision-making agents resolve discrepancies in their symbolic planning models while attempting to accomplish goals? We propose an integrated framework named SPOTTER that uses RL to augment and support ("spot") a planning agent by discovering new operators needed by the agent to accomplish goals that are initially unreachable for the agent. SPOTTER outperforms pure-RL approaches while also discovering transferable symbolic knowledge and does not require supervision, successful plan traces or any a priori knowledge about the missing planning operator.

ROMay 4, 2020

"Can you do this?" Self-Assessment Dialogues with Autonomous Robots Before, During, and After a Mission

Tyler Frasca, Evan Krause, Ravenna Thielstrom et al.

Autonomous robots with sophisticated capabilities can make it difficult for human instructors to assess its capabilities and proficiencies. Therefore, it is important future robots have the ability to: introspect on their capabilities and assess their task performance. Introspection allows the robot to determine what it can accomplish and self-assessment allows the robot estimate the likelihood it will accomplish at given task. We introduce a general framework for introspection and self-assessment that enables robots to have task and performance-based dialogues before, during, and after a mission. We then realize aspects of the framework in the cognitive robotic DIARC architecture, and finally show a proof-of-concept demonstration on a Nao robot showing its self-assessment capabilities before, during, and after an instructed task.

HCFeb 8, 2020

Assistive robots for the social management of health: a framework for robot design and human-robot interaction research

Meia Chita-Tegmark, Matthias Scheutz

There is a close connection between health and the quality of one's social life. Strong social bonds are essential for health and wellbeing, but often health conditions can detrimentally affect a person's ability to interact with others. This can become a vicious cycle resulting in further decline in health. For this reason, the social management of health is an important aspect of healthcare. We propose that socially assistive robots (SARs) could help people with health conditions maintain positive social lives by supporting them in social interactions. This paper makes three contributions, as detailed below. We develop a framework of social mediation functions that robots could perform, motivated by the special social needs that people with health conditions have. In this framework we identify five types of functions that SARs could perform: a) changing how the person is perceived, b) enhancing the social behavior of the person, c) modifying the social behavior of others, d) providing structure for interactions, and e) changing how the person feels. We thematically organize and review the existing literature on robots supporting human-human interactions, in both clinical and non-clinical settings, and explain how the findings and design ideas from these studies can be applied to the functions identified in the framework. Finally, we point out and discuss challenges in designing SARs for supporting social interactions, and highlight opportunities for future robot design and HRI research on the mediator role of robots.

CLNov 1, 2019

Engaging in Dialogue about an Agent's Norms and Behaviors

Daniel Kasenberg, Antonio Roque, Ravenna Thielstrom et al.

We present a set of capabilities allowing an agent planning with moral and social norms represented in temporal logic to respond to queries about its norms and behaviors in natural language, and for the human user to add and remove norms directly in natural language. The user may also pose hypothetical modifications to the agent's norms and inquire about their effects.

CLNov 1, 2019

Generating Justifications for Norm-Related Agent Decisions

Daniel Kasenberg, Antonio Roque, Ravenna Thielstrom et al.

We present an approach to generating natural language justifications of decisions derived from norm-based reasoning. Assuming an agent which maximally satisfies a set of rules specified in an object-oriented temporal logic, the user can ask factual questions (about the agent's rules, actions, and the extent to which the agent violated the rules) as well as "why" questions that require the agent comparing actual behavior to counterfactual trajectories with respect to these rules. To produce natural-sounding explanations, we focus on the subproblem of producing natural language clauses from statements in a fragment of temporal logic, and then describe how to embed these clauses into explanatory sentences. We use a human judgment evaluation on a testbed task to compare our approach to variants in terms of intelligibility, mental model and perceived trust.

ROFeb 4, 2019

When Exceptions are the Norm: Exploring the Role of Consent in HRI

Vasanth Sarathy, Thomas Arnold, Matthias Scheutz

HRI researchers have made major strides in developing robotic architectures that are capable of reading a limited set of social cues and producing behaviors that enhance their likeability and feeling of comfort amongst humans. However, the cues in these models are fairly direct and the interactions largely dyadic. To capture the normative qualities of interaction more robustly, we propose consent as a distinct, critical area for HRI research. Convening important insights in existing HRI work around topics like touch, proxemics, gaze, and moral norms, the notion of consent reveals key expectations that can shape how a robot acts in social space. By sorting various kinds of consent through social and legal doctrine, we delineate empirical and technical questions to meet consent challenges faced in major application domains and robotic roles. Attention to consent could show, for example, how extraordinary, norm-violating actions can be justified by agents and accepted by those around them. We argue that operationalizing ideas from legal scholarship can better guide how robotic systems might cultivate and sustain proper forms of consent.

RONov 26, 2018

Augmenting Robot Knowledge Consultants with Distributed Short Term Memory

Tom Williams, Ravenna Thielstrom, Evan Krause et al.

Human-robot communication in situated environments involves a complex interplay between knowledge representations across a wide variety of modalities. Crucially, linguistic information must be associated with representations of objects, locations, people, and goals, which may be represented in very different ways. In previous work, we developed a Consultant Framework that facilitates modality-agnostic access to information distributed across a set of heterogeneously represented knowledge sources. In this work, we draw inspiration from cognitive science to augment these distributed knowledge sources with Short Term Memory Buffers to create an STM-augmented algorithm for referring expression generation. We then discuss the potential performance benefits of this approach and insights from cognitive science that may inform future refinements in the design of our approach.

AIJul 6, 2018

Quasi-Dilemmas for Artificial Moral Agents

Daniel Kasenberg, Vasanth Sarathy, Thomas Arnold et al.

In this paper we describe moral quasi-dilemmas (MQDs): situations similar to moral dilemmas, but in which an agent is unsure whether exploring the plan space or the world may reveal a course of action that satisfies all moral requirements. We argue that artificial moral agents (AMAs) should be built to handle MQDs (in particular, by exploring the plan space rather than immediately accepting the inevitability of the moral dilemma), and that MQDs may be useful for evaluating AMA architectures.

SYOct 28, 2017

Interpretable Apprenticeship Learning with Temporal Logic Specifications

Daniel Kasenberg, Matthias Scheutz

Recent work has addressed using formulas in linear temporal logic (LTL) as specifications for agents planning in Markov Decision Processes (MDPs). We consider the inverse problem: inferring an LTL specification from demonstrated behavior trajectories in MDPs. We formulate this as a multiobjective optimization problem, and describe state-based ("what actually happened") and action-based ("what the agent expected to happen") objective functions based on a notion of "violation cost". We demonstrate the efficacy of the approach by employing genetic programming to solve this problem in two simple domains.

AIJul 15, 2017

AI Challenges in Human-Robot Cognitive Teaming

Tathagata Chakraborti, Subbarao Kambhampati, Matthias Scheutz et al.

Among the many anticipated roles for robots in the future is that of being a human teammate. Aside from all the technological hurdles that have to be overcome with respect to hardware and control to make robots fit to work with humans, the added complication here is that humans have many conscious and subconscious expectations of their teammates - indeed, we argue that teaming is mostly a cognitive rather than physical coordination activity. This introduces new challenges for the AI and robotics community and requires fundamental changes to the traditional approach to the design of autonomy. With this in mind, we propose an update to the classical view of the intelligent agent architecture, highlighting the requirements for mental modeling of the human in the deliberative process of the autonomous agent. In this article, we outline briefly the recent efforts of ours, and others in the community, towards developing cognitive teammates along these guidelines.

AIApr 26, 2017

The MacGyver Test - A Framework for Evaluating Machine Resourcefulness and Creative Problem Solving

Vasanth Sarathy, Matthias Scheutz

Current measures of machine intelligence are either difficult to evaluate or lack the ability to test a robot's problem-solving capacity in open worlds. We propose a novel evaluation framework based on the formal notion of MacGyver Test which provides a practical way for assessing the resilience and resourcefulness of artificial agents.

ROFeb 11, 2016

Enabling Basic Normative HRI in a Cognitive Robotic Architecture

Vasanth Sarathy, Jason R. Wilson, Thomas Arnold et al.

Collaborative human activities are grounded in social and moral norms, which humans consciously and subconsciously use to guide and constrain their decision-making and behavior, thereby strengthening their interactions and preventing emotional and physical harm. This type of norm-based processing is also critical for robots in many human-robot interaction scenarios (e.g., when helping elderly and disabled persons in assisted living facilities, or assisting humans in assembly tasks in factories or even the space station). In this position paper, we will briefly describe how several components in an integrated cognitive architecture can be used to implement processes that are required for normative human-robot interactions, especially in collaborative tasks where actions and situations could potentially be perceived as threatening and thus need a change in course of action to mitigate the perceived threats.