Bilge Mutlu

RO
h-index23
26papers
586citations
Novelty39%
AI Score53

26 Papers

HCMay 27
Robo-Blocks: Generative Scaffolding in End-User Design and Programming of Social Robots

Arissa J. Sato, Callie Y. Kim, Nathan Thomas White et al.

Programming social robots is challenging for novice robot programmers due to required expertise in planning, interaction design, and programming. While large language models (LLMs) hold significant promise through code generation from natural-language descriptions, they can obscure critical elements of programming and supplant designer intent, eventually resulting in over-reliance instead of developing programming skills. In this paper, we explore how LLM-based social-robot-programming tools can support novice robot programmers through a Research through Design (RtD) process. We designed and prototyped Robo-Blocks, a block-based programming environment that leverages LLMs to offer novice robot programmers generative scaffolding through structured narratives that connect high-level ideas to executable robot behaviors. Through deployment with novices, we discovered emerging user personas and usage patterns for generative scaffolding and showed how this scaffolding shapes end-user design and programming strategies. We present design insights for the effective use of generative scaffolding and its integration into the practice of social-robot programming.

AIApr 13
The Long-Horizon Task Mirage? Diagnosing Where and Why Agentic Systems Break

Xinyu Jessica Wang, Haoyue Bai, Yiyou Sun et al.

Large language model (LLM) agents perform strongly on short- and mid-horizon tasks, but often break down on long-horizon tasks that require extended, interdependent action sequences. Despite rapid progress in agentic systems, these long-horizon failures remain poorly characterized, hindering principled diagnosis and comparison across domains. To address this gap, we introduce HORIZON, an initial cross-domain diagnostic benchmark for systematically constructing tasks and analyzing long-horizon failure behaviors in LLM-based agents. Using HORIZON, we evaluate state-of-the-art (SOTA) agents from multiple model families (GPT-5 variants and Claude models), collecting 3100+ trajectories across four representative agentic domains to study horizon-dependent degradation patterns. We further propose a trajectory-grounded LLM-as-a-Judge pipeline for scalable and reproducible failure attribution, and validate it with human annotation on trajectories, achieving strong agreement (inter-annotator κ=0.61; human-judge κ=0.84). Our findings offer an initial methodological step toward systematic, cross-domain analysis of long-horizon agent failures and offer practical guidance for building more reliable long-horizon agents. We release our project website at \href{https://xwang2775.github.io/horizon-leaderboard/}{HORIZON Leaderboard} and welcome contributions from the community.

HCMay 7
LearnMate^2: Design and Evaluation of an LLM-powered Personalized and Adaptive Support System for Online Learning

Xinyu Jessica Wang, Christine P. Lee, Bilge Mutlu

Personalization is crucial for effective learning, yet online learning, designed for widespread availability and open access, lacks personalized guidance. Recent advancements in large language models (LLMs) offer opportunities to bridge this gap. We explore how LLM-driven tools may be designed to support personalized and adaptive learning and examine how they shape user experience and learning outcomes. We iteratively designed \tool{} to support online learning by providing personalized study plans, real-time contextual assistance, and adaptive learning activities. A preliminary study ($n=24$) assessed the effectiveness and usability of \tool{} and informed refinements in our system, which we then evaluated ($n = 16$) against a combination of a state-of-the-art online learning platform and an LLM for learning support. Results indicate that \tool{} advances AI pedagogy by improving both learning outcomes and user experience compared to existing online learning and support tools. This work advances our understanding of the design space of personalized, AI-driven educational tools and their potential impact on user experience.

AIMay 4
U-Define: Designing User Workflows for Hard and Soft Constraints in LLM-Based Planning

Christine P Lee, Xinyu Jessica Wang, Aws Albarghouthi et al.

LLMs are increasingly used for end-user task planning, yet their black-box nature limits users' ability to ensure reliability and control. While recent systems incorporate verification techniques, it remains unclear how users can effectively apply such rigid constraints to represent intent or adapt to real-world variability. For example, prior work finds that hard-only constraints are too rigid, and numeric flexibility weights confuse users. We investigate how interaction workflows can better support users in applying constraints to guide LLM-generated plans, examining whether abstracting strictness into high-level types (i.e., hard and soft) paired with distinct verification mechanisms helps users more reliably express and align intent. We present U-Define, a system that lets users define constraints in natural language and categorize them as either hard rules that must not be violated or soft preferences that allow flexibility. U-Define verifies these types through complementary methods: formal model checking for hard constraints and LLM-as-judge evaluation for soft ones. Through a technical evaluation and user studies with general and expert participants, we find that user-defined constraint types improve perceived usefulness, performance, and satisfaction while maintaining usability. These findings provide insights for designing flexible yet reliable constraint-based workflows.

AIMay 4
Making the Invisible Visible: Understanding the Mismatch Between Organizational Goals and Worker Experiences in AI Adoption

Christine P. Lee, Min Kyung Lee, Bilge Mutlu

While AI is often introduced into organizations to drive innovation and efficiency, many adoption efforts fail as workers resist and struggle to integrate these systems. These failures point to a deeper issue: workers, the very people expected to collaborate with AI, are often invisible in decisions about how AI is designed and used. Drawing on interviews with professionals who interact with AI systems daily in healthcare, finance, and management, we examine the disconnect between organizational expectations and worker experiences. We identify key barriers, including poor usability and interoperability, misaligned expectations, limited control, and insufficient communication. These challenges highlight a gap between how organizations implement AI and the evolving worker needs, tasks, and workflows that it fails to support. We argue that successful adoption requires recognizing workers as central to AI integration and propose adaptation strategies at the individual, task, and organizational levels to better align AI systems with real-world practices.

HCFeb 25, 2025
VeriPlan: Integrating Formal Verification and LLMs into End-User Planning

Christine Lee, David Porfirio, Xinyu Jessica Wang et al.

Automated planning is traditionally the domain of experts, utilized in fields like manufacturing and healthcare with the aid of expert planning tools. Recent advancements in LLMs have made planning more accessible to everyday users due to their potential to assist users with complex planning tasks. However, LLMs face several application challenges within end-user planning, including consistency, accuracy, and user trust issues. This paper introduces VeriPlan, a system that applies formal verification techniques, specifically model checking, to enhance the reliability and flexibility of LLMs for end-user planning. In addition to the LLM planner, VeriPlan includes three additional core features -- a rule translator, flexibility sliders, and a model checker -- that engage users in the verification process. Through a user study (n=12), we evaluate VeriPlan, demonstrating improvements in the perceived quality, usability, and user satisfaction of LLMs. Our work shows the effective integration of formal verification and user-control features with LLMs for end-user planning tasks.

ROApr 27
Supporting Family-School Partnerships with Robot-Facilitated Home-Based Activities

Michael F Xu, Qiyao Yang, Heather Kirkorian et al.

Family-school partnerships (FSP) are critical to children's development, yet families often face barriers such as time constraints, fragmented communication, and limited opportunities for meaningful engagement. As a step toward facilitating broader family-school partnerships, we explore a novel approach that integrates a social robot into family settings, specifically supporting home-based activities. Through interviews and co-design sessions, we designed and developed a robotic system informed by both parents and children, that supported, among other interactions, family communication about school topics. We evaluated the robot in a week-long, in-home study with 10 families. Our findings show how families integrated the robot into daily life, how parental facilitation styles shaped use, and how families perceived both the helpfulness and challenges of the robot. We contribute empirical insights, a modular system, and design implications for family- and child-robot interactions. We discuss ethical and privacy considerations, and broaden the design space for technologies supporting family-school partnerships.

ROApr 27
Designing Robots to Support Parent-Child Connections: Opportunities Through Robot-Mediated Communication

Michael F Xu, Bengisu Cagiltay, Yaxin Hu et al.

The sense of family connectedness may support positive outcomes including individual well-being, resilience, and healthy family functioning. However, as technologies advance, they often replace human-human interactions instead of nurturing them. In this work, we investigate how robot-facilitated communication tools might instead create new opportunities for family connection. We conducted two studies with families with children aged 5-12. We first explored the design space through in-home technology probe sessions with six families. These probes inspired us to explore two key interaction design dimensions: the robot's behavior strategy (passive, reactive, proactive) and the mode of communication (synchronous, asynchronous). We then conducted a laboratory study with 20 families to examine how the two dimensions shaped parent-child interaction and connection. Our findings characterize how parents and children appropriated robot-mediated exchanges, the tensions they experienced around initiative, timing, and privacy, and the opportunities they envisioned for supporting everyday connectedness.

HCMar 17, 2025
MAP: Multi-user Personalization with Collaborative LLM-powered Agents

Christine Lee, Jihye Choi, Bilge Mutlu

The widespread adoption of Large Language Models (LLMs) and LLM-powered agents in multi-user settings underscores the need for reliable, usable methods to accommodate diverse preferences and resolve conflicting directives. Drawing on conflict resolution theory, we introduce a user-centered workflow for multi-user personalization comprising three stages: Reflection, Analysis, and Feedback. We then present MAP -- a \textbf{M}ulti-\textbf{A}gent system for multi-user \textbf{P}ersonalization -- to operationalize this workflow. By delegating subtasks to specialized agents, MAP (1) retrieves and reflects on relevant user information, while enhancing reliability through agent-to-agent interactions, (2) provides detailed analysis for improved transparency and usability, and (3) integrates user feedback to iteratively refine results. Our user study findings (n=12) highlight MAP's effectiveness and usability for conflict resolution while emphasizing the importance of user involvement in resolution verification and failure management. This work highlights the potential of multi-agent systems to implement user-centered, multi-user personalization workflows and concludes by offering insights for personalization in multi-user contexts.

ROJul 22, 2025
Benchmarking LLM Privacy Recognition for Social Robot Decision Making

Dakota Sullivan, Shirley Zhang, Jennica Li et al.

While robots have previously utilized rule-based systems or probabilistic models for user interaction, the rapid evolution of large language models (LLMs) presents new opportunities to develop LLM-powered robots for enhanced human-robot interaction (HRI). To fully realize these capabilities, however, robots need to collect data such as audio, fine-grained images, video, and locations. As a result, LLMs often process sensitive personal information, particularly within private environments, such as homes. Given the tension between utility and privacy risks, evaluating how current LLMs manage sensitive data is critical. Specifically, we aim to explore the extent to which out-of-the-box LLMs are privacy-aware in the context of household robots. In this work, we present a set of privacy-relevant scenarios developed using the Contextual Integrity (CI) framework. We first surveyed users' privacy preferences regarding in-home robot behaviors and then examined how their privacy orientations affected their choices of these behaviors (N = 450). We then provided the same set of scenarios and questions to state-of-the-art LLMs (N = 10) and found that the agreement between humans and LLMs was generally low. To further investigate the capabilities of LLMs as potential privacy controllers, we implemented four additional prompting strategies and compared their results. We discuss the performance of the evaluated models as well as the implications and potential of AI privacy awareness in human-robot interaction.

ROFeb 17, 2022
The Unboxing Experience: Exploration and Design of Initial Interactions Between Children and Social Robots

Christine P Lee, Bengisu Cagiltay, Bilge Mutlu

Social robots are increasingly introduced into children's lives as educational and social companions, yet little is known about how these products might best be introduced to their environments. The emergence of the "unboxing" phenomenon in media suggests that introduction is key to technology adoption where initial impressions are made. To better understand this phenomenon toward designing a positive unboxing experience in the context of social robots for children, we conducted three field studies with families of children aged 8 to 13: (1) an exploratory free-play activity ($n=12$); (2) a co-design session ($n=11$) that informed the development of a prototype box and a curated unboxing experience; and (3) a user study ($n=9$) that evaluated children's experiences. Our findings suggest the unboxing experience of social robots can be improved through the design of a creative aesthetic experience that engages the child socially to guide initial interactions and foster a positive child-robot relationship.

ROJan 8, 2022
CONFIDANT: A Privacy Controller for Social Robots

Brian Tang, Dakota Sullivan, Bengisu Cagiltay et al.

As social robots become increasingly prevalent in day-to-day environments, they will participate in conversations and appropriately manage the information shared with them. However, little is known about how robots might appropriately discern the sensitivity of information, which has major implications for human-robot trust. As a first step to address a part of this issue, we designed a privacy controller, CONFIDANT, for conversational social robots, capable of using contextual metadata (e.g., sentiment, relationships, topic) from conversations to model privacy boundaries. Afterwards, we conducted two crowdsourced user studies. The first study (n=174) focused on whether a variety of human-human interaction scenarios were perceived as either private/sensitive or non-private/non-sensitive. The findings from our first study were used to generate association rules. Our second study (n=95) evaluated the effectiveness and accuracy of the privacy controller in human-robot interaction scenarios by comparing a robot that used our privacy controller against a baseline robot with no privacy controls. Our results demonstrate that the robot with the privacy controller outperforms the robot without the privacy controller in privacy-awareness, trustworthiness, and social-awareness. We conclude that the integration of privacy controllers in authentic human-robot conversations can allow for more trustworthy robots. This initial privacy controller will serve as a foundation for more complex solutions.

HCOct 7, 2021
Appearance

Rachel McDonnell, Bilge Mutlu

Socially interactive agents (SIAs) are no longer mere visions for future user interfaces, as 20 years of research and technology development has enabled the use of virtual and physical agents in day-to-day interfaces and environments. This chapter of the ACM "The Handbook on Socially Interactive Agents" reviews research on and technologies involving socially interactive agents, including virtually embodied agents and physically embodied robots, focusing particularly on the appearance of socially interactive agents. It covers the history of the development of these technologies; outlines the design space for the appearance of agents, including what appearance comprises, modalities in which agents are presented, and how agents are constructed; and the features that agents use to support social interaction, including facial and bodily features, those that express demographic characteristics, and issues surrounding realism, appeal, and the uncanny valley. The chapter concludes with a brief discussion of open questions surrounding the appearance of socially interactive agents.

ROSep 6, 2021
Task-Level Authoring for Remote Robot Teleoperation

Emmanuel Senft, Michael Hagenow, Kevin Welsh et al.

Remote teleoperation of robots can broaden the reach of domain specialists across a wide range of industries such as home maintenance, health care, light manufacturing, and construction. However, current direct control methods are impractical, and existing tools for programming robot remotely have focused on users with significant robotic experience. Extending robot remote programming to end users, i.e., users who are experts in a domain but novices in robotics, requires tools that balance the rich features necessary for complex teleoperation tasks with ease of use. The primary challenge to usability is that novice users are unable to specify complete and robust task plans to allow a robot to perform duties autonomously, particularly in highly variable environments. Our solution is to allow operators to specify shorter sequences of high-level commands, which we call task-level authoring, to create periods of variable robot autonomy. This approach allows inexperienced users to create robot behaviors in uncertain environments by interleaving exploration, specification of behaviors, and execution as separate steps. End users are able to break down the specification of tasks and adapt to the current needs of the interaction and environments, combining the reactivity of direct control to asynchronous operation. In this paper, we describe a prototype system contextualized in light manufacturing and its empirical validation in a user study where 18 participants with some programming experience were able to perform a variety of complex telemanipulation tasks with little training. Our results show that our approach allowed users to create flexible periods of autonomy and solve rich manipulation tasks. Furthermore, participants significantly preferred our system over comparative more direct interfaces, demonstrating the potential of our approach.

ROAug 10, 2021
Recognizing Orientation Slip in Human Demonstrations

Michael Hagenow, Bolun Zhang, Bilge Mutlu et al.

Manipulations of a constrained object often use a non-rigid grasp that allows the object to rotate relative to the end effector. This orientation slip strategy is often present in natural human demonstrations, yet it is generally overlooked in methods to identify constraints from such demonstrations. In this paper, we present a method to model and recognize prehensile orientation slip in human demonstrations of constrained interactions. Using only observations of an end effector, we can detect the type of constraint, parameters of the constraint, and orientation slip properties. Our method uses a novel hierarchical model selection method that is informed by multiple origins of physics-based evidence. A study with eight participants shows that orientation slip occurs in natural demonstrations and confirms that it can be detected by our method.

ROAug 8, 2021
Situated Live Programming for Human-Robot Collaboration

Emmanuel Senft, Michael Hagenow, Robert Radwin et al.

We present situated live programming for human-robot collaboration, an approach that enables users with limited programming experience to program collaborative applications for human-robot interaction. Allowing end users, such as shop floor workers, to program collaborative robots themselves would make it easy to "retask" robots from one process to another, facilitating their adoption by small and medium enterprises. Our approach builds on the paradigm of trigger-action programming (TAP) by allowing end users to create rich interactions through simple trigger-action pairings. It enables end users to iteratively create, edit, and refine a reactive robot program while executing partial programs. This live programming approach enables the user to utilize the task space and objects by incrementally specifying situated trigger-action pairs, substantially lowering the barrier to entry for programming or reprogramming robots for collaboration. We instantiate situated live programming in an authoring system where users can create trigger-action programs by annotating an augmented video feed from the robot's perspective and assign robot actions to trigger conditions. We evaluated this system in a study where participants (n = 10) developed robot programs for solving collaborative light-manufacturing tasks. Results showed that users with little programming experience were able to program HRC tasks in an interactive fashion and our situated live programming approach further supported individualized strategies and workflows. We conclude by discussing opportunities and limitations of the proposed approach, our system implementation, and our study and discuss a roadmap for expanding this approach to a broader range of tasks and applications.

ROJul 10, 2021
Informing Real-time Corrections in Corrective Shared Autonomy Through Expert Demonstrations

Michael Hagenow, Emmanuel Senft, Robert Radwin et al.

Corrective Shared Autonomy is a method where human corrections are layered on top of an otherwise autonomous robot behavior. Specifically, a Corrective Shared Autonomy system leverages an external controller to allow corrections across a range of task variables (e.g., spinning speed of a tool, applied force, path) to address the specific needs of a task. However, this inherent flexibility makes the choice of what corrections to allow at any given instant difficult to determine. This choice of corrections includes determining appropriate robot state variables, scaling for these variables, and a way to allow a user to specify the corrections in an intuitive manner. This paper enables efficient Corrective Shared Autonomy by providing an automated solution based on Learning from Demonstration to both extract the nominal behavior and address these core problems. Our evaluation shows that this solution enables users to successfully complete a surface cleaning task, identifies different strategies users employed in applying corrections, and points to future improvements for our solution.

ROJun 1, 2021
Strobe: An Acceleration Meta-algorithm for Optimizing Robot Paths using Concurrent Interleaved Sub-Epoch Pods

Daniel Rakita, Bilge Mutlu, Michael Gleicher

In this paper, we present a meta-algorithm intended to accelerate many existing path optimization algorithms. The central idea of our work is to strategically break up a waypoint path into consecutive groupings called "pods," then optimize over various pods concurrently using parallel processing. Each pod is assigned a color, either blue or red, and the path is divided in such a way that adjacent pods of the same color have an appropriate buffer of the opposite color between them, reducing the risk of interference between concurrent computations. We present a path splitting algorithm to create blue and red pod groupings and detail steps for a meta-algorithm that optimizes over these pods in parallel. We assessed how our method works on a testbed of simulated path optimization scenarios using various optimization tasks and characterize how it scales with additional threads. We also compared our meta-algorithm on these tasks to other parallelization schemes. Our results show that our method more effectively utilizes concurrency compared to the alternatives, both in terms of speed and optimization quality.

ROMay 31, 2021
Single-query Path Planning Using Sample-efficient Probability Informed Trees

Daniel Rakita, Bilge Mutlu, Michael Gleicher

In this work, we present a novel sampling-based path planning method, called SPRINT. The method finds solutions for high dimensional path planning problems quickly and robustly. Its efficiency comes from minimizing the number of collision check samples. This reduction in sampling relies on heuristics that predict the likelihood that samples will be useful in the search process. Specifically, heuristics (1) prioritize more promising search regions; (2) cull samples from local minima regions; and (3) steer the search away from previously observed collision states. Empirical evaluations show that our method finds shorter or comparable-length solution paths in significantly less time than commonly used methods. We demonstrate that these performance gains can be largely attributed to our approach to achieve sample efficiency.

ROFeb 25, 2021
CollisionIK: A Per-Instant Pose Optimization Method for Generating Robot Motions with Environment Collision Avoidance

Daniel Rakita, Haochen Shi, Bilge Mutlu et al.

In this work, we present a per-instant pose optimization method that can generate configurations that achieve specified pose or motion objectives as best as possible over a sequence of solutions, while also simultaneously avoiding collisions with static or dynamic obstacles in the environment. We cast our method as a multi-objective, non-linear constrained optimization-based IK problem where each term in the objective function encodes a particular pose objective. We demonstrate how to effectively incorporate environment collision avoidance as a single term in this multi-objective, optimization-based IK structure, and provide solutions for how to spatially represent and organize external environments such that data can be efficiently passed to a real-time, performance-critical optimization loop. We demonstrate the effectiveness of our method by comparing it to various state-of-the-art methods in a testbed of simulation experiments and discuss the implications of our work based on our results.

ROFeb 14, 2021
Corrective Shared Autonomy for Addressing Task Variability

Michael Hagenow, Emmanuel Senft, Robert Radwin et al.

Many tasks, particularly those involving interaction with the environment, are characterized by high variability, making robotic autonomy difficult. One flexible solution is to introduce the input of a human with superior experience and cognitive abilities as part of a shared autonomy policy. However, current methods for shared autonomy are not designed to address the wide range of necessary corrections (e.g., positions, forces, execution rate, etc.) that the user may need to provide to address task variability. In this paper, we present corrective shared autonomy, where users provide corrections to key robot state variables on top of an otherwise autonomous task model. We provide an instantiation of this shared autonomy paradigm and demonstrate its viability and benefits such as low user effort and physical demand via a system-level user study on three tasks involving variability situated in aircraft manufacturing.

RODec 1, 2019
Embodiment in Socially Interactive Robots

Eric Deng, Bilge Mutlu, Maja Mataric

Physical embodiment is a required component for robots that are structurally coupled with their real-world environments. However, most socially interactive robots do not need to physically interact with their environments in order to perform their tasks. When and why should embodied robots be used instead of simpler and cheaper virtual agents? This paper reviews the existing work that explores the role of physical embodiment in socially interactive robots. This class consists of robots that are not only capable of engaging in social interaction with humans, but are using primarily their social capabilities to perform their desired functions. Socially interactive robots provide entertainment, information, and/or assistance; this last category is typically encompassed by socially assistive robotics. In all cases, such robots can achieve their primary functions without performing functional physical work. To comprehensively evaluate the existing body of work on embodiment, we first review work from established related fields including psychology, philosophy, and sociology. We then systematically review 65 studies evaluating aspects of embodiment published from 2003 to 2017 in major peer-reviewed robotics publication venues. We examine relevant aspects of the selected studies, focusing on the embodiments compared, tasks evaluated, social roles of robots, and measurements. We introduce three taxonomies for the types of robot embodiment, robot social roles, and human-robot tasks. These taxonomies are used to deconstruct the design and interaction spaces of socially interactive robots and facilitate analysis and discussion of the reviewed studies. We use this newly-defined methodology to critically discuss existing works, revealing topics within embodiment research for social interaction, assistive robotics, and service robotics.

ROJan 31, 2019
Characterizing Input Methods for Human-to-robot Demonstrations

Pragathi Praveena, Guru Subramani, Bilge Mutlu et al.

Human demonstrations are important in a range of robotics applications, and are created with a variety of input methods. However, the design space for these input methods has not been extensively studied. In this paper, focusing on demonstrations of hand-scale object manipulation tasks to robot arms with two-finger grippers, we identify distinct usage paradigms in robotics that utilize human-to-robot demonstrations, extract abstract features that form a design space for input methods, and characterize existing input methods as well as a novel input method that we introduce, the instrumented tongs. We detail the design specifications for our method and present a user study that compares it against three common input methods: free-hand manipulation, kinesthetic guidance, and teleoperation. Study results show that instrumented tongs provide high quality demonstrations and a positive experience for the demonstrator while offering good correspondence to the target robot.

HCDec 1, 2018
PowerCut and Obfuscator: An Exploration of the Design Space for Privacy-Preserving Interventions for Voice Assistants

Varun Chandrasekaran, Suman Banerjee, Bilge Mutlu et al.

The pervasive use of smart speakers has raised numerous privacy concerns. While work to date provides an understanding of user perceptions of these threats, limited research focuses on how we can mitigate these concerns, either through redesigning the smart speaker or through dedicated privacy-preserving interventions. In this paper, we present the design and prototyping of two privacy-preserving interventions: `Obfuscator' targeted at disabling recording at the microphones, and `PowerCut' targeted at disabling power to the smart speaker. We present our findings from a technology probe study involving 24 households that interacted with our prototypes; the primary objective was to gain a better understanding of the design space for technological interventions that might address these concerns. Our data and findings reveal complex trade-offs among utility, privacy, and usability and stresses the importance of multi-functionality, aesthetics, ease-of-use, and form factor. We discuss the implications of our findings for the development of subsequent interventions and the future design of smart speakers.

RONov 6, 2018
Evaluating Methods for End-User Creation of Robot Task Plans

Chris Paxton, Felix Jonathan, Andrew Hundt et al.

How can we enable users to create effective, perception-driven task plans for collaborative robots? We conducted a 35-person user study with the Behavior Tree-based CoSTAR system to determine which strategies for end user creation of generalizable robot task plans are most usable and effective. CoSTAR allows domain experts to author complex, perceptually grounded task plans for collaborative robots. As a part of CoSTAR's wide range of capabilities, it allows users to specify SmartMoves: abstract goals such as "pick up component A from the right side of the table." Users were asked to perform pick-and-place assembly tasks with either SmartMoves or one of three simpler baseline versions of CoSTAR. Overall, participants found CoSTAR to be highly usable, with an average System Usability Scale score of 73.4 out of 100. SmartMove also helped users perform tasks faster and more effectively; all SmartMove users completed the first two tasks, while not all users completed the tasks using the other strategies. SmartMove users showed better performance for incorporating perception across all three tasks.

ROMar 23, 2017
User Experience of the CoSTAR System for Instruction of Collaborative Robots

Chris Paxton, Felix Jonathan, Andrew Hundt et al.

How can we enable novice users to create effective task plans for collaborative robots? Must there be a tradeoff between generalizability and ease of use? To answer these questions, we conducted a user study with the CoSTAR system, which integrates perception and reasoning into a Behavior Tree-based task plan editor. In our study, we ask novice users to perform simple pick-and-place assembly tasks under varying perception and planning capabilities. Our study shows that users found Behavior Trees to be an effective way of specifying task plans. Furthermore, users were also able to more quickly, effectively, and generally author task plans with the addition of CoSTAR's planning, perception, and reasoning capabilities. Despite these improvements, concepts associated with these capabilities were rated by users as less usable, and our results suggest a direction for further refinement.