Claudia Pérez-D’Arpino

h-index: 12

7 papers

633 citations

Novelty: 47%

AI Score: 39

Ranked #78,472 of 194,257 authors (top 40%) · #2,351 in RO (top 35%)

7 Papers

26.5 · RO · Jun 29, 2023

Principles and Guidelines for Evaluating Social Robot Navigation Algorithms

Anthony Francis, Claudia Pérez-D'Arpino, Chengshu Li et al. · cmu, mit

A major challenge to deploying robots widely is navigation in human-populated environments, commonly referred to as social robot navigation. While the field of social navigation has advanced tremendously in recent years, the fair evaluation of algorithms that tackle social navigation remains hard because it involves not just robotic agents moving in static environments but also dynamic human agents and their perceptions of the appropriateness of robot behavior. In contrast, clear, repeatable, and accessible benchmarks have accelerated progress in fields like computer vision, natural language processing and traditional robot navigation by enabling researchers to fairly compare algorithms, revealing limitations of existing solutions and illuminating promising new directions. We believe the same approach can benefit social navigation. In this paper, we pave the road towards common, widely accessible, and repeatable benchmarking criteria to evaluate social robot navigation. Our contributions include (a) a definition of a socially navigating robot as one that respects the principles of safety, comfort, legibility, politeness, social competency, agent understanding, proactivity, and responsiveness to context, (b) guidelines for the use of metrics, development of scenarios, benchmarks, datasets, and simulators to evaluate social navigation, and (c) a design of a social navigation metrics framework to make it easier to compare results from different simulators, robots and datasets.

22.3 · RO · Mar 30, 2023

Learning Human-to-Robot Handovers from Point Clouds

Sammy Christen, Wei Yang, Claudia Pérez-D'Arpino et al. · nvidia

We propose the first framework to learn control policies for vision-based human-to-robot handovers, a critical task for human-robot interaction. While research in Embodied AI has made significant progress in training robot agents in simulated environments, interacting with humans remains challenging due to the difficulties of simulating humans. Fortunately, recent research has developed realistic simulated environments for human-to-robot handovers. Leveraging this result, we introduce a method that is trained with a human-in-the-loop via a two-stage teacher-student framework that uses motion and grasp planning, reinforcement learning, and self-supervision. We show significant performance gains over baselines on a simulation benchmark, sim-to-sim transfer and sim-to-real transfer.

37.0 · AI · Dec 5, 2020 · Code

iGibson 1.0: a Simulation Environment for Interactive Tasks in Large Realistic Scenes

Bokui Shen, Fei Xia, Chengshu Li et al.

We present iGibson 1.0, a novel simulation environment to develop robotic solutions for interactive tasks in large-scale realistic scenes. Our environment contains 15 fully interactive home-sized scenes with 108 rooms populated with rigid and articulated objects. The scenes are replicas of real-world homes, with the distribution and layout of objects aligned to those of the real world. iGibson 1.0 integrates several key features to facilitate the study of interactive tasks: i) generation of high-quality virtual sensor signals (RGB, depth, segmentation, LiDAR, flow and so on), ii) domain randomization to change the materials of the objects (both visual and physical) and/or their shapes, iii) integrated sampling-based motion planners to generate collision-free trajectories for robot bases and arms, and iv) an intuitive human-iGibson interface that enables efficient collection of human demonstrations. Through experiments, we show that the full interactivity of the scenes enables agents to learn useful visual representations that accelerate the training of downstream manipulation tasks. We also show that iGibson 1.0 features enable the generalization of navigation agents, and that the human-iGibson interface and integrated motion planners facilitate efficient imitation learning of human-demonstrated (mobile) manipulation behaviors. iGibson 1.0 is open-source and equipped with comprehensive examples and documentation. For more information, visit our project website: http://svl.stanford.edu/igibson/

16.4 · RO · Oct 18, 2025

Do What You Say: Steering Vision-Language-Action Models via Runtime Reasoning-Action Alignment Verification

Yilin Wu, Anqi Li, Tucker Hermans et al. · nvidia

Reasoning Vision Language Action (VLA) models improve robotic instruction-following by generating step-by-step textual plans before low-level actions, an approach inspired by Chain-of-Thought (CoT) reasoning in language models. Yet even with a correct textual plan, the generated actions can still miss the intended outcomes in the plan, especially in out-of-distribution (OOD) scenarios. We formalize this phenomenon as a lack of embodied CoT faithfulness, and introduce a training-free, runtime policy steering method for reasoning-action alignment. Given a reasoning VLA's intermediate textual plan, our framework samples multiple candidate action sequences from the same model, predicts their outcomes via simulation, and uses a pre-trained Vision-Language Model (VLM) to select the sequence whose outcome best aligns with the VLA's own textual plan. Only executing action sequences that align with the textual reasoning turns our base VLA's natural action diversity from a source of error into a strength, boosting robustness to semantic and visual OOD perturbations and enabling novel behavior composition without costly re-training. We also contribute a reasoning-annotated extension of LIBERO-100 and environment variations tailored for OOD evaluation, demonstrate up to a 15% performance gain over prior work on behavior composition tasks, and show that the method scales with compute and data diversity. Project website: https://yilin-wu98.github.io/steering-reasoning-vla/
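The selection loop described in this abstract (sample candidate action sequences, predict each outcome, score the outcome against the textual plan, execute the best) can be illustrated with a minimal sketch. This is not the authors' implementation: the function `select_aligned_actions` and its parameters `predict_outcome` and `alignment_score` are hypothetical names introduced here, and the toy simulator and scorer below stand in for the physics simulation and the pre-trained VLM.

```python
from typing import Callable, Sequence, Tuple, TypeVar

A = TypeVar("A")  # type of a candidate action sequence
O = TypeVar("O")  # type of a predicted outcome


def select_aligned_actions(
    plan: str,
    candidates: Sequence[A],
    predict_outcome: Callable[[A], O],
    alignment_score: Callable[[str, O], float],
) -> Tuple[A, float]:
    """Return the candidate whose predicted outcome scores highest against the plan.

    All arguments are illustrative stand-ins (assumptions, not a real API):
    predict_outcome plays the role of a simulator rollout, and
    alignment_score plays the role of a VLM-based plan/outcome comparison.
    """
    if not candidates:
        raise ValueError("need at least one candidate action sequence")
    # Score every candidate by simulating it and comparing the result to the plan.
    scored = [
        (alignment_score(plan, predict_outcome(candidate)), index)
        for index, candidate in enumerate(candidates)
    ]
    # max() keeps the first candidate on ties, so selection is deterministic.
    best_score, best_index = max(scored, key=lambda pair: pair[0])
    return candidates[best_index], best_score


def toy_simulate(actions: Sequence[int]) -> int:
    """Toy 1-D 'simulator': the outcome is the final position after all moves."""
    return sum(actions)


def toy_score(plan: str, outcome: int) -> float:
    """Toy scorer: negative distance between the outcome and the plan's target."""
    target = int(plan.split("=")[1])
    return -abs(outcome - target)


if __name__ == "__main__":
    best, score = select_aligned_actions(
        "move to x=3", [[1, 1], [1, 2], [2, 2]], toy_simulate, toy_score
    )
    print(best, score)
```

Passing the simulator and scorer as callables keeps the selection step independent of any particular model or simulator, which matches the training-free framing of the abstract.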

16.4 · RO · Aug 13, 2021

Co-GAIL: Learning Diverse Strategies for Human-Robot Collaboration

Chen Wang, Claudia Pérez-D'Arpino, Danfei Xu et al.

We present a method for learning a human-robot collaboration policy from human-human collaboration demonstrations. An effective robot assistant must learn to handle diverse human behaviors shown in the demonstrations and be robust when the humans adjust their strategies during online task execution. Our method co-optimizes a human policy and a robot policy in an interactive learning process: the human policy learns to generate diverse and plausible collaborative behaviors from demonstrations while the robot policy learns to assist by estimating the unobserved latent strategy of its human collaborator. Across a 2D strategy game, a human-robot handover task, and a multi-step collaborative manipulation task, our method outperforms the alternatives in both simulated evaluations and when executing the tasks with a real human operator in-the-loop. Supplementary materials and videos at https://sites.google.com/view/co-gail-web/home

7.0 · RO · Nov 22, 2020

Experimental Assessment of Human-Robot Teaming for Multi-Step Remote Manipulation with Expert Operators

Claudia Pérez-D'Arpino, Rebecca P. Khurshid, Julie A. Shah

Remote robot manipulation with human control enables applications where safety and environmental constraints are adverse to humans (e.g. underwater, space robotics and disaster response) or the complexity of the task demands human-level cognition and dexterity (e.g. robotic surgery and manufacturing). These systems typically use direct teleoperation at the motion level, and are usually limited to low-DOF arms and 2D perception. Improving dexterity and situational awareness demands new interaction and planning workflows. We explore the use of human-robot teaming through teleautonomy with assisted planning for remote control of a dual-arm dexterous robot for multi-step manipulation tasks, and conduct a within-subjects experimental assessment (n=12 expert users) to compare it with other methods, resulting in the following four conditions: (A) Direct teleoperation with imitation controller + 2D perception, (B) Condition A + 3D perception, (C) Teleautonomy interface teleoperation + 2D & 3D perception, (D) Condition C + assisted planning. The results indicate that this approach (D) achieves task times comparable with direct teleoperation (A,B) while improving a number of other objective and subjective metrics, including re-grasps, collisions, and TLX workload metrics. When compared to a similar interface but removing the assisted planning (C), D reduces the task time and removes a significant interaction with the level of expertise of the operator, resulting in a performance equalizer across users.

21.2 · RO · Oct 16, 2020

Robot Navigation in Constrained Pedestrian Environments using Reinforcement Learning

Claudia Pérez-D'Arpino, Can Liu, Patrick Goebel et al.

Navigating fluently around pedestrians is a necessary capability for mobile robots deployed in human environments, such as buildings and homes. While research on social navigation has focused mainly on scalability with the number of pedestrians in open spaces, typical indoor environments present the additional challenge of constrained spaces such as corridors and doorways that limit maneuverability and influence patterns of pedestrian interaction. We present an approach based on reinforcement learning (RL) to learn policies capable of dynamic adaptation to the presence of moving pedestrians while navigating between desired locations in constrained environments. The policy network receives guidance from a motion planner that provides waypoints to follow a globally planned trajectory, whereas RL handles the local interactions. We explore a compositional principle for multi-layout training and find that policies trained in a small set of geometrically simple layouts successfully generalize to more complex unseen layouts that exhibit composition of the structural elements available during training. Going beyond walls-world-like domains, we show transfer of the learned policy to unseen 3D reconstructions of two real environments. These results support the applicability of the compositional principle to navigation in real-world buildings and indicate promising usage of multi-agent simulation within reconstructed environments for tasks that involve interaction.