Tucker Hermans

RO
h-index34
48papers
1,333citations
Novelty53%
AI Score51

48 Papers

ROApr 11, 2022
Correcting Robot Plans with Natural Language Feedback

Pratyusha Sharma, Balakumar Sundaralingam, Valts Blukis et al. · microsoft-research, mit

When humans design cost or goal specifications for robots, they often produce specifications that are ambiguous, underspecified, or beyond planners' ability to solve. In these cases, corrections provide a valuable tool for human-in-the-loop robot control. Corrections might take the form of new goal specifications, new constraints (e.g. to avoid specific objects), or hints for planning algorithms (e.g. to visit specific waypoints). Existing correction methods (e.g. using a joystick or direct manipulation of an end effector) require full teleoperation or real-time interaction. In this paper, we explore natural language as an expressive and flexible tool for robot correction. We describe how to map from natural language sentences to transformations of cost functions. We show that these transformations enable users to correct goals, update robot motions to accommodate additional user preferences, and recover from planning errors. These corrections can be leveraged to get 81% and 93% success rates on tasks where the original planner failed, with either one or two language corrections. Our method makes it possible to compose multiple constraints and generalizes to unseen scenes, objects, and sentences in simulated environments and real-world environments.

RONov 8, 2022
StructDiffusion: Language-Guided Creation of Physically-Valid Structures using Unseen Objects

Weiyu Liu, Yilun Du, Tucker Hermans et al. · mit, nvidia

Robots operating in human environments must be able to rearrange objects into semantically-meaningful configurations, even if these objects are previously unseen. In this work, we focus on the problem of building physically-valid structures without step-by-step instructions. We propose StructDiffusion, which combines a diffusion model and an object-centric transformer to construct structures given partial-view point clouds and high-level language goals, such as "set the table". Our method can perform multiple challenging language-conditioned multi-step 3D planning tasks using one model. StructDiffusion even improves the success rate of assembling physically-valid structures out of unseen objects by on average 16% over an existing multi-modal transformer model trained on specific structures. We show experiments on held-out objects in both simulation and on real-world rearrangement tasks. Importantly, we show how integrating both a diffusion model and a collision-discriminator model allows for improved generalization over other methods when rearranging previously-unseen objects. For videos and additional results, see our website: https://structdiffusion.github.io/.

25.2ROJun 3
Continuum Robot State Estimation with Actuation Uncertainty

James M. Ferguson, Alan Kuntz, Tucker Hermans

Continuum robots are flexible, slender manipulators well suited for confined surgical environments. In these settings, unknown interaction forces and model uncertainty significantly affect robot shape, motivating state estimation from external observations. Existing estimation methods either neglect actuation modeling or rely on simplified deterministic actuation models. In contrast, we jointly estimate robot shape, external loads, and actuation inputs using mechanically principled actuation priors. To achieve this, we present a discrete Cosserat rod formulation with piecewise-linear strain integration that provides high numerical accuracy while inducing a sparse factor graph structure for efficient nonlinear optimization. We extend the framework to tendon-driven and parallel robots in simulation and validate it experimentally on a surgical concentric tube robot. Overall, our approach enables principled real-time estimation across multiple robot architectures while providing direct access to manipulator Jacobians through the linearized factor graph.

RODec 16, 2022
Planning Visual-Tactile Precision Grasps via Complementary Use of Vision and Touch

Martin Matak, Tucker Hermans · nvidia

Reliably planning fingertip grasps for multi-fingered hands lies as a key challenge for many tasks including tool use, insertion, and dexterous in-hand manipulation. This task becomes even more difficult when the robot lacks an accurate model of the object to be grasped. Tactile sensing offers a promising approach to account for uncertainties in object shape. However, current robotic hands tend to lack full tactile coverage. As such, a problem arises of how to plan and execute grasps for multi-fingered hands such that contact is made with the area covered by the tactile sensors. To address this issue, we propose an approach to grasp planning that explicitly reasons about where the fingertips should contact the estimated object surface while maximizing the probability of grasp success. Key to our method's success is the use of visual surface estimation for initial planning to encode the contact constraint. The robot then executes this plan using a tactile-feedback controller that enables the robot to adapt to online estimates of the object's surface to correct for errors in the initial plan. Importantly, the robot never explicitly integrates object pose or surface estimates between visual and tactile sensing, instead it uses the two modalities in complementary ways. Vision guides the robots motion prior to contact; touch updates the plan when contact occurs differently than predicted from vision. We show that our method successfully synthesises and executes precision grasps for previously unseen objects using surface estimates from a single camera view. Further, our approach outperforms a state of the art multi-fingered grasp planner, while also beating several baselines we propose.

ROSep 26, 2023
Out of Sight, Still in Mind: Reasoning and Planning about Unobserved Objects with Video Tracking Enabled Memory Models

Yixuan Huang, Jialin Yuan, Chanho Kim et al. · nvidia

Robots need to have a memory of previously observed, but currently occluded objects to work reliably in realistic environments. We investigate the problem of encoding object-oriented memory into a multi-object manipulation reasoning and planning framework. We propose DOOM and LOOM, which leverage transformer relational dynamics to encode the history of trajectories given partial-view point clouds and an object discovery and tracking engine. Our approaches can perform multiple challenging tasks including reasoning with occluded objects, novel objects appearance, and object reappearance. Throughout our extensive simulation and real-world experiments, we find that our approaches perform well in terms of different numbers of objects and different numbers of distractor actions. Furthermore, we show our approaches outperform an implicit memory baseline.

ROMay 6, 2022
DULA and DEBA: Differentiable Ergonomic Risk Models for Postural Assessment and Optimization in Ergonomically Intelligent pHRI

Amir Yazdani, Roya Sabbagh Novin, Andrew Merryweather et al. · nvidia

Ergonomics and human comfort are essential concerns in physical human-robot interaction applications. Defining an accurate and easy-to-use ergonomic assessment model stands as an important step in providing feedback for postural correction to improve operator health and comfort. Common practical methods in the area suffer from inaccurate ergonomics models in performing postural optimization. In order to retain assessment quality, while improving computational considerations, we propose a novel framework for postural assessment and optimization for ergonomically intelligent physical human-robot interaction. We introduce DULA and DEBA, differentiable and continuous ergonomics models learned to replicate the popular and scientifically validated RULA and REBA assessments with more than 99% accuracy. We show that DULA and DEBA provide assessment comparable to RULA and REBA while providing computational benefits when being used in postural optimization. We evaluate our framework through human and simulation experiments. We highlight DULA and DEBA's strength in a demonstration of postural optimization for a simulated pHRI task.

ROSep 25, 2023
DefGoalNet: Contextual Goal Learning from Demonstrations For Deformable Object Manipulation

Bao Thach, Tanner Watts, Shing-Hei Ho et al. · nvidia

Shape servoing, a robotic task dedicated to controlling objects to desired goal shapes, is a promising approach to deformable object manipulation. An issue arises, however, with the reliance on the specification of a goal shape. This goal has been obtained either by a laborious domain knowledge engineering process or by manually manipulating the object into the desired shape and capturing the goal shape at that specific moment, both of which are impractical in various robotic applications. In this paper, we solve this problem by developing a novel neural network DefGoalNet, which learns deformable object goal shapes directly from a small number of human demonstrations. We demonstrate our method's effectiveness on various robotic tasks, both in simulation and on a physical robot. Notably, in the surgical retraction task, even when trained with as few as 10 demonstrations, our method achieves a median success percentage of nearly 90%. These results mark a substantial advancement in enabling shape servoing methods to bring deformable object manipulation closer to practical, real-world applications.

ROAug 12, 2022
Occlusion-Robust Multi-Sensory Posture Estimation in Physical Human-Robot Interaction

Amir Yazdani, Roya Sabbagh Novin, Andrew Merryweather et al. · nvidia

3D posture estimation is important in analyzing and improving ergonomics in physical human-robot interaction and reducing the risk of musculoskeletal disorders. Vision-based posture estimation approaches are prone to sensor and model errors, as well as occlusion, while posture estimation solely from the interacting robot's trajectory suffers from ambiguous solutions. To benefit from the advantages of both approaches and improve upon their drawbacks, we introduce a low-cost, non-intrusive, and occlusion-robust multi-sensory 3D postural estimation algorithm in physical human-robot interaction. We use 2D postures from OpenPose over a single camera, and the trajectory of the interacting robot while the human performs a task. We model the problem as a partially-observable dynamical system and we infer the 3D posture via a particle filter. We present our work in teleoperation, but it can be generalized to other applications of physical human-robot interaction. We show that our multi-sensory system resolves human kinematic redundancy better than posture estimation solely using OpenPose or posture estimation solely using the robot's trajectory. This will increase the accuracy of estimated postures compared to the gold-standard motion capture postures. Moreover, our approach also performs better than other single sensory methods when postural assessment using RULA assessment tool.

ROMar 2, 2022
L4KDE: Learning for KinoDynamic Tree Expansion

Tin Lai, Weiming Zhi, Tucker Hermans et al. · nvidia

We present the Learning for KinoDynamic Tree Expansion (L4KDE) method for kinodynamic planning. Tree-based planning approaches, such as rapidly exploring random tree (RRT), are the dominant approach to finding globally optimal plans in continuous state-space motion planning. Central to these approaches is tree-expansion, the procedure in which new nodes are added into an ever-expanding tree. We study the kinodynamic variants of tree-based planning, where we have known system dynamics and kinematic constraints. In the interest of quickly selecting nodes to connect newly sampled coordinates, existing methods typically cannot optimise to find nodes that have low cost to transition to sampled coordinates. Instead, they use metrics like Euclidean distance between coordinates as a heuristic for selecting candidate nodes to connect to the search tree. We propose L4KDE to address this issue. L4KDE uses a neural network to predict transition costs between queried states, which can be efficiently computed in batch, providing much higher quality estimates of transition cost compared to commonly used heuristics while maintaining almost-surely asymptotic optimality guarantee. We empirically demonstrate the significant performance improvement provided by L4KDE on a variety of challenging system dynamics, with the ability to generalise across different instances of the same model class, and in conjunction with a suite of modern tree-based motion planners.

ROJul 12, 2021Code
DefGraspSim: Simulation-based grasping of 3D deformable objects

Isabella Huang, Yashraj Narang, Clemens Eppner et al.

Robotic grasping of 3D deformable objects (e.g., fruits/vegetables, internal organs, bottles/boxes) is critical for real-world applications such as food processing, robotic surgery, and household automation. However, developing grasp strategies for such objects is uniquely challenging. In this work, we efficiently simulate grasps on a wide range of 3D deformable objects using a GPU-based implementation of the corotational finite element method (FEM). To facilitate future research, we open-source our simulated dataset (34 objects, 1e5 Pa elasticity range, 6800 grasp evaluations, 1.1M grasp measurements), as well as a code repository that allows researchers to run our full FEM-based grasp evaluation pipeline on arbitrary 3D object models of their choice. We also provide a detailed analysis on 6 object primitives. For each primitive, we methodically describe the effects of different grasp strategies, compute a set of performance metrics (e.g., deformation, stress) that fully capture the object response, and identify simple grasp features (e.g., gripper displacement, contact area) measurable by robots prior to pickup and predictive of these performance metrics. Finally, we demonstrate good correspondence between grasps on simulated objects and their real-world counterparts.

ROApr 29, 2024
Point Cloud Models Improve Visual Robustness in Robotic Learners

Skand Peri, Iain Lee, Chanho Kim et al. · nvidia

Visual control policies can encounter significant performance degradation when visual conditions like lighting or camera position differ from those seen during training -- often exhibiting sharp declines in capability even for minor differences. In this work, we examine robustness to a suite of these types of visual changes for RGB-D and point cloud based visual control policies. To perform these experiments on both model-free and model-based reinforcement learners, we introduce a novel Point Cloud World Model (PCWM) and point cloud based control policies. Our experiments show that policies that explicitly encode point clouds are significantly more robust than their RGB-D counterparts. Further, we find our proposed PCWM significantly outperforms prior works in terms of sample efficiency during training. Taken together, these results suggest reasoning about the 3D scene through point clouds can improve performance, reduce learning time, and increase robustness for robotic learners. Project Webpage: https://pvskand.github.io/projects/PCWM

CVNov 29, 2024
Robust Bayesian Scene Reconstruction with Retrieval-Augmented Priors for Precise Grasping and Planning

Herbert Wright, Weiming Zhi, Martin Matak et al. · nvidia

Constructing 3D representations of object geometry is critical for many robotics tasks, particularly manipulation problems. These representations must be built from potentially noisy partial observations. In this work, we focus on the problem of reconstructing a multi-object scene from a single RGBD image using a fixed camera. Traditional scene representation methods generally cannot infer the geometry of unobserved regions of the objects in the image. Attempts have been made to leverage deep learning to train on a dataset of known objects and representations, and then generalize to new observations. However, this can be brittle to noisy real-world observations and objects not contained in the dataset, and do not provide well-calibrated reconstruction confidences. We propose BRRP, a reconstruction method that leverages preexisting mesh datasets to build an informative prior during robust probabilistic reconstruction. We introduce the concept of a retrieval-augmented prior, where we retrieve relevant components of our prior distribution from a database of objects during inference. The resulting prior enables estimation of the geometry of occluded portions of the in-scene objects. Our method produces a distribution over object shape that can be used for reconstruction and measuring uncertainty. We evaluate our method in both simulated scenes and in the real world. We demonstrate the robustness of our method against deep learning-only approaches while being more accurate than a method without an informative prior. Through real-world experiments, we particularly highlight the capability of BRRP to enable successful dexterous manipulation in clutter.

ROMay 31, 2025
Constrained Stein Variational Gradient Descent for Robot Perception, Planning, and Identification

Griffin Tabor, Tucker Hermans · nvidia

Many core problems in robotics can be framed as constrained optimization problems. Often on these problems, the robotic system has uncertainty, or it would be advantageous to identify multiple high quality feasible solutions. To enable this, we present two novel frameworks for applying principles of constrained optimization to the new variational inference algorithm Stein variational gradient descent. Our general framework supports multiple types of constrained optimizers and can handle arbitrary constraints. We demonstrate on a variety of problems that we are able to learn to approximate distributions without violating constraints. Specifically, we show that we can build distributions of: robot motion plans that exactly avoid collisions, robot arm joint angles on the SE(3) manifold with exact table placement constraints, and object poses from point clouds with table placement constraints.

ROOct 23, 2025
The Reality Gap in Robotics: Challenges, Solutions, and Best Practices

Elie Aljalbout, Jiaxu Xing, Angel Romero et al. · mit, nvidia

Machine learning has facilitated significant advancements across various robotics domains, including navigation, locomotion, and manipulation. Many such achievements have been driven by the extensive use of simulation as a critical tool for training and testing robotic systems prior to their deployment in real-world environments. However, simulations consist of abstractions and approximations that inevitably introduce discrepancies between simulated and real environments, known as the reality gap. These discrepancies significantly hinder the successful transfer of systems from simulation to the real world. Closing this gap remains one of the most pressing challenges in robotics. Recent advances in sim-to-real transfer have demonstrated promising results across various platforms, including locomotion, navigation, and manipulation. By leveraging techniques such as domain randomization, real-to-sim transfer, state and action abstractions, and sim-real co-training, many works have overcome the reality gap. However, challenges persist, and a deeper understanding of the reality gap's root causes and solutions is necessary. In this survey, we present a comprehensive overview of the sim-to-real landscape, highlighting the causes, solutions, and evaluation metrics for the reality gap and sim-to-real transfer.

ROOct 18, 2025
Do What You Say: Steering Vision-Language-Action Models via Runtime Reasoning-Action Alignment Verification

Yilin Wu, Anqi Li, Tucker Hermans et al. · nvidia

Reasoning Vision Language Action (VLA) models improve robotic instruction-following by generating step-by-step textual plans before low-level actions, an approach inspired by Chain-of-Thought (CoT) reasoning in language models. Yet even with a correct textual plan, the generated actions can still miss the intended outcomes in the plan, especially in out-of-distribution (OOD) scenarios. We formalize this phenomenon as a lack of embodied CoT faithfulness, and introduce a training-free, runtime policy steering method for reasoning-action alignment. Given a reasoning VLA's intermediate textual plan, our framework samples multiple candidate action sequences from the same model, predicts their outcomes via simulation, and uses a pre-trained Vision-Language Model (VLM) to select the sequence whose outcome best aligns with the VLA's own textual plan. Only executing action sequences that align with the textual reasoning turns our base VLA's natural action diversity from a source of error into a strength, boosting robustness to semantic and visual OOD perturbations and enabling novel behavior composition without costly re-training. We also contribute a reasoning-annotated extension of LIBERO-100, environment variations tailored for OOD evaluation, and demonstrate up to 15% performance gain over prior work on behavior composition tasks and scales with compute and data diversity. Project Website at: https://yilin-wu98.github.io/steering-reasoning-vla/

ROMay 9, 2024
Composable Part-Based Manipulation

Weiyu Liu, Jiayuan Mao, Joy Hsu et al.

In this paper, we propose composable part-based manipulation (CPM), a novel approach that leverages object-part decomposition and part-part correspondences to improve learning and generalization of robotic manipulation skills. By considering the functional correspondences between object parts, we conceptualize functional actions, such as pouring and constrained placing, as combinations of different correspondence constraints. CPM comprises a collection of composable diffusion models, where each model captures a different inter-object correspondence. These diffusion models can generate parameters for manipulation skills based on the specific object parts. Leveraging part-based correspondences coupled with the task decomposition into distinct constraints enables strong generalization to novel objects and object categories. We validate our approach in both simulated and real-world scenarios, demonstrating its effectiveness in achieving robust and generalized manipulation capabilities.

ROMay 8, 2023
DeformerNet: Learning Bimanual Manipulation of 3D Deformable Objects

Bao Thach, Brian Y. Cho, Shing-Hei Ho et al.

Applications in fields ranging from home care to warehouse fulfillment to surgical assistance require robots to reliably manipulate the shape of 3D deformable objects. Analytic models of elastic, 3D deformable objects require numerous parameters to describe the potentially infinite degrees of freedom present in determining the object's shape. Previous attempts at performing 3D shape control rely on hand-crafted features to represent the object shape and require training of object-specific control models. We overcome these issues through the use of our novel DeformerNet neural network architecture, which operates on a partial-view point cloud of the manipulated object and a point cloud of the goal shape to learn a low-dimensional representation of the object shape. This shape embedding enables the robot to learn a visual servo controller that computes the desired robot end-effector action to iteratively deform the object toward the target shape. We demonstrate both in simulation and on a physical robot that DeformerNet reliably generalizes to object shapes and material stiffness not seen during training, including ex vivo chicken muscle tissue. Crucially, using DeformerNet, the robot successfully accomplishes three surgical sub-tasks: retraction (moving tissue aside to access a site underneath it), tissue wrapping (a sub-task in procedures like aortic stent placements), and connecting two tubular pieces of tissue (a sub-task in anastomosis).

ROOct 19, 2021
StructFormer: Learning Spatial Structure for Language-Guided Semantic Rearrangement of Novel Objects

Weiyu Liu, Chris Paxton, Tucker Hermans et al.

Geometric organization of objects into semantically meaningful arrangements pervades the built world. As such, assistive robots operating in warehouses, offices, and homes would greatly benefit from the ability to recognize and rearrange objects into these semantically meaningful structures. To be useful, these robots must contend with previously unseen objects and receive instructions without significant programming. While previous works have examined recognizing pairwise semantic relations and sequential manipulation to change these simple relations none have shown the ability to arrange objects into complex structures such as circles or table settings. To address this problem we propose a novel transformer-based neural network, StructFormer, which takes as input a partial-view point cloud of the current object arrangement and a structured language command encoding the desired object configuration. We show through rigorous experiments that StructFormer enables a physical robot to rearrange novel objects into semantically meaningful structures with multi-object relational constraints inferred from the language command.

ROOct 15, 2021
Toward Learning Context-Dependent Tasks from Demonstration for Tendon-Driven Surgical Robots

Yixuan Huang, Michael Bentley, Tucker Hermans et al.

Tendon-driven robots, a type of continuum robot, have the potential to reduce the invasiveness of surgery by enabling access to difficult-to-reach anatomical targets. In the future, the automation of surgical tasks for these robots may help reduce surgeon strain in the face of a rapidly growing population. However, directly encoding surgical tasks and their associated context for these robots is infeasible. In this work we take steps toward a system that is able to learn to successfully perform context-dependent surgical tasks by learning directly from a set of expert demonstrations. We present three models trained on the demonstrations conditioned on a vector encoding the context of the demonstration. We then use these models to plan and execute motions for the tendon-driven robot similar to the demonstrations for novel context not seen in the training set. We demonstrate the efficacy of our method on three surgery-inspired tasks.

ROOct 12, 2021
Planning Sensing Sequences for Subsurface 3D Tumor Mapping

Brian Y. Cho, Tucker Hermans, Alan Kuntz

Surgical automation has the potential to enable increased precision and reduce the per-patient workload of overburdened human surgeons. An effective automation system must be able to sense and map subsurface anatomy, such as tumors, efficiently and accurately. In this work, we present a method that plans a sequence of sensing actions to map the 3D geometry of subsurface tumors. We leverage a sequential Bayesian Hilbert map to create a 3D probabilistic occupancy model that represents the likelihood that any given point in the anatomy is occupied by a tumor, conditioned on sensor readings. We iteratively update the map, utilizing Bayesian optimization to determine sensing poses that explore unsensed regions of anatomy and exploit the knowledge gained by previous sensing actions. We demonstrate our method's efficiency and accuracy in three anatomical scenarios including a liver tumor scenario generated from a real patient's CT scan. The results show that our proposed method significantly outperforms comparison methods in terms of efficiency while detecting subsurface tumors with high accuracy.

ROOct 10, 2021
Learning Visual Shape Control of Novel 3D Deformable Objects from Partial-View Point Clouds

Bao Thach, Brian Y. Cho, Alan Kuntz et al.

If robots could reliably manipulate the shape of 3D deformable objects, they could find applications in fields ranging from home care to warehouse fulfillment to surgical assistance. Analytic models of elastic, 3D deformable objects require numerous parameters to describe the potentially infinite degrees of freedom present in determining the object's shape. Previous attempts at performing 3D shape control rely on hand-crafted features to represent the object shape and require training of object-specific control models. We overcome these issues through the use of our novel DeformerNet neural network architecture, which operates on a partial-view point cloud of the object being manipulated and a point cloud of the goal shape to learn a low-dimensional representation of the object shape. This shape embedding enables the robot to learn to define a visual servo controller that provides Cartesian pose changes to the robot end-effector causing the object to deform towards its target shape. Crucially, we demonstrate both in simulation and on a physical robot that DeformerNet reliably generalizes to object shapes and material stiffness not seen during training and outperforms comparison methods for both the generic shape control and the surgical task of retraction.

ROAug 26, 2021
Predicting Stable Configurations for Semantic Placement of Novel Objects

Chris Paxton, Chris Xie, Tucker Hermans et al.

Human environments contain numerous objects configured in a variety of arrangements. Our goal is to enable robots to repose previously unseen objects according to learned semantic relationships in novel environments. We break this problem down into two parts: (1) finding physically valid locations for the objects and (2) determining if those poses satisfy learned, high-level semantic relationships. We build our models and training from the ground up to be tightly integrated with our proposed planning algorithm for semantic placement of unknown objects. We train our models purely in simulation, with no fine-tuning needed for use in the real world. Our approach enables motion planning for semantic rearrangement of unknown objects in scenes with varying geometry from only RGB-D sensing. Our experiments through a set of simulated ablations demonstrate that using a relational classifier alone is not sufficient for reliable planning. We further demonstrate the ability of our planner to generate and execute diverse manipulation plans through a set of real-world experiments with a variety of objects.

ROAug 26, 2021
Parallelised Diffeomorphic Sampling-based Motion Planning

Tin Lai, Weiming Zhi, Tucker Hermans et al.

We propose Parallelised Diffeomorphic Sampling-based Motion Planning (PDMP). PDMP is a novel parallelised framework that uses bijective and differentiable mappings, or diffeomorphisms, to transform sampling distributions of sampling-based motion planners, in a manner akin to normalising flows. Unlike normalising flow models which use invertible neural network structures to represent these diffeomorphisms, we develop them from gradient information of desired costs, and encode desirable behaviour, such as obstacle avoidance. These transformed sampling distributions can then be used for sampling-based motion planning. A particular example is when we wish to imbue the sampling distribution with knowledge of the environment geometry, such that drawn samples are less prone to be in collisions. To this end, we propose to learn a continuous occupancy representation from environment occupancy data, such that gradients of the representation defines a valid diffeomorphism and is amenable to fast parallel evaluation. We use this to "morph" the sampling distribution to draw far fewer collision-prone samples. PDMP is able to leverage gradient information of costs, to inject specifications, in a manner similar to optimisation-based motion planning methods, but relies on drawing from a sampling distribution, retaining the tendency to find more global solutions, thereby bridging the gap between trajectory optimisation and sampling-based planning methods.

ROAug 12, 2021
Ergonomically Intelligent Physical Human-Robot Interaction: Postural Estimation, Assessment, and Optimization

Amir Yazdani, Roya Sabbagh Novin, Andrew Merryweather et al.

Ergonomics and human comfort are essential concerns in physical human-robot interaction. Common practical methods in the area either fail in estimating the correct posture due to occlusion or suffer from inaccurate ergonomics models in performing postural optimization. We propose a novel alternative framework for posture estimation, assessment, and optimization for ergonomically intelligent physical human-robot interaction. We show that we can estimate human posture solely from the trajectory of the interacting robot with median deviation of 5 deg from motion capture. We propose DULA, a differentiable ergonomics assessment tool with 99.73% accuracy comparing to RULA. We use DULA in postural optimization for physical human-robot interaction tasks such as co-manipulation and teleoperation. We evaluate our framework through human and simulation experiments.

ROJul 16, 2021
DeformerNet: A Deep Learning Approach to 3D Deformable Object Manipulation

Bao Thach, Alan Kuntz, Tucker Hermans

In this paper, we propose a novel approach to 3D deformable object manipulation leveraging a deep neural network called DeformerNet. Controlling the shape of a 3D object requires an effective state representation that can capture the full 3D geometry of the object. Current methods work around this problem by defining a set of feature points on the object or only deforming the object in 2D image space, which does not truly address the 3D shape control problem. Instead, we explicitly use 3D point clouds as the state representation and apply Convolutional Neural Network on point clouds to learn the 3D features. These features are then mapped to the robot end-effector's position using a fully-connected neural network. Once trained in an end-to-end fashion, DeformerNet directly maps the current point cloud of a deformable object, as well as a target point cloud shape, to the desired displacement in robot gripper position. In addition, we investigate the problem of predicting the manipulation point location given the initial and goal shape of the object.

ROJul 14, 2021
DULA: A Differentiable Ergonomics Model for Postural Optimization in Physical HRI

Amir Yazdani, Roya Sabbagh Novin, Andrew Merryweather et al.

Ergonomics and human comfort are essential concerns in physical human-robot interaction applications. Defining an accurate and easy-to-use ergonomic assessment model stands as an important step in providing feedback for postural correction to improve operator health and comfort. In order to enable efficient computation, previously proposed automated ergonomic assessment and correction tools make approximations or simplifications to gold-standard assessment tools used by ergonomists in practice. In order to retain assessment quality, while improving computational considerations, we introduce DULA, a differentiable and continuous ergonomics model learned to replicate the popular and scientifically validated RULA assessment. We show that DULA provides assessment comparable to RULA while providing computational benefits. We highlight DULA's strength in a demonstration of gradient-based postural optimization for a simulated teleoperation task.

AIJan 8, 2021
Optimizing Hospital Room Layout to Reduce the Risk of Patient Falls

Sarvenaz Chaeibakhsh, Roya Sabbagh Novin, Tucker Hermans et al.

Despite years of research into patient falls in hospital rooms, falls and related injuries remain a serious concern to patient safety. In this work, we formulate a gradient-free constrained optimization problem to generate and reconfigure the hospital room interior layout to minimize the risk of falls. We define a cost function built on a hospital room fall model that takes into account the supportive or hazardous effect of the patient's surrounding objects, as well as simulated patient trajectories inside the room. We define a constraint set that ensures the functionality of the generated room layouts in addition to conforming to architectural guidelines. We solve this problem efficiently using a variant of simulated annealing. We present results for two real-world hospital room types and demonstrate a significant improvement of 18% on average in patient fall risk when compared with a traditional hospital room layout and 41% when compared with randomly generated layouts.

RONov 11, 2020
Comparing Piezoresistive Substrates for Tactile Sensing in Dexterous Hands

Rebecca Miles, Martin Matak, Sarah Hood et al.

While tactile skins have been shown to be useful for detecting collisions between a robotic arm and its environment, they have not been extensively used for improving robotic grasping and in-hand manipulation. We propose a novel sensor design for use in covering existing multi-fingered robot hands. We analyze the performance of four different piezoresistive materials using both fabric and anti-static foam substrates in benchtop experiments. We find that although the piezoresistive foam was designed as packing material and not for use as a sensing substrate, it performs comparably with fabrics specifically designed for this purpose. While these results demonstrate the potential of piezoresistive foams for tactile sensing applications, they do not fully characterize the efficacy of these sensors for use in robot manipulation. As such, we use a low density foam substrate to develop a scalable tactile skin that can be attached to the palm of a robotic hand. We demonstrate several robotic manipulation tasks using this sensor to show its ability to reliably detect and localize contact, as well as analyze contact patterns during grasping and transport tasks. Our project website provides details on all materials, software, and data used in the sensor development and analysis: https://sites.google.com/gcloud.utah.edu/piezoresistive-tactile-sensing/.

RONov 9, 2020
Planning under Uncertainty to Goal Distributions

Adam Conkey, Tucker Hermans

Goals for planning problems are typically conceived of as subsets of the state space. However, for many practical planning problems in robotics, we expect the robot to predict goals, e.g. from noisy sensors or by generalizing learned models to novel contexts. In these cases, sets with uncertainty naturally extend to probability distributions. While a few works have used probability distributions as goals for planning, surprisingly no systematic treatment of planning to goal distributions exists in the literature. This article serves to fill that gap. We argue that goal distributions are a more appropriate goal representation than deterministic sets for many robotics applications. We present a novel approach to planning under uncertainty to goal distributions, which we use to highlight several advantages of the goal distribution formulation. We build on previous results in the literature by formally framing our approach as an instance of planning as inference. We additionally derive reductions of several common planning objectives as special cases of our probabilistic planning framework. Our experiments demonstrate the flexibility of probability distributions as a goal representation on a variety of problems including planar navigation among obstacles, intercepting a moving target, rolling a ball to a target location, and a 7-DOF robot arm reaching to grasp an object.

ROOct 16, 2020
Risk-Aware Decision Making in Service Robots to Minimize Risk of Patient Falls in Hospitals

Roya Sabbagh Novin, Amir Yazdani, Andrew Merryweather et al.

Planning under uncertainty is a crucial capability for autonomous systems to operate reliably in uncertain and dynamic environments. The concern of safety becomes even more critical in healthcare settings where robots interact with human patients. In this paper, we propose a novel risk-aware planning framework to minimize the risk of falls by providing a patient with an assistive device. Our approach combines learning-based prediction with model-based control to plan for the fall prevention task. This provides advantages compared to end-to-end learning methods in which the robot's performance is limited to specific scenarios, or purely model-based approaches that use relatively simple function approximators and are prone to high modeling errors. We compare various risk metrics and the results from simulated scenarios show that using the proposed cost function, the robot can plan interventions to avoid high fall score events.

ROJun 6, 2020
Multi-Fingered Active Grasp Learning

Qingkai Lu, Mark Van der Merwe, Tucker Hermans

Learning-based approaches to grasp planning are preferred over analytical methods due to their ability to better generalize to new, partially observed objects. However, data collection remains one of the biggest bottlenecks for grasp learning methods, particularly for multi-fingered hands. The relatively high dimensional configuration space of the hands coupled with the diversity of objects common in daily life requires a significant number of samples to produce robust and confident grasp success classifiers. In this paper, we present the first active deep learning approach to grasping that searches over the grasp configuration space and classifier confidence in a unified manner. We base our approach on recent success in planning multi-fingered grasps as probabilistic inference with a learned neural network likelihood function. We embed this within a multi-armed bandit formulation of sample selection. We show that our active grasp learning approach uses fewer training samples to produce grasp success rates comparable with the passive supervised learning method trained with grasping data generated by an analytical planner. We additionally show that grasps generated by the active learner have greater qualitative and quantitative diversity in shape.

ROMar 30, 2020
In-Hand Object-Dynamics Inference using Tactile Fingertips

Balakumar Sundaralingam, Tucker Hermans

Having the ability to estimate an object's properties through interaction will enable robots to manipulate novel objects. Object's dynamics, specifically the friction and inertial parameters have only been estimated in a lab environment with precise and often external sensing. Could we infer an object's dynamics in the wild with only the robot's sensors? In this paper, we explore the estimation of dynamics of a grasped object in motion, with tactile force sensing at multiple fingertips. Our estimation approach does not rely on torque sensing to estimate the dynamics. To estimate friction, we develop a control scheme to actively interact with the object until slip is detected. To robustly perform the inertial estimation, we setup a factor graph that fuses all our sensor measurements on physically consistent manifolds and perform inference. We show that tactile fingertips enable in-hand dynamics estimation of low mass objects.

ROFeb 24, 2020
Is The Leader Robot an Adequate Sensor for Posture Estimation and Ergonomic Assessment of A Human Teleoperator?

Amir Yazdani, Roya Sabbagh Novin, Andrew Merryweather et al.

Ergonomic assessment of human posture plays a vital role in understanding work-related safety and health. Current posture estimation approaches face occlusion challenges in teleoperation and physical human-robot interaction. We investigate if the leader robot is an adequate sensor for posture estimation in teleoperation and we introduce a new probabilistic approach that relies solely on the trajectory of the leader robot for generating observations. We model the human using a redundant, partially-observable dynamical system and we infer the posture using a standard particle filter. We compare our approach with postures from a commercial motion capture system and also two least-squares optimization approaches for human inverse kinematics. The results reveal that the proposed approach successfully estimates human postures and ergonomic risk scores comparable to those estimates from gold-standard motion capture.

ROJan 25, 2020
Multi-Fingered Grasp Planning via Inference in Deep Neural Networks

Qingkai Lu, Mark Van der Merwe, Balakumar Sundaralingam et al.

We propose a novel approach to multi-fingered grasp planning leveraging learned deep neural network models. We train a voxel-based 3D convolutional neural network to predict grasp success probability as a function of both visual information of an object and grasp configuration. We can then formulate grasp planning as inferring the grasp configuration which maximizes the probability of grasp success. In addition, we learn a prior over grasp configurations as a mixture density network conditioned on our voxel-based object representation. We show that this object conditional prior improves grasp inference when used with the learned grasp success prediction network when compared to a learned, object-agnostic prior, or an uninformed uniform prior. Our work is the first to directly plan high quality multi-fingered grasps in configuration space using a deep neural network without the need of an external planner. We validate our inference method performing multi-finger grasping on a physical robot. Our experimental results show that our planning method outperforms existing grasp planning methods for neural networks.

ROJan 9, 2020
Benchmarking In-Hand Manipulation

Silvia Cruciani, Balakumar Sundaralingam, Kaiyu Hang et al.

The purpose of this benchmark is to evaluate the planning and control aspects of robotic in-hand manipulation systems. The goal is to assess the system's ability to change the pose of a hand-held object by either using the fingers, environment or a combination of both. Given an object surface mesh from the YCB data-set, we provide examples of initial and goal states (i.e.\ static object poses and fingertip locations) for various in-hand manipulation tasks. We further propose metrics that measure the error in reaching the goal state from a specific initial state, which, when aggregated across all tasks, also serves as a measure of the system's in-hand manipulation capability. We provide supporting software, task examples, and evaluation results associated with the benchmark. All the supporting material is available at https://robot-learning.cs.utah.edu/project/benchmarking_in_hand_manipulation

RODec 19, 2019
A Model Predictive Approach for Online Mobile Manipulation of Nonholonomic Objects using Learned Dynamics

Roya Sabbagh Novin, Amir Yazdani, Andrew Merryweather et al.

A particular type of assistive robots designed for physical interaction with objects could play an important role assisting with mobility and fall prevention in healthcare facilities. Autonomous mobile manipulation presents a hurdle prior to safely using robots in real life applications. In this article, we introduce a mobile manipulation framework based on model predictive control using learned dynamics models of objects. We focus on the specific problem of manipulating legged objects such as those commonly found in healthcare environments and personal dwellings (e.g. walkers, tables, chairs). We describe a probabilistic method for autonomous learning of an approximate dynamics model for these objects. In this method, we learn dynamic parameters using a small dataset consisting of force and motion data from interactions between the robot and object. Moreover, we account for multiple manipulation strategies by formulating the manipulation planning as a mixed-integer convex optimization. The proposed framework considers the hybrid control system comprised of i) choosing which leg to grasp, and ii) control of continuous applied forces for manipulation. We formalize our algorithm based on model predictive control to compensate for modeling errors and find an optimal path to manipulate the object from one configuration to another. We show results for several objects with various wheel configurations. Simulation and physical experiments show that the obtained dynamics models are sufficiently accurate for safe and collision-free manipulation. When combined with the proposed manipulation planning algorithm, the robot successfully moves the object to a desired pose while avoiding collision.

RONov 21, 2019
Contextual Reinforcement Learning of Visuo-tactile Multi-fingered Grasping Policies

Visak Kumar, Tucker Hermans, Dieter Fox et al.

Using simulation to train robot manipulation policies holds the promise of an almost unlimited amount of training data, generated safely out of harm's way. One of the key challenges of using simulation, to date, has been to bridge the reality gap, so that policies trained in simulation can be deployed in the real world. We explore the reality gap in the context of learning a contextual policy for multi-fingered robotic grasping. We propose a Grasping Objects Approach for Tactile (GOAT) robotic hands, learning to overcome the reality gap problem. In our approach we use human hand motion demonstration to initialize and reduce the search space for learning. We contextualize our policy with the bounding cuboid dimensions of the object of interest, which allows the policy to work on a more flexible representation than directly using an image or point cloud. Leveraging fingertip touch sensors in the hand allows the policy to overcome the reduction in geometric information introduced by the coarse bounding box, as well as pose estimation uncertainty. We show our learned policy successfully runs on a real robot without any fine tuning, thus bridging the reality gap.

ROOct 2, 2019
Learning Continuous 3D Reconstructions for Geometrically Aware Grasping

Mark Van der Merwe, Qingkai Lu, Balakumar Sundaralingam et al.

Deep learning has enabled remarkable improvements in grasp synthesis for previously unseen objects from partial object views. However, existing approaches lack the ability to explicitly reason about the full 3D geometry of the object when selecting a grasp, relying on indirect geometric reasoning derived when learning grasp success networks. This abandons explicit geometric reasoning, such as avoiding undesired robot object collisions. We propose to utilize a novel, learned 3D reconstruction to enable geometric awareness in a grasping system. We leverage the structure of the reconstruction network to learn a grasp success classifier which serves as the objective function for a continuous grasp optimization. We additionally explicitly constrain the optimization to avoid undesired contact, directly using the reconstruction. We examine the role of geometry in grasping both in the training of grasp metrics and through 96 robot grasping trials. Our results can be found on https://sites.google.com/view/reconstruction-grasp/.

ROSep 17, 2019
Learning to Manipulate Object Collections Using Grounded State Representations

Matthew Wilson, Tucker Hermans

We propose a method for sim-to-real robot learning which exploits simulator state information in a way that scales to many objects. We first train a pair of encoder networks to capture multi-object state information in a latent space. One of these encoders is a CNN, which enables our system to operate on RGB images in the real world; the other is a graph neural network (GNN) state encoder, which directly consumes a set of raw object poses and enables more accurate reward calculation and value estimation. Once trained, we use these encoders in a reinforcement learning algorithm to train image-based policies that can manipulate many objects. We evaluate our method on the task of pushing a collection of objects to desired tabletop regions. Compared to methods which rely only on images or use fixed-length state encodings, our method achieves higher success rates, performs well in the real world without fine tuning, and generalizes to different numbers and types of objects not seen during training.

ROJul 8, 2019
Assembly Planning by Subassembly Decomposition Using Blocking Reduction

James Watson, Tucker Hermans

The sequence in which a complex product is assembled directly impacts the ease and efficiency of the assembly process, whether executed by a human or a robot. A sequence that gives the assembler the greatest freedom of movement is therefore desirable. Our main contribution is an expression of obstruction relationships between parts as a disassembly interference graph (DIG). We validate this heuristic by developing a disassembly sequence planner that partitions assemblies in a way that prioritizes access to parts, resulting in plans that are comparable in efficiency to two state-of-the-art assembly methods in terms of total plan length. Using DIG, our method generates successive subassembly decompositions, yielding a tree structure that makes parallization opportunities apparent. Our planner generates viable disassembly plans by minimizing our part blockage measure, and thereby demonstrates that this measure is a valuable addition to the Assembly Sequence Planning toolkit.

ROJun 29, 2019
Active Learning of Probabilistic Movement Primitives

Adam Conkey, Tucker Hermans

A Probabilistic Movement Primitive (ProMP) defines a distribution over trajectories with an associated feedback policy. ProMPs are typically initialized from human demonstrations and achieve task generalization through probabilistic operations. However, there is currently no principled guidance in the literature to determine how many demonstrations a teacher should provide and what constitutes a "good" demonstration for promoting generalization. In this paper, we present an active learning approach to learning a library of ProMPs capable of task generalization over a given space. We utilize uncertainty sampling techniques to generate a task instance for which a teacher should provide a demonstration. The provided demonstration is incorporated into an existing ProMP if possible, or a new ProMP is created from the demonstration if it is determined that it is too dissimilar from existing demonstrations. We provide a qualitative comparison between common active learning metrics; motivated by this comparison we present a novel uncertainty sampling approach named Greatest Mahalanobis Distance. We perform grasping experiments on a real KUKA robot and show our novel active learning measure achieves better task generalization with fewer demonstrations than a random sampling over the space.

ROMay 10, 2019
Building 3D Object Models during Manipulation by Reconstruction-Aware Trajectory Optimization

Kanrun Huang, Tucker Hermans

Object shape provides important information for robotic manipulation; for instance, selecting an effective grasp depends on both the global and local shape of the object of interest, while reaching into clutter requires accurate surface geometry to avoid unintended contact with the environment. Model-based 3D object manipulation is a widely studied problem; however, obtaining the accurate 3D object models for multiple objects often requires tedious work. In this letter, we exploit Gaussian process implicit surfaces (GPIS) extracted from RGB-D sensor data to grasp an unknown object. We propose a reconstruction-aware trajectory optimization that makes use of the extracted GPIS model plan a motion to improve the ability to estimate the object's 3D geometry, while performing a pick-and-place action. We present a probabilistic approach for a robot to autonomously learn and track the object, while achieve the manipulation task. We use a sampling-based trajectory generation method to explore the unseen parts of the object using the estimated conditional entropy of the GPIS model. We validate our method with physical robot experiments across eleven different objects of varying shape from the YCB object dataset. Our experiments show that our reconstruction-aware trajectory optimization provides higher-quality 3D object reconstruction when compared with directly solving the manipulation task or using a heuristic to view unseen portions of the object.

ROJan 10, 2019
Modeling Grasp Type Improves Learning-Based Grasp Planning

Qingkai Lu, Tucker Hermans

Different manipulation tasks require different types of grasps. For example, holding a heavy tool like a hammer requires a multi-fingered power grasp offering stability, while holding a pen to write requires a multi-fingered precision grasp to impart dexterity on the object. In this paper, we propose a probabilistic grasp planner that explicitly models grasp type for planning high-quality precision and power grasps in real-time. We take a learning approach in order to plan grasps of different types for previously unseen objects when only partial visual information is available. Our work demonstrates the first supervised learning approach to grasp planning that can explicitly plan both power and precision grasps for a given object. Additionally, we compare our learned grasp model with a model that does not encode type and show that modeling grasp type improves the success rate of generated grasps. Furthermore we show the benefit of learning a prior over grasp configurations to improve grasp inference with a learned classifier.

RONov 7, 2018
Learning Task Constraints from Demonstration for Hybrid Force/Position Control

Adam Conkey, Tucker Hermans

We present a novel method for learning hybrid force/position control from demonstration. We learn a dynamic constraint frame aligned to the direction of desired force using Cartesian Dynamic Movement Primitives. In contrast to approaches that utilize a fixed constraint frame, our approach easily accommodates tasks with rapidly changing task constraints over time. We activate only one degree of freedom for force control at any given time, ensuring motion is always possible orthogonal to the direction of desired force. Since we utilize demonstrated forces to learn the constraint frame, we are able to compensate for forces not detected by methods that learn only from demonstrated kinematic motion, such as frictional forces between the end-effector and contact surface. We additionally propose novel extensions to the Dynamic Movement Primitive framework that encourage robust transition from free-space motion to in-contact motion in spite of environment uncertainty. We incorporate force feedback and a dynamically shifting goal to reduce forces applied to the environment and retain stable contact while enabling force control. Our methods exhibit low impact forces on contact and low steady-state tracking error.

ROOct 15, 2018
Robust Learning of Tactile Force Estimation through Robot Interaction

Balakumar Sundaralingam, Alexander Lambert, Ankur Handa et al.

Current methods for estimating force from tactile sensor signals are either inaccurate analytic models or task-specific learned models. In this paper, we explore learning a robust model that maps tactile sensor signals to force. We specifically explore learning a mapping for the SynTouch BioTac sensor via neural networks. We propose a voxelized input feature layer for spatial signals and leverage information about the sensor surface to regularize the loss function. To learn a robust tactile force model that transfers across tasks, we generate ground truth data from three different sources: (1) the BioTac rigidly mounted to a force torque~(FT) sensor, (2) a robot interacting with a ball rigidly attached to the same FT sensor, and (3) through force inference on a planar pushing task by formalizing the mechanics as a system of particles and optimizing over the object motion. A total of 140k samples were collected from the three sources. We achieve a median angular accuracy of 3.5 degrees in predicting force direction (66% improvement over the current state of the art) and a median magnitude accuracy of 0.06 N (93% improvement) on a test dataset. Additionally, we evaluate the learned force model in a force feedback grasp controller performing object lifting and gentle placement. Our results can be found on https://sites.google.com/view/tactile-force.

ROJun 4, 2018
Relaxed-Rigidity Constraints: Kinematic Trajectory Optimization and Collision Avoidance for In-Grasp Manipulation

Balakumar Sundaralingam, Tucker Hermans

This paper proposes a novel approach to performing in-grasp manipulation: the problem of moving an object with reference to the palm from an initial pose to a goal pose without breaking or making contacts. Our method to perform in-grasp manipulation uses kinematic trajectory optimization which requires no knowledge of dynamic properties of the object. We implement our approach on an Allegro robot hand and perform thorough experiments on 10 objects from the YCB dataset. However, the proposed method is general enough to generate motions for most objects the robot can grasp. Experimental result support the feasibillty of its application across a variety of object shapes. We explore the adaptability of our approach to additional task requirements by including collision avoidance and joint space smoothness costs. The grasped object avoids collisions with the environment by the use of a signed distance cost function. We reduce the effects of unmodeled object dynamics by requiring smooth joint trajectories. We additionally compensate for errors encountered during trajectory execution by formulating an object pose feedback controller.

ROApr 12, 2018
Geometric In-Hand Regrasp Planning: Alternating Optimization of Finger Gaits and In-Grasp Manipulation

Balakumar Sundaralingam, Tucker Hermans

This paper explores the problem of autonomous, in-hand regrasping--the problem of moving from an initial grasp on an object to a desired grasp using the dexterity of a robot's fingers. We propose a planner for this problem which alternates between finger gaiting, and in-grasp manipulation. Finger gaiting enables the robot to move a single finger to a new contact location on the object, while the remaining fingers stably hold the object. In-grasp manipulation moves the object to a new pose relative to the robot's palm, while maintaining the contact locations between the hand and object. Given the object's geometry (as a mesh), the hand's kinematic structure, and the initial and desired grasps, we plan a sequence of finger gaits and object reposing actions to reach the desired grasp without dropping the object. We propose an optimization based approach and report in-hand regrasping plans for 5 objects over 5 in-hand regrasp goals each. The plans generated by our planner are collision free and guarantee kinematic feasibility.

ROApr 10, 2018
Planning Multi-Fingered Grasps as Probabilistic Inference in a Learned Deep Network

Qingkai Lu, Kautilya Chenna, Balakumar Sundaralingam et al.

We propose a novel approach to multi-fingered grasp planning leveraging learned deep neural network models. We train a convolutional neural network to predict grasp success as a function of both visual information of an object and grasp configuration. We can then formulate grasp planning as inferring the grasp configuration which maximizes the probability of grasp success. We efficiently perform this inference using a gradient-ascent optimization inside the neural network using the backpropagation algorithm. Our work is the first to directly plan high quality multifingered grasps in configuration space using a deep neural network without the need of an external planner. We validate our inference method performing both multifinger and two-finger grasps on real robots. Our experimental results show that our planning method outperforms existing planning methods for neural networks; while offering several other benefits including being data-efficient in learning and fast enough to be deployed in real robotic applications.