RO · Nov 17, 2022
SE(3)-Equivariant Relational Rearrangement with Neural Descriptor Fields
Anthony Simeonov, Yilun Du, Lin Yen-Chen et al. · mit
We present a method for performing tasks involving spatial relations between novel object instances initialized in arbitrary poses directly from point cloud observations. Our framework provides a scalable way for specifying new tasks using only 5-10 demonstrations. Object rearrangement is formalized as the question of finding actions that configure task-relevant parts of the object in a desired alignment. This formalism is implemented in three steps: assigning a consistent local coordinate frame to the task-relevant object parts, determining the location and orientation of this coordinate frame on unseen object instances, and executing an action that brings these frames into the desired alignment. We overcome the key technical challenge of determining task-relevant local coordinate frames from a few demonstrations by developing an optimization method based on Neural Descriptor Fields (NDFs) and a single annotated 3D keypoint. An energy-based learning scheme to model the joint configuration of the objects that satisfies a desired relational task further improves performance. The method is tested on three multi-object rearrangement tasks in simulation and on a real robot. Project website, videos, and code: https://anthonysimeonov.github.io/r-ndf/
RO · Dec 9, 2022
Visuotactile Affordances for Cloth Manipulation with Local Control
Neha Sunil, Shaoxiong Wang, Yu She et al. · stanford
Cloth in the real world is often crumpled, self-occluded, or folded in on itself such that key regions, such as corners, are not directly graspable, making manipulation difficult. We propose a system that leverages visual and tactile perception to unfold the cloth via grasping and sliding on edges. By doing so, the robot is able to grasp two adjacent corners, enabling subsequent manipulation tasks like folding or hanging. As components of this system, we develop tactile perception networks that classify whether an edge is grasped and estimate the pose of the edge. We use the edge classification network to supervise a visuotactile edge grasp affordance network that can grasp edges with a 90% success rate. Once an edge is grasped, we demonstrate that the robot can slide along the cloth to the adjacent corner using tactile pose estimation/control in real time. See http://nehasunil.com/visuotactile/visuotactile.html for videos.
RO · Jul 10, 2023
Shelving, Stacking, Hanging: Relational Pose Diffusion for Multi-modal Rearrangement
Anthony Simeonov, Ankit Goyal, Lucas Manuelli et al. · nvidia
We propose a system for rearranging objects in a scene to achieve a desired object-scene placing relationship, such as a book inserted in an open slot of a bookshelf. The pipeline generalizes to novel geometries, poses, and layouts of both scenes and objects, and is trained from demonstrations to operate directly on 3D point clouds. Our system overcomes challenges associated with the existence of many geometrically-similar rearrangement solutions for a given scene. By leveraging an iterative pose de-noising training procedure, we can fit multi-modal demonstration data and produce multi-modal outputs while remaining precise and accurate. We also show the advantages of conditioning on relevant local geometric features while ignoring irrelevant global structure that harms both generalization and precision. We demonstrate our approach on three distinct rearrangement tasks that require handling multi-modality and generalization over object shape and pose in both simulation and the real world. Project website, code, and videos: https://anthonysimeonov.github.io/rpdiff-multi-modal/
RO · Mar 3, 2022
NeRF-Supervision: Learning Dense Object Descriptors from Neural Radiance Fields
Lin Yen-Chen, Pete Florence, Jonathan T. Barron et al.
Thin, reflective objects such as forks and whisks are common in our daily lives, but they are particularly challenging for robot perception because it is hard to reconstruct them using commodity RGB-D cameras or multi-view stereo techniques. While traditional pipelines struggle with objects like these, Neural Radiance Fields (NeRFs) have recently been shown to be remarkably effective for performing view synthesis on objects with thin structures or reflective materials. In this paper we explore the use of NeRF as a new source of supervision for robust robot vision systems. In particular, we demonstrate that a NeRF representation of a scene can be used to train dense object descriptors. We use an optimized NeRF to extract dense correspondences between multiple views of an object, and then use these correspondences as training data for learning a view-invariant representation of the object. NeRF's usage of a density field allows us to reformulate the correspondence problem with a novel distribution-of-depths formulation, as opposed to the conventional approach of using a depth map. Dense correspondence models supervised with our method significantly outperform off-the-shelf learned descriptors by 106% (PCK@3px metric, more than doubling performance) and outperform our baseline supervised with multi-view stereo by 29%. Furthermore, we demonstrate the learned dense descriptors enable robots to perform accurate 6-degree of freedom (6-DoF) pick and place of thin and reflective objects.
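The PCK@3px figures quoted above can be read with a small sketch like the following. It assumes the common definition of PCK (the fraction of predicted correspondences that land within a pixel threshold of the ground truth); the function names, the inclusive threshold, and the sample pixel coordinates are illustrative assumptions, not taken from the paper.

```python
import math


def pck(predicted, ground_truth, threshold_px=3.0):
    """Fraction of predicted pixels within threshold_px of their ground truth."""
    if not ground_truth:
        raise ValueError("need at least one correspondence")
    hits = sum(
        1
        for p, g in zip(predicted, ground_truth)
        if math.dist(p, g) <= threshold_px  # inclusive threshold is an assumption
    )
    return hits / len(ground_truth)


def relative_gain_percent(new_score, old_score):
    """Relative improvement of new_score over old_score, in percent."""
    return 100.0 * (new_score - old_score) / old_score


# Made-up pixel coordinates, for illustration only.
predicted = [(10, 10), (20, 25), (30, 30), (5, 5)]
ground_truth = [(11, 12), (20, 20), (30, 32), (50, 50)]
score = pck(predicted, ground_truth)  # two of the four matches are within 3 px
```

Under this reading, a relative gain of 106% means the new score is 2.06 times the old one, which is why the abstract describes it as more than doubling performance.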
CV · Apr 25, 2022
Tac2Pose: Tactile Object Pose Estimation from the First Touch
Maria Bauza, Antonia Bronars, Alberto Rodriguez
In this paper, we present Tac2Pose, an object-specific approach to tactile pose estimation from the first touch for known objects. Given the object geometry, we learn a tailored perception model in simulation that estimates a probability distribution over possible object poses given a tactile observation. To do so, we simulate the contact shapes that a dense set of object poses would produce on the sensor. Then, given a new contact shape obtained from the sensor, we match it against the pre-computed set using an object-specific embedding learned using contrastive learning. We obtain contact shapes from the sensor with an object-agnostic calibration step that maps RGB tactile observations to binary contact shapes. This mapping, which can be reused across object and sensor instances, is the only step trained with real sensor data. This results in a perception model that localizes objects from the first real tactile observation. Importantly, it produces pose distributions and can incorporate additional pose constraints coming from other perception systems, contacts, or priors. We provide quantitative results for 20 objects. Tac2Pose provides high accuracy pose estimations from distinctive tactile observations while regressing meaningful pose distributions to account for those contact shapes that could result from different object poses. We also test Tac2Pose on object models reconstructed from a 3D scanner, to evaluate the robustness to uncertainty in the object model. Finally, we demonstrate the advantages of Tac2Pose compared with three baseline methods for tactile pose estimation: directly regressing the object pose with a neural network, matching an observed contact to a set of possible contacts using a standard classification neural network, and direct pixel comparison of an observed contact with a set of possible contacts. Website: http://mcube.mit.edu/research/tac2pose.html
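The matching step described above (comparing a new contact embedding against a pre-computed set and returning a distribution over poses) might look roughly like the sketch below. It assumes the embeddings already exist as plain vectors; the cosine similarity, the softmax temperature, the multiplicative prior, and all names are illustrative assumptions rather than the paper's implementation.

```python
import math


def pose_distribution(query, library, temperature=0.1, prior=None):
    """Softmax over cosine similarities between a query and stored embeddings.

    library maps a pose label to the embedding of its simulated contact shape.
    prior optionally maps pose labels to non-negative weights (extra constraints).
    """

    def cosine(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
        return dot / norm

    scores = [cosine(query, emb) / temperature for emb in library.values()]
    peak = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(s - peak) for s in scores]
    if prior is not None:
        weights = [w * prior.get(pose, 1.0) for w, pose in zip(weights, library)]
    total = sum(weights)
    return {pose: w / total for pose, w in zip(library, weights)}


# Hypothetical 2D embeddings standing in for learned ones.
library = {"pose_0": [1.0, 0.0], "pose_90": [0.0, 1.0], "pose_180": [-1.0, 0.0]}
dist = pose_distribution([0.9, 0.1], library)
best_pose = max(dist, key=dist.get)

# An external constraint that rules out pose_0 shifts the mass elsewhere.
constrained = pose_distribution([0.9, 0.1], library, prior={"pose_0": 0.0})
```

Returning a full distribution instead of a single best match is what lets ambiguous contact shapes keep several candidate poses alive until another sensor or contact disambiguates them.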
RO · Jul 24, 2023
simPLE: a visuotactile method learned in simulation to precisely pick, localize, regrasp, and place objects
Maria Bauza, Antonia Bronars, Yifan Hou et al.
Existing robotic systems have a clear tension between generality and precision. Deployed solutions for robotic manipulation tend to fall into the paradigm of one robot solving a single task, lacking precise generalization, i.e., the ability to solve many tasks without compromising on precision. This paper explores solutions for precise and general pick-and-place. In precise pick-and-place, i.e. kitting, the robot transforms an unstructured arrangement of objects into an organized arrangement, which can facilitate further manipulation. We propose simPLE (simulation to Pick Localize and PLacE) as a solution to precise pick-and-place. simPLE learns to pick, regrasp and place objects precisely, given only the object CAD model and no prior experience. We develop three main components: task-aware grasping, visuotactile perception, and regrasp planning. Task-aware grasping computes affordances of grasps that are stable, observable, and favorable to placing. The visuotactile perception model relies on matching real observations against a set of simulated ones through supervised learning. Finally, we compute the desired robot motion by solving a shortest path problem on a graph of hand-to-hand regrasps. On a dual-arm robot equipped with visuotactile sensing, we demonstrate pick-and-place of 15 diverse objects with simPLE. The objects span a wide range of shapes and simPLE achieves successful placements into structured arrangements with 1mm clearance over 90% of the time for 6 objects, and over 80% of the time for 11 objects. Videos are available at http://mcube.mit.edu/research/simPLE.html .
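The regrasp-planning step above is described as a shortest path problem on a graph of hand-to-hand regrasps. A minimal sketch of that idea, using Dijkstra's algorithm as an assumed solver, is shown below; the graph, node names, and edge costs are hypothetical and only illustrate the structure.

```python
import heapq


def shortest_regrasp_sequence(graph, start, goals):
    """Dijkstra search: nodes are grasps, weighted edges are regrasps."""
    best = {start: 0.0}
    parent = {start: None}
    queue = [(0.0, start)]
    while queue:
        cost, node = heapq.heappop(queue)
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry, a cheaper route was already found
        if node in goals:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return cost, path[::-1]
        for neighbor, edge_cost in graph.get(node, {}).items():
            new_cost = cost + edge_cost
            if new_cost < best.get(neighbor, float("inf")):
                best[neighbor] = new_cost
                parent[neighbor] = node
                heapq.heappush(queue, (new_cost, neighbor))
    return float("inf"), []  # no goal grasp is reachable


# Hypothetical regrasp graph; names and costs are made up for illustration.
regrasps = {
    "pick_top": {"hand_b_side": 1.0, "hand_b_end": 2.5},
    "hand_b_side": {"hand_a_end": 1.0},
    "hand_b_end": {"place_ready": 0.5},
    "hand_a_end": {"place_ready": 0.2},
}
cost, path = shortest_regrasp_sequence(regrasps, "pick_top", {"place_ready"})
```

In this toy graph the cheaper plan takes one extra handover (three regrasps with total cost 2.2) rather than the two-step route with total cost 3.0, which shows why the search is over path cost and not just the number of regrasps.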
RO · Nov 9, 2021 · Code
A Differentiable Recipe for Learning Visual Non-Prehensile Planar Manipulation
Bernardo Aceituno, Alberto Rodriguez, Shubham Tulsiani et al.
Specifying tasks with videos is a powerful technique for acquiring novel and general robot skills. However, reasoning over mechanics and dexterous interactions can make it challenging to scale the learning of contact-rich manipulation. In this work, we focus on the problem of visual non-prehensile planar manipulation: given a video of an object in planar motion, find contact-aware robot actions that reproduce the same object motion. We propose a novel architecture, Differentiable Learning for Manipulation (DLM), that combines video decoding neural models with priors from contact mechanics by leveraging differentiable optimization and finite-difference-based simulation. Through extensive simulated experiments, we investigate the interplay between traditional model-based techniques and modern deep learning approaches. We find that our modular and fully differentiable architecture performs better than learning-only methods on unseen objects and motions. Code: https://github.com/baceituno/dlm
RO · Mar 23, 2021 · Code
GelSlim3.0: High-Resolution Measurement of Shape, Force and Slip in a Compact Tactile-Sensing Finger
Ian Taylor, Siyuan Dong, Alberto Rodriguez
This work presents a new version of the tactile-sensing finger, GelSlim 3.0, which integrates the ability to sense high-resolution shape, force, and slip in a compact form factor for use with small parallel jaw grippers in cluttered bin-picking scenarios. The novel design incorporates the capability to use real-time analytic methods to measure shape, estimate the contact 3D force distribution, and detect incipient slip. To achieve a compact integration, we optimize the optical path from illumination source to camera and other geometric variables in an optical simulation environment. In particular, we optimize the illumination sources and a light shaping lens around the constraints imposed by the photometric stereo algorithm used for depth reconstruction. The optimized optical configuration is integrated into a finger design composed of robust and easily replaceable snap-to-fit fingertip modules that allow for ease of manufacture, assembly, use, and repair. To stimulate future research in tactile sensing and provide the robotics community access to a reliable and easily reproducible tactile finger with a diversity of sensing modalities, we open-source the design and software at https://github.com/mcubelab/gelslim.
RO · Sep 4, 2025
Reactive In-Air Clothing Manipulation with Confidence-Aware Dense Correspondence and Visuotactile Affordance
Neha Sunil, Megha Tippur, Arnau Saumell et al.
Manipulating clothing is challenging due to complex configurations, variable material dynamics, and frequent self-occlusion. Prior systems often flatten garments or assume visibility of key features. We present a dual-arm visuotactile framework that combines confidence-aware dense visual correspondence and tactile-supervised grasp affordance to operate directly on crumpled and suspended garments. The correspondence model is trained on a custom, high-fidelity simulated dataset using a distributional loss that captures cloth symmetries and generates correspondence confidence estimates. These estimates guide a reactive state machine that adapts folding strategies based on perceptual uncertainty. In parallel, a visuotactile grasp affordance network, self-supervised using high-resolution tactile feedback, determines which regions are physically graspable. The same tactile classifier is used during execution for real-time grasp validation. By deferring action in low-confidence states, the system handles highly occluded table-top and in-air configurations. We demonstrate our task-agnostic grasp selection module in folding and hanging tasks. Moreover, our dense descriptors provide a reusable intermediate representation for other planning modalities, such as extracting grasp targets from human video demonstrations, paving the way for more generalizable and scalable garment manipulation.
RO · Dec 9, 2021
Neural Descriptor Fields: SE(3)-Equivariant Object Representations for Manipulation
Anthony Simeonov, Yilun Du, Andrea Tagliasacchi et al.
We present Neural Descriptor Fields (NDFs), an object representation that encodes both points and relative poses between an object and a target (such as a robot gripper or a rack used for hanging) via category-level descriptors. We employ this representation for object manipulation, where given a task demonstration, we want to repeat the same task on a new object instance from the same category. We propose to achieve this objective by searching (via optimization) for the pose whose descriptor matches that observed in the demonstration. NDFs are conveniently trained in a self-supervised fashion via a 3D auto-encoding task that does not rely on expert-labeled keypoints. Further, NDFs are SE(3)-equivariant, guaranteeing performance that generalizes across all possible 3D object translations and rotations. We demonstrate learning of manipulation tasks from few (5-10) demonstrations both in simulation and on a real robot. Our performance generalizes across both object instances and 6-DoF object poses, and significantly outperforms a recent baseline that relies on 2D descriptors. Project website: https://yilundu.github.io/ndf/.
RO · Oct 7, 2021
Active Extrinsic Contact Sensing: Application to General Peg-in-Hole Insertion
Sangwoon Kim, Alberto Rodriguez
We propose a method that actively estimates contact location between a grasped rigid object and its environment and uses this as input to a peg-in-hole insertion policy. An estimation model and an active tactile feedback controller work collaboratively to estimate the external contacts accurately. The controller helps the estimation model get a better estimate by regulating a consistent contact mode. The better estimation makes it easier for the controller to regulate the contact. We then train an object-agnostic insertion policy that learns to use the series of contact estimates to guide the insertion of an unseen peg into a hole. In contrast with previous works that learn a policy directly from tactile signals, since this policy is in contact configuration space, it can be learned directly in simulation. Lastly, we demonstrate and evaluate the active extrinsic contact line estimation and the trained insertion policy together in a real experiment. We show that the proposed method inserts various-shaped test objects with higher success rates and fewer insertion attempts than previous work with end-to-end approaches. See supplementary video and results at https://sites.google.com/view/active-extrinsic-contact.
RO · Apr 2, 2021
Tactile-RL for Insertion: Generalization to Objects of Unknown Geometry
Siyuan Dong, Devesh K. Jha, Diego Romeres et al.
Object insertion is a classic contact-rich manipulation task. The task remains challenging, especially when considering general objects of unknown geometry, which significantly limits the ability to understand the contact configuration between the object and the environment. We study the problem of aligning the object and environment with a tactile-based feedback insertion policy. The insertion process is modeled as an episodic policy that iterates between insertion attempts followed by pose corrections. We explore different mechanisms to learn such a policy based on Reinforcement Learning. The key contributions of this paper are a demonstration that it is possible to learn a tactile insertion policy that generalizes across different object geometries, and an ablation study of the key design choices for the learning agent: 1) the type of learning scheme: supervised vs. reinforcement learning; 2) the type of learning schedule: unguided vs. curriculum learning; 3) the type of sensing modality: force/torque (F/T) vs. tactile; and 4) the type of tactile representation: tactile RGB vs. tactile flow. We show that the optimal configuration of the learning agent (RL + curriculum + tactile flow) exposed to 4 training objects yields an insertion policy that inserts 4 novel objects with over 85.0% success rate and within 3-4 attempts. Comparisons between F/T and tactile sensing show that while an F/T-based policy learns more efficiently, a tactile-based policy provides better generalization.
RO · Mar 15, 2021
Extrinsic Contact Sensing with Relative-Motion Tracking from Distributed Tactile Measurements
Daolin Ma, Siyuan Dong, Alberto Rodriguez
This paper addresses the localization of contacts of an unknown grasped rigid object with its environment, i.e., extrinsic to the robot. We explore the key role that distributed tactile sensing plays in localizing contacts external to the robot, in contrast to the role that aggregated force/torque measurements play in localizing contacts on the robot. When in contact with the environment, an object will move in accordance with the kinematic and possibly frictional constraints imposed by that contact. Small motions of the object, which are observable with tactile sensors, indirectly encode those constraints and the geometry that defines them. We formulate the extrinsic contact sensing problem as a constraint-based estimation. The estimation is subject to the kinematic constraints imposed by the tactile measurements of object motion, as well as the kinematic (e.g., non-penetration) and possibly frictional (e.g., sticking) constraints imposed by rigid-body mechanics. We validate the approach in simulation and with real experiments on the case studies of fixed point and line contacts. This paper discusses the theoretical basis for the value of distributed tactile sensing in contrast to aggregated force/torque measurements. It also provides an estimation framework for localizing environmental contacts with potential impact in contact-rich manipulation scenarios such as assembling or packing.
RO · Jan 7, 2021
Planning for Multi-stage Forceful Manipulation
Rachel Holladay, Tomás Lozano-Pérez, Alberto Rodriguez
Multi-stage forceful manipulation tasks, such as twisting a nut on a bolt, require reasoning over interlocking constraints over discrete as well as continuous choices. The robot must choose a sequence of discrete actions, or strategy, such as whether to pick up an object, and the continuous parameters of each of those actions, such as how to grasp the object. In forceful manipulation tasks, the force requirements substantially impact the choices of both strategy and parameters. To enable planning and executing forceful manipulation, we augment an existing task and motion planner with controllers that exert wrenches and constraints that explicitly consider torque and frictional limits. In two domains, opening a childproof bottle and twisting a nut, we demonstrate how the system considers a combinatorial number of strategies and how choosing actions that are robust to parameter variations impacts the choice of strategy.
RO · Dec 31, 2020
Robotic Grasping of Fully-Occluded Objects using RF Perception
Tara Boroushaki, Junshan Leng, Ian Clester et al.
We present the design, implementation, and evaluation of RF-Grasp, a robotic system that can grasp fully-occluded objects in unknown and unstructured environments. Unlike prior systems that are constrained by the line-of-sight perception of vision and infrared sensors, RF-Grasp employs RF (Radio Frequency) perception to identify and locate target objects through occlusions, and perform efficient exploration and complex manipulation tasks in non-line-of-sight settings. RF-Grasp relies on an eye-in-hand camera and batteryless RFID tags attached to objects of interest. It introduces two main innovations: (1) an RF-visual servoing controller that uses the RFID's location to selectively explore the environment and plan an efficient trajectory toward an occluded target, and (2) an RF-visual deep reinforcement learning network that can learn and execute efficient, complex policies for decluttering and grasping. We implemented and evaluated an end-to-end physical prototype of RF-Grasp. We demonstrate it improves success rate and efficiency by up to 40-50% over a state-of-the-art baseline. We also demonstrate RF-Grasp in novel tasks such as mechanical search of fully-occluded objects behind obstacles, opening up new possibilities for robotic manipulation. Qualitative results (videos) are available at rfgrasp.media.mit.edu.
CV · Dec 10, 2020
iNeRF: Inverting Neural Radiance Fields for Pose Estimation
Lin Yen-Chen, Pete Florence, Jonathan T. Barron et al.
We present iNeRF, a framework that performs mesh-free pose estimation by "inverting" a Neural Radiance Field (NeRF). NeRFs have been shown to be remarkably effective for the task of view synthesis - synthesizing photorealistic novel views of real-world scenes or objects. In this work, we investigate whether we can apply analysis-by-synthesis via NeRF for mesh-free, RGB-only 6DoF pose estimation - given an image, find the translation and rotation of a camera relative to a 3D object or scene. Our method assumes that no object mesh models are available during either training or test time. Starting from an initial pose estimate, we use gradient descent to minimize the residual between pixels rendered from a NeRF and pixels in an observed image. In our experiments, we first study 1) how to sample rays during pose refinement for iNeRF to collect informative gradients and 2) how different batch sizes of rays affect iNeRF on a synthetic dataset. We then show that for complex real-world scenes from the LLFF dataset, iNeRF can improve NeRF by estimating the camera poses of novel images and using these images as additional training data for NeRF. Finally, we show iNeRF can perform category-level object pose estimation, including object instances not seen during training, with RGB images by inverting a NeRF model inferred from a single view.
RO · Dec 9, 2020
Tactile Object Pose Estimation from the First Touch with Geometric Contact Rendering
Maria Bauza, Eric Valls, Bryan Lim et al.
In this paper, we present an approach to tactile pose estimation from the first touch for known objects. First, we create an object-agnostic map from real tactile observations to contact shapes. Next, for a new object with known geometry, we learn a tailored perception model completely in simulation. To do so, we simulate the contact shapes that a dense set of object poses would produce on the sensor. Then, given a new contact shape obtained from the sensor output, we match it against the pre-computed set using the object-specific embedding learned purely in simulation using contrastive learning. This results in a perception model that can localize objects from a single tactile observation. It also allows reasoning over pose distributions and including additional pose constraints coming from other perception systems or multiple contacts. We provide quantitative results for four objects. Our approach provides high accuracy pose estimations from distinctive tactile observations while regressing pose distributions to account for those contact shapes that could result from different object poses. We further extend and test our approach in multi-contact scenarios where several tactile sensors are simultaneously in contact with the object. Website: http://mcube.mit.edu/research/tactile_loc_first_touch.html
RO · Nov 16, 2020
A Long Horizon Planning Framework for Manipulating Rigid Pointcloud Objects
Anthony Simeonov, Yilun Du, Beomjoon Kim et al.
We present a framework for solving long-horizon planning problems involving manipulation of rigid objects that operates directly from a point-cloud observation, i.e., without prior object models. Our method plans in the space of object subgoals and frees the planner from reasoning about robot-object interaction dynamics by relying on a set of generalizable manipulation primitives. We show that for rigid bodies, this abstraction can be realized using low-level manipulation skills that maintain sticking contact with the object and represent subgoals as 3D transformations. To enable generalization to unseen objects and improve planning performance, we propose a novel way of representing subgoals for rigid-body manipulation and a graph-attention based neural network architecture for processing point-cloud inputs. We experimentally validate these choices using simulated and real-world experiments on the YuMi robot. Results demonstrate that our method can successfully manipulate new objects into target configurations requiring long-term planning. Overall, our framework combines the strengths of task-and-motion planning (TAMP) and learning-based approaches. Project website: https://anthonysimeonov.github.io/rpo-planning-framework/.
RO · Nov 13, 2020
Tactile SLAM: Real-time inference of shape and pose from planar pushing
Sudharshan Suresh, Maria Bauza, Kuan-Ting Yu et al.
Tactile perception is central to robot manipulation in unstructured environments. However, it requires contact, and a mature implementation must infer object models while also accounting for the motion induced by the interaction. In this work, we present a method to estimate both object shape and pose in real-time from a stream of tactile measurements. This is applied towards tactile exploration of an unknown object by planar pushing. We consider this as an online SLAM problem with a nonparametric shape representation. Our formulation of tactile inference alternates between Gaussian process implicit surface regression and pose estimation on a factor graph. Through a combination of local Gaussian processes and fixed-lag smoothing, we infer object shape and pose in real-time. We evaluate our system across different objects in both simulated and real-world planar pushing tasks.
RO · Sep 8, 2020
Long-Horizon Prediction and Uncertainty Propagation with Residual Point Contact Learners
Nima Fazeli, Anurag Ajay, Alberto Rodriguez
The ability to simulate and predict the outcome of contacts is paramount to the successful execution of many robotic tasks. Simulators are powerful tools for the design of robots and their behaviors, yet the discrepancy between their predictions and observed data limits their usability. In this paper, we propose a self-supervised approach to learning residual models for rigid-body simulators that exploits corrections of contact models to refine predictive performance and propagate uncertainty. We empirically evaluate the framework by predicting the outcomes of planar dice rolls and compare its performance to state-of-the-art techniques.
RO · Feb 8, 2020
Tactile Dexterity: Manipulation Primitives with Tactile Feedback
Francois R. Hogan, Jose Ballester, Siyuan Dong et al.
This paper develops closed-loop tactile controllers for dexterous robotic manipulation with a dual-palm robotic system. Tactile dexterity is an approach to dexterous manipulation that plans for robot/object interactions that render interpretable tactile information for control. We divide the role of tactile control into two goals: 1) control the contact state between the end-effector and the object (contact/no-contact, stick/slip) by regulating the stability of planned contact configurations and monitoring undesired slip events; and 2) control the object state by tactile-based tracking and iterative replanning of the object and robot trajectories. Key to this formulation is the decomposition of manipulation plans into sequences of manipulation primitives with simple mechanics and efficient planners. We consider the scenario of manipulating an object from an initial pose to a target pose on a flat surface while correcting for external perturbations and uncertainty in the initial pose of the object. We experimentally validate the approach with an ABB YuMi dual-arm robot and demonstrate the ability of the tactile controller to react to external perturbations.
RO · Nov 8, 2019
Accurate Vision-based Manipulation through Contact Reasoning
Alina Kloss, Maria Bauza, Jiajun Wu et al.
Planning contact interactions is one of the core challenges of many robotic tasks. Optimizing contact locations while taking dynamics into account is computationally costly and, in environments that are only partially observable, executing contact-based tasks often suffers from low accuracy. We present an approach that addresses these two challenges for the problem of vision-based manipulation. First, we propose to disentangle contact from motion optimization. Thereby, we improve planning efficiency by focusing computation on promising contact locations. Second, we use a hybrid approach for perception and state estimation that combines neural networks with a physically meaningful state representation. In simulation and real-world experiments on the task of planar pushing, we show that our method is more efficient and achieves a higher manipulation accuracy than previous vision-based approaches.
RO · Nov 1, 2019
Hybrid Differential Dynamic Programming for Planar Manipulation Primitives
Neel Doshi, Francois R. Hogan, Alberto Rodriguez
We present a hybrid differential dynamic programming (DDP) algorithm for closed-loop execution of manipulation primitives with frictional contact switches. Planning and control of these primitives is challenging as they are hybrid, under-actuated, and stochastic. We address this by developing hybrid DDP both to plan finite horizon trajectories with a few contact switches and to create linear stabilizing controllers. We evaluate the performance and computational cost of our framework in ablation studies for two primitives: planar pushing and planar pivoting. We find that generating pose-to-pose closed-loop trajectories from most configurations requires only one to two hybrid switches and can be done in reasonable time (one to five seconds). We further demonstrate that our controller stabilizes these hybrid trajectories on a real pushing system. A video describing our work can be found at https://youtu.be/YGSe4cUfq6Q.
RO · Oct 3, 2019
Cable Manipulation with a Tactile-Reactive Gripper
Yu She, Shaoxiong Wang, Siyuan Dong et al.
Cables are complex, high-dimensional, and dynamic objects. Standard approaches to manipulating them often rely on conservative strategies that involve long series of very slow and incremental deformations, or various mechanical fixtures such as clamps, pins or rings. We are interested in manipulating freely moving cables, in real time, with a pair of robotic grippers, and with no added mechanical constraints. The main contribution of this paper is a perception and control framework that moves in that direction, and uses real-time tactile feedback to accomplish the task of following a dangling cable. The approach relies on a vision-based tactile sensor, GelSight, that estimates the pose of the cable in the grip, and the friction forces during cable sliding. We achieve the behavior by combining two tactile-based controllers: 1) a cable grip controller, where a PD controller combined with a leaky integrator regulates the gripping force to maintain the frictional sliding forces close to a suitable value; and 2) a cable pose controller, where an LQR controller based on a learned linear model of the cable sliding dynamics keeps the cable centered and aligned on the fingertips to prevent the cable from falling from the grip. This behavior is made possible by a reactive gripper fitted with GelSight-based high-resolution tactile sensors. The robot can follow one meter of cable in random configurations within 2-3 hand regrasps, adapting to cables of different materials and thicknesses. We demonstrate a robot grasping a headphone cable, sliding the fingers to the jack connector, and inserting it. To the best of our knowledge, this is the first implementation of real-time cable following without the aid of mechanical fixtures.
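The grip controller above is described as a PD controller combined with a leaky integrator. One way such a combination could be written is sketched below; the gain values, the leak factor, the discrete-time form, and the class name are all assumptions for illustration and are not taken from the paper.

```python
class LeakyPDGripController:
    """PD terms plus an integral term that decays ("leaks") at every step."""

    def __init__(self, kp, kd, ki, leak, dt):
        self.kp, self.kd, self.ki = kp, kd, ki
        self.leak = leak  # in (0, 1): fraction of the integral kept per step
        self.dt = dt
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, target_friction, measured_friction):
        """Return a grip-force adjustment from the friction tracking error."""
        error = target_friction - measured_friction
        derivative = (error - self.prev_error) / self.dt
        # The leak bounds the integral, limiting wind-up under sustained error.
        self.integral = self.leak * self.integral + error * self.dt
        self.prev_error = error
        return self.kp * error + self.kd * derivative + self.ki * self.integral


# Hypothetical gains and a constant tracking error of 1.0.
controller = LeakyPDGripController(kp=2.0, kd=0.1, ki=0.5, leak=0.9, dt=0.1)
for _ in range(500):
    adjustment = controller.update(target_friction=1.0, measured_friction=0.0)
```

With a constant error `e`, the leaky integral converges to `e * dt / (1 - leak)` instead of growing without bound, which is the usual motivation for preferring it to a plain integrator.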
ROOct 1, 2019
Omnipush: accurate, diverse, real-world dataset of pushing dynamics with RGB-D videoMaria Bauza, Ferran Alet, Yen-Chen Lin et al.
Pushing is a fundamental robotic skill. Existing work has shown how to exploit models of pushing to achieve a variety of tasks, including grasping under uncertainty, in-hand manipulation and clearing clutter. Such models, however, are approximate, which limits their applicability. Learning-based methods can reason directly from raw sensory data with accuracy, and have the potential to generalize to a wider diversity of scenarios. However, developing and testing such methods requires rich-enough datasets. In this paper we introduce Omnipush, a dataset with a high variety of planar pushing behavior. In particular, we provide 250 pushes for each of 250 objects, all recorded with RGB-D and a high precision tracking system. The objects are constructed so as to systematically explore key factors that affect pushing -- the shape of the object and its mass distribution -- which have not been broadly explored in previous datasets, and which allow the study of generalization in model learning. Omnipush includes a benchmark for meta-learning dynamic models, which requires algorithms that make good predictions and estimate their own uncertainty. We also provide an RGB video prediction benchmark and propose other relevant tasks suited to this dataset. Data and code are available at https://web.mit.edu/mcube/omnipush-dataset/.
ROSep 12, 2019
Tactile-Based Insertion for Dense Box-PackingSiyuan Dong, Alberto Rodriguez
We study the problem of using high-resolution tactile sensors to control the insertion of objects in a box-packing scenario. We propose a new system based on the tactile sensor GelSlim for the dense packing task. Our insertion strategy leverages tactile sensing to: 1) safely probe the box with the grasped object while monitoring incipient slip to maintain a stable grasp on the object; and 2) estimate and correct for residual position uncertainties to insert the object into a designated gap without disturbing the environment. Our proposed methodology is based on two neural networks that estimate the error direction and error magnitude from a stream of tactile imprints acquired by two GelSlim fingers during the insertion process. The system is trained on four objects with basic geometric shapes, which we show generalizes to four other common objects. Based on the estimated positional errors, a heuristic controller iteratively adjusts the position of the object and eventually inserts it successfully without requiring prior knowledge of the geometry of the object. The key insight is that dense tactile feedback contains useful information with respect to the contact interaction between the grasped object and its environment. We achieve a high success rate and show that unknown objects can be inserted with an average of 6 attempts of the probe-correct loop. The method's ability to generalize to novel objects makes it a good fit for box packing in warehouse automation.
ROSep 9, 2019
Certified GraspingBernardo Aceituno-Cabezas, José Ballester, Alberto Rodriguez
This paper studies robustness in planar grasping from a geometric perspective. By treating grasping as a process that shapes the free-space of an object over time, we can define three types of certificates to guarantee success of a grasp: (a) invariance under an initial set, (b) convergence towards a goal grasp, and (c) observability over the final object pose. We develop convex-combinatorial models for each of these certificates, which can be expressed as simple semi-algebraic relations under mild modeling assumptions. By leveraging these models to synthesize certificates, we optimize certifiable grasps of arbitrary planar objects composed as a union of convex polygons, using manipulators described as point-fingers. We validate this approach with simulations and real robot experiments, by grasping random polygons, comparing against other standard grasp planning algorithms, and performing sensorless grasps over different objects.
ROApr 24, 2019
Tactile Mapping and Localization from High-Resolution Tactile ImprintsMaria Bauza, Oleguer Canal, Alberto Rodriguez
This work studies the problem of shape reconstruction and object localization using a vision-based tactile sensor, GelSlim. The main contributions are the recovery of local shapes from contact, an approach to reconstruct the tactile shape of objects from tactile imprints, and an accurate method for object localization of previously reconstructed objects. The algorithms can be applied to a large variety of 3D objects and provide accurate tactile feedback for in-hand manipulation. Results show that by exploiting the dense tactile information we can reconstruct the shape of objects with high accuracy and do on-line object identification and localization, opening the door to reactive manipulation guided by tactile sensing. We provide videos and supplemental information on the project website: http://web.mit.edu/mcube/research/tactile_localization.html.
LGApr 18, 2019
Graph Element Networks: adaptive, structured computation and memoryFerran Alet, Adarsh K. Jeewajee, Maria Bauza et al.
We explore the use of graph neural networks (GNNs) to model spatial processes in which there is no a priori graphical structure. Similar to finite element analysis, we assign nodes of a GNN to spatial locations and use a computational process defined on the graph to model the relationship between an initial function defined over a space and a resulting function in the same space. We use GNNs as a computational substrate, and show that the locations of the nodes in space as well as their connectivity can be optimized to focus on the most complex parts of the space. Moreover, this representational strategy allows the learned input-output relationship to generalize over the size of the underlying space and run the same model at different levels of precision, trading computation for accuracy. We demonstrate this method on a traditional PDE problem, a physical prediction problem from robotics, and learning to predict scene images from novel viewpoints.
ROApr 13, 2019
Combining Physical Simulators and Object-Based Networks for ControlAnurag Ajay, Maria Bauza, Jiajun Wu et al.
Physics engines play an important role in robot planning and control; however, many real-world control problems involve complex contact dynamics that cannot be characterized analytically. Most physics engines therefore employ approximations that lead to a loss in precision. In this paper, we propose a hybrid dynamics model, simulator-augmented interaction networks (SAIN), combining a physics engine with an object-based neural network for dynamics modeling. Compared with existing models that are purely analytical or purely data-driven, our hybrid model captures the dynamics of interacting objects in a more accurate and data-efficient manner. Experiments both in simulation and on a real robot suggest that it also leads to better performance when used in complex control tasks. Finally, we show that our model generalizes to novel environments with varying object shapes and materials.
ROMar 27, 2019
TossingBot: Learning to Throw Arbitrary Objects with Residual PhysicsAndy Zeng, Shuran Song, Johnny Lee et al.
We investigate whether a robot arm can learn to pick and throw arbitrary objects into selected boxes quickly and accurately. Throwing has the potential to increase the physical reachability and picking speed of a robot arm. However, precisely throwing arbitrary objects in unstructured settings presents many challenges: from acquiring reliable pre-throw conditions (e.g. initial pose of object in manipulator) to handling varying object-centric properties (e.g. mass distribution, friction, shape) and dynamics (e.g. aerodynamics). In this work, we propose an end-to-end formulation that jointly learns to infer control parameters for grasping and throwing motion primitives from visual observations (images of arbitrary objects in a bin) through trial and error. Within this formulation, we investigate the synergies between grasping and throwing (i.e., learning grasps that enable more accurate throws) and between simulation and deep learning (i.e., using deep networks to predict residuals on top of control parameters predicted by a physics simulator). The resulting system, TossingBot, is able to grasp and throw arbitrary objects into boxes located outside its maximum reach range at 500+ mean picks per hour (600+ grasps per hour with 85% throwing accuracy); and generalizes to new objects and target locations. Videos are available at https://tossingbot.cs.princeton.edu
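The idea of predicting a residual on top of a physics-based control parameter can be sketched as follows. This is only an illustration under stated assumptions: the drag-free ballistic range formula stands in for the physics estimate, and `residual_fn` is a hypothetical placeholder for a trained network; neither is taken from the paper.

```python
import math

G = 9.81  # gravitational acceleration, m/s^2


def ballistic_release_speed(distance_m: float, angle_rad: float) -> float:
    """Speed needed to land distance_m away at release height, ignoring drag.

    Follows from the range formula R = v^2 * sin(2 * angle) / g.
    """
    return math.sqrt(G * distance_m / math.sin(2.0 * angle_rad))


def release_speed(distance_m: float, angle_rad: float, residual_fn) -> float:
    """Physics estimate plus a learned correction.

    residual_fn is a hypothetical stand-in for a trained model: any callable
    mapping (distance, angle) to a speed offset in m/s.
    """
    physics = ballistic_release_speed(distance_m, angle_rad)
    return physics + residual_fn(distance_m, angle_rad)


# Usage: a constant offset plays the role of the learned residual.
speed = release_speed(1.5, math.pi / 4, lambda d, a: 0.25)
```

The design choice illustrated here is that the learned component only has to model what the simple physics misses (for example aerodynamics or grasp offsets), rather than the whole throwing map.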
LGDec 19, 2018
Modular meta-learning in abstract graph networks for combinatorial generalizationFerran Alet, Maria Bauza, Alberto Rodriguez et al.
Modular meta-learning is a new framework that generalizes to unseen datasets by combining a small set of neural modules in different ways. In this work we propose abstract graph networks: using graphs as abstractions of a system's subparts without a fixed assignment of nodes to system subparts, for which we would need supervision. We combine this idea with modular meta-learning to get a flexible framework with combinatorial generalization to new tasks built in. We then use it to model the pushing of arbitrarily shaped objects from little or no training data.
ROOct 31, 2018
Maintaining Grasps within Slipping Bound by Monitoring Incipient SlipSiyuan Dong, Daolin Ma, Elliott Donlon et al.
In this paper, we propose an approach to detect incipient slip, i.e. predict slip, by using a high-resolution vision-based tactile sensor, GelSlim. The sensor dynamically captures the tactile imprints of the contact object and their changes with a soft gel pad. The method assumes the object is mostly rigid and treats the motion of the object's imprint on the sensor surface as a 2D rigid-body motion. We use the deviation of the true motion field from that of a 2D planar rigid transformation as a measure of slip. The output is a dense slip field which we use to detect when small areas of the contact patch start to slip (incipient slip). The method can detect both translational and rotational incipient slip without any prior knowledge of the object at 24 Hz. We test the method on 10 objects 240 times and achieve 86.25% detection accuracy. We further show how the slip feedback can be used to monitor the gripping force to avoid slip in a closed-loop bottle-cap screwing and unscrewing experiment. The method was demonstrated to be useful for the robot to apply proper gripping force and stop screwing at the right point before breaking objects. The method can be applied to many manipulation tasks in both structured and unstructured environments.
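The slip measure described above, the deviation of the observed motion field from a best-fit 2D rigid transformation, can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function names are hypothetical, the input is assumed to be matched marker positions before and after motion, and the fit is the standard closed-form least-squares 2D rigid alignment.

```python
import math


def fit_rigid_2d(src, dst):
    """Least-squares rotation angle and translation mapping src points onto dst."""
    n = len(src)
    sx = sum(p[0] for p in src) / n
    sy = sum(p[1] for p in src) / n
    dx = sum(q[0] for q in dst) / n
    dy = sum(q[1] for q in dst) / n
    cross = dot = 0.0
    for (px, py), (qx, qy) in zip(src, dst):
        ax, ay = px - sx, py - sy   # centered source point
        bx, by = qx - dx, qy - dy   # centered destination point
        cross += ax * by - ay * bx
        dot += ax * bx + ay * by
    theta = math.atan2(cross, dot)  # optimal rotation for centered point sets
    c, s = math.cos(theta), math.sin(theta)
    tx = dx - (c * sx - s * sy)
    ty = dy - (s * sx + c * sy)
    return theta, (tx, ty)


def slip_residuals(src, dst):
    """Per-marker distance between the observed position and the rigid-fit prediction."""
    theta, (tx, ty) = fit_rigid_2d(src, dst)
    c, s = math.cos(theta), math.sin(theta)
    residuals = []
    for (px, py), (qx, qy) in zip(src, dst):
        rx = c * px - s * py + tx
        ry = s * px + c * py + ty
        residuals.append(math.hypot(qx - rx, qy - ry))
    return residuals
```

Markers whose residual exceeds some threshold would be flagged as locally slipping; the threshold itself is application-specific and is not given in the abstract.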
ROOct 10, 2018
Dense Tactile Force Distribution Estimation using GelSlim and inverse FEMDaolin Ma, Elliott Donlon, Siyuan Dong et al.
In this paper, we present GelSlim 2.0, a new version of the GelSlim tactile sensor with the capability to estimate the contact force distribution in real time. The sensor is vision-based and uses an array of markers to track deformations on a gel pad due to contact. A new hardware design makes the sensor more rugged and parametrically adjustable, and improves illumination. Leveraging the sensor's increased functionality, we propose to use the inverse Finite Element Method (iFEM), a numerical method to reconstruct the contact force distribution based on marker displacements. The sensor is able to provide the contact force distribution with high spatial density. Experiments and comparison with ground truth show that the reconstructed force distribution is physically reasonable with good accuracy.
ROSep 29, 2018
In-Hand Manipulation via Motion ConesNikhil Chavan-Dafle, Rachel Holladay, Alberto Rodriguez
In this paper, we present the mechanics and algorithms to compute the set of feasible motions of an object pushed in a plane. This set is known as the motion cone and was previously described for non-prehensile manipulation tasks in the horizontal plane. We generalize its geometric construction to a broader set of planar tasks, where external forces such as gravity influence the dynamics of pushing, and prehensile tasks, where there are complex interactions between the gripper, object, and pusher. We show that the motion cone is defined by a set of low-curvature surfaces and provide a polyhedral cone approximation to it. We verify its validity with 2000 pushing experiments recorded with a motion tracking system. Motion cones abstract the algebra involved in simulating frictional pushing by providing bounds on the set of feasible motions and by characterizing which pushes will stick or slip. We demonstrate their use for the dynamic propagation step in a sampling-based planning algorithm for in-hand manipulation. The planner generates trajectories that involve sequences of continuous pushes with 5-1000x speed improvements over equivalent algorithms. Video Summary -- https://youtu.be/tVDO8QMuYhc
ROSep 23, 2018
Regrasping by Fixtureless FixturingNikhil Chavan-Dafle, Alberto Rodriguez
This paper presents a fixturing strategy for regrasping that does not require a physical fixture. To regrasp an object in a gripper, a robot pushes the object against external contact/s in the environment such that the external contact keeps the object stationary while the fingers slide over the object. We call this manipulation technique fixtureless fixturing. Exploiting the mechanics of pushing, we characterize a convex polyhedral set of pushes that results in fixtureless fixturing. These pushes are robust against uncertainty in the object inertia, grasping force, and the friction at the contacts. We propose a sampling-based planner that uses the sets of robust pushes to rapidly build a tree of reachable grasps. A path in this tree is a pushing strategy, possibly involving pushes from different sides, to regrasp the object. We demonstrate the experimental validity and robustness of the proposed manipulation technique with different regrasp examples on a manipulation platform. Such a fast and flexible regrasp planner facilitates versatile and flexible automation solutions.
ROSep 22, 2018
Pneumatic Shape-shifting Fingers to Reorient and GraspNikhil Chavan-Dafle, Kyubin Lee, Alberto Rodriguez
We present pneumatic shape-shifting fingers to enable a simple parallel-jaw gripper for different manipulation modalities. By changing the finger geometry, the gripper effectively changes the contact type between the fingers and an object to facilitate distinct manipulation primitives. In this paper, we demonstrate the development and application of shape-shifting fingers to reorient and grasp cylindrical objects. The shape of the fingers changes based on the air pressure inside them and attains two distinct geometric forms at high and low pressure values. In our implementation, the finger shape switches between a wedge-shaped geometry and a V-shaped geometry at high and low pressure, respectively. Using the wedge-shaped geometry, the fingers provide a point contact on a cylindrical object to pivot it to a vertical pose under the effect of gravity. By changing to the V-shaped geometry, the fingers localize the object in the vertical pose and securely hold it. Experimental results show that the smooth transition between the two contact types allows a robot with a simple gripper to reorient a cylindrical object lying horizontally on the ground and to grasp it in a vertical pose.
ROSep 17, 2018
A Convex-Combinatorial Model for Planar CagingBernardo Aceituno-Cabezas, Hongkai Dai, Alberto Rodriguez
Caging is a promising tool which allows a robot to manipulate an object without directly reasoning about the contact dynamics involved. Furthermore, caging also provides useful guarantees in terms of robustness to uncertainty, and often serves as a way-point to a grasp. Unfortunately, previous work on caging is often based on computational geometry or discrete topology tools, which restricts the gripper geometry and makes integration into larger manipulation frameworks difficult. In this paper, we develop a convex-combinatorial model to characterize caging from an optimization perspective. More specifically, we study the configuration space of the object, where the fingers act as obstacles that enclose the configuration of the object. The convex-combinatorial nature of this approach provides guarantees on optimality, convergence and scalability, and its optimization nature makes it adaptable for further applications on robot manipulation tasks.
ROAug 9, 2018
Augmenting Physical Simulators with Stochastic Neural Networks: Case Study of Planar Pushing and BouncingAnurag Ajay, Jiajun Wu, Nima Fazeli et al.
An efficient, generalizable physical simulator with universal uncertainty estimates has wide applications in robot state estimation, planning, and control. In this paper, we build such a simulator for two scenarios, planar pushing and ball bouncing, by augmenting an analytical rigid-body simulator with a neural network that learns to model uncertainty as residuals. Combining symbolic, deterministic simulators with learnable, stochastic neural nets provides us with expressiveness, efficiency, and generalizability simultaneously. Our model outperforms both purely analytical and purely learned simulators consistently on real, standard benchmarks. Compared with methods that model uncertainty using Gaussian processes, our model runs much faster, generalizes better to new object shapes, and is able to characterize the complex distribution of object trajectories.
ROJul 26, 2018
A Data-Efficient Approach to Precise and Controlled PushingMaria Bauza, Francois R. Hogan, Alberto Rodriguez
Decades of research in control theory have shown that simple controllers, when provided with timely feedback, can control complex systems. Pushing is an example of a complex mechanical system that is difficult to model accurately due to unknown system parameters such as coefficients of friction and pressure distributions. In this paper, we explore the data-complexity required for controlling, rather than modeling, such a system. Results show that a model-based control approach, where the dynamical model is learned from data, is capable of performing complex pushing trajectories with a minimal amount of training data (10 data points). The dynamics of pushing interactions are modeled using a Gaussian process (GP) and are leveraged within a model predictive control approach that linearizes the GP and imposes actuator and task constraints for a planar manipulation task.
ROMar 27, 2018
Learning Synergies between Pushing and Grasping with Self-supervised Deep Reinforcement LearningAndy Zeng, Shuran Song, Stefan Welker et al.
Skilled robotic manipulation benefits from complex synergies between non-prehensile (e.g. pushing) and prehensile (e.g. grasping) actions: pushing can help rearrange cluttered objects to make space for arms and fingers; likewise, grasping can help displace objects to make pushing movements more precise and collision-free. In this work, we demonstrate that it is possible to discover and learn these synergies from scratch through model-free deep reinforcement learning. Our method involves training two fully convolutional networks that map from visual observations to actions: one infers the utility of pushes for a dense pixel-wise sampling of end effector orientations and locations, while the other does the same for grasping. Both networks are trained jointly in a Q-learning framework and are entirely self-supervised by trial and error, where rewards are provided from successful grasps. In this way, our policy learns pushing motions that enable future grasps, while learning grasps that can leverage past pushes. During picking experiments in both simulation and real-world scenarios, we find that our system quickly learns complex behaviors amid challenging cases of clutter, and achieves better grasping success rates and picking efficiencies than baseline alternatives after only a few hours of training. We further demonstrate that our method is capable of generalizing to novel objects. Qualitative results (videos), code, pre-trained models, and simulation environments are available at http://vpg.cs.princeton.edu
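How an action might be chosen from the two networks' dense outputs can be sketched as a greedy search over both Q-value maps. This is an assumption-laden illustration: the abstract does not state the selection rule, the function name is hypothetical, and plain nested lists stand in for network outputs indexed as `q[rotation][row][col]`.

```python
def best_action(push_q, grasp_q):
    """Pick the primitive and (rotation, row, col) with the highest Q-value.

    Each argument is a nested list indexed as q[rotation][row][col],
    standing in for the dense pixel-wise output of one network.
    """
    best = None
    for name, q_maps in (("push", push_q), ("grasp", grasp_q)):
        for r, q_map in enumerate(q_maps):
            for i, row in enumerate(q_map):
                for j, value in enumerate(row):
                    if best is None or value > best[0]:
                        best = (value, name, r, i, j)
    return best


# Usage with toy 2x2 maps: one push rotation, two grasp rotations.
push_q = [[[0.1, 0.2], [0.0, 0.3]]]
grasp_q = [[[0.05, 0.9], [0.1, 0.2]], [[0.0, 0.0], [0.0, 0.0]]]
value, primitive, rotation, row, col = best_action(push_q, grasp_q)
```

In a training loop one would typically mix such greedy choices with exploration; that detail is omitted here.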
ROMar 21, 2018
Realtime State Estimation with Tactile and Visual Sensing for Inserting a Suction-held ObjectKuan-Ting Yu, Alberto Rodriguez
We develop a real-time state estimation system to recover the pose and contact formation of an object relative to its environment. In this paper, we focus on the application of inserting an object picked by a suction cup into a tight space, an enabling technology for robotic packaging. We propose a framework that fuses force and visual sensing for improved accuracy and robustness. Visual sensing is versatile and non-intrusive, but suffers from occlusions and limited accuracy, especially for tasks involving contact. Tactile sensing is local, but provides accuracy and robustness to occlusions. The proposed algorithm to fuse them is based on iSAM, an on-line optimization technique, which we use to incorporate kinematic measurements from the robot, contact geometry of the object and the container, and visual tracking. In this paper, we generalize previous results in planar settings to a 3D task with more complex contact interactions. A key challenge in using force sensing is that we do not observe contact point locations directly. We propose a data-driven method to infer the contact formation, which is then used in real time by the state estimator. We demonstrate and evaluate the algorithm in a setup instrumented to provide ground truth.
ROMar 5, 2018
Tactile Regrasp: Grasp Adjustments via Simulated Tactile TransformationsFrancois R. Hogan, Maria Bauza, Oleguer Canal et al.
This paper presents a novel regrasp control policy that makes use of tactile sensing to plan local grasp adjustments. Our approach determines regrasp actions by virtually searching for local transformations of tactile measurements that improve the quality of the grasp. First, we construct a tactile-based grasp quality metric using a deep convolutional neural network trained on over 2800 grasps. The quality of each grasp, a continuous value between 0 and 1, is determined experimentally by measuring its resistance to external perturbations. Second, we simulate the tactile imprints associated with robot motions relative to the initial grasp by performing rigid-body transformations of the given tactile measurements. The newly generated tactile imprints are evaluated with the learned grasp quality network and the regrasp action is chosen to maximize the grasp quality. Results show that the grasp quality network can predict the outcome of grasps with an average accuracy of 85% on known objects and 75% on a cross validation set of 12 objects. The regrasp control policy improves the success rate of grasp actions by an average relative increase of 70% on a test set of 8 objects.
ROMar 1, 2018
GelSlim: A High-Resolution, Compact, Robust, and Calibrated Tactile-sensing FingerElliott Donlon, Siyuan Dong, Melody Liu et al.
This work describes the development of a high-resolution tactile-sensing finger for robot grasping. This finger, inspired by previous GelSight sensing techniques, features an integration that is slimmer, more robust, and with more homogeneous output than previous vision-based tactile sensors. To achieve a compact integration, we redesign the optical path from illumination source to camera by combining light guides and an arrangement of mirror reflections. We parameterize the optical path with geometric design variables and describe the tradeoffs between the finger thickness, the depth of field of the camera, and the size of the tactile sensing area. The sensor sustains the wear from continuous use -- and abuse -- in grasping tasks by combining tougher materials for the compliant soft gel, a textured fabric skin, a structurally rigid body, and a calibration process that maintains homogeneous illumination and contrast of the tactile images during use. Finally, we evaluate the sensor's durability along four metrics that track the signal quality during more than 3000 grasping experiments.
ROFeb 27, 2018
Friction Variability in Planar Pushing Data: Anisotropic Friction and Data-collection BiasDaolin Ma, Alberto Rodriguez
Friction plays a key role in manipulating objects. Most of what we do with our hands, and most of what robots do with their grippers, is based on the ability to control frictional forces. This paper aims to better understand the variability and predictability of planar friction. In particular, we focus on the analysis of a recent dataset on planar pushing by Yu et al. [1] devised to create a data-driven footprint of planar friction. We show in this paper how we can explain a significant fraction of the observed unconventional phenomena, e.g., stochasticity and multi-modality, by combining the effects of material non-homogeneity, anisotropy of friction and biases due to data collection dynamics, hinting that the variability is explainable but inevitable in practice. We introduce an anisotropic friction model and conduct simulation experiments comparing with more standard isotropic friction models. The anisotropic friction between object and supporting surface results in convergence of initial condition during the automated data collection. Numerical results confirm that the anisotropic friction model explains the bias in the dataset and the apparent stochasticity in the outcome of a push. The fact that the data collection process itself can originate biases in the collected datasets, resulting in deterioration of trained models, calls attention to the data collection dynamics.
ROOct 30, 2017
Stable Prehensile Pushing: In-Hand Manipulation with Alternating Sticking ContactsNikhil Chavan-Dafle, Alberto Rodriguez
This paper presents an approach to in-hand manipulation planning that exploits the mechanics of alternating sticking contact. Particularly, we consider the problem of manipulating a grasped object using external pushes for which the pusher sticks to the object. Given the physical properties of the object, frictional coefficients at contacts and a desired regrasp on the object, we propose a sampling-based planning framework that builds a pushing strategy concatenating different feasible stable pushes to achieve the desired regrasp. An efficient dynamics formulation allows us to plan in-hand manipulations 100-1000 times faster than our previous work which builds upon a complementarity formulation. Experimental observations for the generated plans show that the object precisely moves in the grasp as expected by the planner. Video Summary -- youtu.be/qOTKRJMx6Ho
ROOct 16, 2017
Learning Data-Efficient Rigid-Body Contact Models: Case Study of Planar ImpactNima Fazeli, Samuel Zapolsky, Evan Drumwright et al.
In this paper we demonstrate the limitations of common rigid-body contact models used in the robotics community by comparing them to a collection of data-driven and data-reinforced models that exploit underlying structure inspired by the rigid contact paradigm. We evaluate and compare the analytical and data-driven contact models on an empirical planar impact data-set, and show that the learned models are able to outperform their analytical counterparts with a small training set.
ROOct 16, 2017
Reactive Planar Manipulation with Convex Hybrid MPCFrancois Robert Hogan, Eudald Romo Grau, Alberto Rodriguez
This paper presents a reactive controller for planar manipulation tasks that leverages machine learning to achieve real-time performance. The approach is based on a Model Predictive Control (MPC) formulation, where the goal is to find an optimal sequence of robot motions to achieve a desired object motion. Due to the multiple contact modes associated with frictional interactions, the resulting optimization program suffers from combinatorial complexity when tasked with determining the optimal sequence of modes. To overcome this difficulty, we formulate the search for the optimal mode sequences offline, separately from the search for optimal control inputs online. Using tools from machine learning, this leads to a convex hybrid MPC program that can be solved in real-time. We validate our algorithm on a planar manipulation experimental setup where results show that the convex hybrid MPC formulation with learned modes achieves good closed-loop performance on a trajectory tracking problem.
ROOct 13, 2017
Fundamental Limitations in Performance and Interpretability of Common Planar Rigid-Body Contact ModelsNima Fazeli, Samuel Zapolsky, Evan Drumwright et al.
The ability to reason about and predict the outcome of contacts is paramount to the successful execution of many robot tasks. Analytical rigid-body contact models are used extensively in planning and control due to their computational efficiency and simplicity, yet despite their prevalence, little if any empirical comparison of these models has been made and it is unclear how well they approximate contact outcomes. In this paper, we first formulate a system identification approach for six commonly used contact models in the literature, and use the proposed method to find parameters for an experimental data-set of impacts. Next, we compare the models empirically, and establish a task specific upper bound on the performance of the models and the rigid-body contact model paradigm. We highlight the limitations of these models, salient failure modes, and the care that should be taken in parameter selection, which are ultimately difficult to give a physical interpretation.
ROOct 3, 2017
Robotic Pick-and-Place of Novel Objects in Clutter with Multi-Affordance Grasping and Cross-Domain Image MatchingAndy Zeng, Shuran Song, Kuan-Ting Yu et al.
This paper presents a robotic pick-and-place system that is capable of grasping and recognizing both known and novel objects in cluttered environments. The key new feature of the system is that it handles a wide range of object categories without needing any task-specific training data for novel objects. To achieve this, it first uses a category-agnostic affordance prediction algorithm to select and execute among four different grasping primitive behaviors. It then recognizes picked objects with a cross-domain image classification framework that matches observed images to product images. Since product images are readily available for a wide range of objects (e.g., from the web), the system works out-of-the-box for novel objects without requiring any additional training data. Exhaustive experimental results demonstrate that our multi-affordance grasping achieves high success rates for a wide variety of objects in clutter, and our recognition algorithm achieves high accuracy for both known and novel grasped objects. The approach was part of the MIT-Princeton Team system that took 1st place in the stowing task at the 2017 Amazon Robotics Challenge. All code, datasets, and pre-trained models are available online at http://arc.cs.princeton.edu