CV · Jul 21, 2023
RIC: Rotate-Inpaint-Complete for Generalizable Scene Reconstruction
Isaac Kasahara, Shubham Agrawal, Selim Engin et al. · apple-ml, cmu
General scene reconstruction refers to the task of estimating the full 3D geometry and texture of a scene containing previously unseen objects. In many practical applications such as AR/VR, autonomous navigation, and robotics, only a single view of the scene may be available, making the scene reconstruction task challenging. In this paper, we present a method for scene reconstruction by structurally breaking the problem into two steps: rendering novel views via inpainting and 2D to 3D scene lifting. Specifically, we leverage the generalization capability of large visual language models (DALL-E 2) to inpaint the missing areas of scene color images rendered from different views. Next, we lift these inpainted images to 3D by predicting normals of the inpainted image and solving for the missing depth values. By predicting normals instead of depth directly, our method allows for robustness to changes in depth distributions and scale. With rigorous quantitative evaluation, we show that our method outperforms multiple baselines while providing generalization to novel objects and scenes.
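The idea of recovering missing depth from predicted normals can be illustrated with a toy one-dimensional sketch. This is not the authors' solver: it assumes an orthographic camera, a single scanline, and the sign convention that the surface normal is proportional to (-dz/dx, 1), so the depth slope is -nx/nz. The function name `fill_depth_from_normals` and its parameters are hypothetical.

```python
def fill_depth_from_normals(depth, normals, step=1.0):
    """Fill missing depths (None) along a scanline by integrating normals.

    depth:   list of floats, with None where depth is unknown
    normals: list of (nx, nz) pairs, one per pixel (assumed convention:
             normal is proportional to (-dz/dx, 1), so dz/dx = -nx / nz)
    step:    pixel spacing along x (hypothetical unit)
    """
    filled = list(depth)
    for i in range(1, len(filled)):
        # Propagate left to right from the nearest known neighbor.
        if filled[i] is None and filled[i - 1] is not None:
            nx, nz = normals[i - 1]
            filled[i] = filled[i - 1] - (nx / nz) * step
    return filled


# A tilted plane z = 2 + 0.5 * x has normal proportional to (-0.5, 1).
normals = [(-0.5, 1.0)] * 4
filled = fill_depth_from_normals([2.0, 2.5, None, None], normals)
print(filled)
```

Because only slopes are integrated, shifting every known depth by a constant shifts the filled values by the same constant, which gives some intuition for why normals are less sensitive to the depth distribution than direct depth prediction.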
RO · Jul 24, 2023
simPLE: a visuotactile method learned in simulation to precisely pick, localize, regrasp, and place objects
Maria Bauza, Antonia Bronars, Yifan Hou et al.
Existing robotic systems have a clear tension between generality and precision. Deployed solutions for robotic manipulation tend to fall into the paradigm of one robot solving a single task, lacking precise generalization, i.e., the ability to solve many tasks without compromising on precision. This paper explores solutions for precise and general pick-and-place. In precise pick-and-place, i.e. kitting, the robot transforms an unstructured arrangement of objects into an organized arrangement, which can facilitate further manipulation. We propose simPLE (simulation to Pick Localize and PLacE) as a solution to precise pick-and-place. simPLE learns to pick, regrasp and place objects precisely, given only the object CAD model and no prior experience. We develop three main components: task-aware grasping, visuotactile perception, and regrasp planning. Task-aware grasping computes affordances of grasps that are stable, observable, and favorable to placing. The visuotactile perception model relies on matching real observations against a set of simulated ones through supervised learning. Finally, we compute the desired robot motion by solving a shortest path problem on a graph of hand-to-hand regrasps. On a dual-arm robot equipped with visuotactile sensing, we demonstrate pick-and-place of 15 diverse objects with simPLE. The objects span a wide range of shapes and simPLE achieves successful placements into structured arrangements with 1mm clearance over 90% of the time for 6 objects, and over 80% of the time for 11 objects. Videos are available at http://mcube.mit.edu/research/simPLE.html .
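The regrasp-planning step, a shortest path over a graph of hand-to-hand regrasps, can be sketched with Dijkstra's algorithm. This is a minimal illustration under assumptions: the grasp names, the edge costs, and the function `shortest_regrasp_sequence` are hypothetical, and the abstract does not say which shortest-path solver or cost the authors use.

```python
import heapq


def shortest_regrasp_sequence(graph, start, goal):
    """Dijkstra over a dict {grasp: [(next_grasp, cost), ...]}.

    Returns (total_cost, [grasp, ...]) or (inf, []) if goal is unreachable.
    """
    best = {start: 0.0}
    parent = {}
    queue = [(0.0, start)]
    while queue:
        cost, node = heapq.heappop(queue)
        if node == goal:
            # Walk parent links back to the start to recover the sequence.
            path = [node]
            while node in parent:
                node = parent[node]
                path.append(node)
            return cost, path[::-1]
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry
        for nxt, step in graph.get(node, []):
            new_cost = cost + step
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                parent[nxt] = node
                heapq.heappush(queue, (new_cost, nxt))
    return float("inf"), []


# Hypothetical regrasp graph: nodes are grasps, edges are hand-to-hand transfers.
graph = {
    "pick_grasp": [("handoff_a", 1.0), ("handoff_b", 2.5)],
    "handoff_a": [("place_grasp", 2.0)],
    "handoff_b": [("place_grasp", 0.25)],
}
cost, path = shortest_regrasp_sequence(graph, "pick_grasp", "place_grasp")
print(cost, path)
```

In this toy graph the route through `handoff_b` wins even though its first transfer is more expensive, because the total cost of the sequence is what the search minimizes.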
CV · Sep 14, 2023
HandNeRF: Learning to Reconstruct Hand-Object Interaction Scene from a Single RGB Image
Hongsuk Choi, Nikhil Chavan-Dafle, Jiacheng Yuan et al.
This paper presents a method to learn a hand-object interaction prior for reconstructing a 3D hand-object scene from a single RGB image. Both inference and training-data generation for 3D hand-object scene reconstruction are challenging due to the depth ambiguity of a single image and occlusions by the hand and object. We turn this challenge into an opportunity by utilizing the hand shape to constrain the possible relative configuration of the hand and object geometry. We design a generalizable implicit function, HandNeRF, that explicitly encodes the correlation of the 3D hand shape features and 2D object features to predict the hand and object scene geometry. With experiments on real-world datasets, we show that HandNeRF is able to reconstruct hand-object scenes of novel grasp configurations more accurately than comparable methods. Moreover, we demonstrate that object reconstruction from HandNeRF ensures more accurate execution of downstream tasks, such as grasping and motion planning for robotic hand-over and manipulation. Homepage: https://samsunglabs.github.io/HandNeRF-project-page/
CV · Nov 8, 2023
VioLA: Aligning Videos to 2D LiDAR Scans
Jun-Jee Chao, Selim Engin, Nikhil Chavan-Dafle et al.
We study the problem of aligning a video that captures a local portion of an environment to the 2D LiDAR scan of the entire environment. We introduce a method (VioLA) that starts with building a semantic map of the local scene from the image sequence, then extracts points at a fixed height for registering to the LiDAR map. Due to reconstruction errors or partial coverage of the camera scan, the reconstructed semantic map may not contain sufficient information for registration. To address this problem, VioLA makes use of a pre-trained text-to-image inpainting model paired with a depth completion model for filling in the missing scene content in a geometrically consistent fashion to support pose registration. We evaluate VioLA on two real-world RGB-D benchmarks, as well as a self-captured dataset of a large office scene. Notably, our proposed scene completion module improves the pose registration performance by up to 20%.
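The step of extracting points at a fixed height, so that a reconstructed 3D map resembles a 2D LiDAR scan, can be sketched as a simple filter. The sketch assumes a z-up frame in metres and a tolerance band around the target height; the function name `slice_at_height` and the tolerance value are hypothetical and not taken from the paper.

```python
def slice_at_height(points, height, tolerance=0.05):
    """Keep 3D points whose z lies within `tolerance` of `height`.

    points: iterable of (x, y, z) tuples in a z-up frame (assumption)
    Returns the surviving points projected to (x, y), i.e. a 2D scan-like set.
    """
    return [(x, y) for x, y, z in points if abs(z - height) <= tolerance]


# Hypothetical reconstructed points: floor, two wall points near 0.5 m, a shelf.
points = [(0.0, 0.0, 0.0), (1.0, 2.0, 0.5), (1.5, 2.0, 0.52), (3.0, 1.0, 1.4)]
scan_like = slice_at_height(points, height=0.5)
print(scan_like)
```

The resulting 2D point set is what a registration method would then align to the LiDAR map; the registration itself is outside this sketch.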
CV · Dec 14, 2023
FineControlNet: Fine-level Text Control for Image Generation with Spatially Aligned Text Control Injection
Hongsuk Choi, Isaac Kasahara, Selim Engin et al.
The recently introduced ControlNet can steer the text-driven image generation process with geometric inputs such as 2D human pose or edge features. While ControlNet provides control over the geometric form of the instances in the generated image, it lacks the capability to dictate the visual appearance of each instance. We present FineControlNet to provide fine control over each instance's appearance while maintaining the precise pose control capability. Specifically, we develop and demonstrate FineControlNet with geometric control via human pose images and appearance control via instance-level text prompts. The spatial alignment of instance-specific text prompts and 2D poses in latent space enables the fine control capabilities of FineControlNet. We evaluate the performance of FineControlNet with rigorous comparison against state-of-the-art pose-conditioned text-to-image diffusion models. FineControlNet achieves superior performance in generating images that follow the user-provided instance-specific text prompts and poses compared with existing methods. Project webpage: https://samsunglabs.github.io/FineControlNet-project-page
RO · May 16, 2023
Real-time Simultaneous Multi-Object 3D Shape Reconstruction, 6DoF Pose Estimation and Dense Grasp Prediction
Shubham Agrawal, Nikhil Chavan-Dafle, Isaac Kasahara et al.
Robotic manipulation systems operating in complex environments rely on perception systems that provide information about the geometry (pose and 3D shape) of the objects in the scene along with other semantic information such as object labels. This information is then used for choosing feasible grasps on relevant objects. In this paper, we present a novel method to provide this geometric and semantic information of all objects in the scene as well as feasible grasps on those objects simultaneously. The main advantage of our method is its speed, as it avoids sequential perception and grasp planning steps. With detailed quantitative analysis, we show that our method delivers competitive performance compared to the state-of-the-art dedicated methods for object shape, pose, and grasp predictions while providing fast inference at 30 frames per second.
RO · Sep 14, 2021
Simultaneous Object Reconstruction and Grasp Prediction using a Camera-centric Object Shell Representation
Nikhil Chavan-Dafle, Sergiy Popovych, Shubham Agrawal et al.
Being able to grasp objects is a fundamental component of most robotic manipulation systems. In this paper, we present a new approach to simultaneously reconstruct a mesh and a dense grasp quality map of an object from a depth image. At the core of our approach is a novel camera-centric object representation called the "object shell" which is composed of an observed "entry image" and a predicted "exit image". We present an image-to-image residual ConvNet architecture in which the object shell and a grasp-quality map are predicted as separate output channels. The main advantage of the shell representation and the corresponding neural network architecture, ShellGrasp-Net, is that the input-output pixel correspondences in the shell representation are explicitly represented in the architecture. We show that this coupling yields superior generalization capabilities for object reconstruction and accurate grasp quality estimation implicitly considering the object geometry. Our approach yields an efficient dense grasp quality map and an object geometry estimate in a single forward pass. Both of these outputs can be used in a wide range of robotic manipulation applications. With rigorous experimental validation, both in simulation and on a real setup, we show that our shell-based method can be used to generate precise grasps and the associated grasp quality with over 90% accuracy. Diverse grasps computed on shell reconstructions allow the robot to select and execute grasps in cluttered scenes with more than 93% success rate.
RO · Sep 29, 2018
In-Hand Manipulation via Motion Cones
Nikhil Chavan-Dafle, Rachel Holladay, Alberto Rodriguez
In this paper, we present the mechanics and algorithms to compute the set of feasible motions of an object pushed in a plane. This set is known as the motion cone and was previously described for non-prehensile manipulation tasks in the horizontal plane. We generalize its geometric construction to a broader set of planar tasks, where external forces such as gravity influence the dynamics of pushing, and prehensile tasks, where there are complex interactions between the gripper, object, and pusher. We show that the motion cone is defined by a set of low-curvature surfaces and provide a polyhedral cone approximation to it. We verify its validity with 2000 pushing experiments recorded with a motion tracking system. Motion cones abstract the algebra involved in simulating frictional pushing by providing bounds on the set of feasible motions and by characterizing which pushes will stick or slip. We demonstrate their use for the dynamic propagation step in a sampling-based planning algorithm for in-hand manipulation. The planner generates trajectories that involve sequences of continuous pushes with 5-1000x speed improvements over equivalent algorithms. Video Summary -- https://youtu.be/tVDO8QMuYhc
RO · Sep 23, 2018
Regrasping by Fixtureless Fixturing
Nikhil Chavan-Dafle, Alberto Rodriguez
This paper presents a fixturing strategy for regrasping that does not require a physical fixture. To regrasp an object in a gripper, a robot pushes the object against one or more external contacts in the environment such that the external contacts keep the object stationary while the fingers slide over the object. We call this manipulation technique fixtureless fixturing. Exploiting the mechanics of pushing, we characterize a convex polyhedral set of pushes that results in fixtureless fixturing. These pushes are robust against uncertainty in the object inertia, grasping force, and the friction at the contacts. We propose a sampling-based planner that uses the sets of robust pushes to rapidly build a tree of reachable grasps. A path in this tree is a pushing strategy, possibly involving pushes from different sides, to regrasp the object. We demonstrate the experimental validity and robustness of the proposed manipulation technique with different regrasp examples on a manipulation platform. Such a fast and flexible regrasp planner facilitates versatile and flexible automation solutions.
RO · Sep 22, 2018
Pneumatic Shape-shifting Fingers to Reorient and Grasp
Nikhil Chavan-Dafle, Kyubin Lee, Alberto Rodriguez
We present pneumatic shape-shifting fingers to enable a simple parallel-jaw gripper for different manipulation modalities. By changing the finger geometry, the gripper effectively changes the contact type between the fingers and an object to facilitate distinct manipulation primitives. In this paper, we demonstrate the development and application of shape-shifting fingers to reorient and grasp cylindrical objects. The shape of the fingers changes based on the air pressure inside them and attains two distinct geometric forms at high and low pressure values. In our implementation, the finger shape switches between a wedge-shaped geometry and a V-shaped geometry at high and low pressure, respectively. Using the wedge-shaped geometry, the fingers provide a point contact on a cylindrical object to pivot it to a vertical pose under the effect of gravity. By changing to the V-shaped geometry, the fingers localize the object in the vertical pose and securely hold it. Experimental results show that the smooth transition between the two contact types allows a robot with a simple gripper to reorient a cylindrical object lying horizontally on the ground and to grasp it in a vertical pose.
RO · Oct 30, 2017
Stable Prehensile Pushing: In-Hand Manipulation with Alternating Sticking Contacts
Nikhil Chavan-Dafle, Alberto Rodriguez
This paper presents an approach to in-hand manipulation planning that exploits the mechanics of alternating sticking contact. In particular, we consider the problem of manipulating a grasped object using external pushes for which the pusher sticks to the object. Given the physical properties of the object, the frictional coefficients at contacts, and a desired regrasp on the object, we propose a sampling-based planning framework that builds a pushing strategy concatenating different feasible stable pushes to achieve the desired regrasp. An efficient dynamics formulation allows us to plan in-hand manipulations 100-1000 times faster than our previous work, which builds upon a complementarity formulation. Experimental observations for the generated plans show that the object precisely moves in the grasp as expected by the planner. Video Summary -- youtu.be/qOTKRJMx6Ho
RO · Jul 2, 2017
Sampling-based Planning of In-Hand Manipulation with External Pushes
Nikhil Chavan-Dafle, Alberto Rodriguez
This paper presents a sampling-based planning algorithm for in-hand manipulation of a grasped object using a series of external pushes. A high-level sampling-based planning framework, in tandem with a low-level inverse contact dynamics solver, effectively explores the space of continuous pushes with discrete pusher contact switch-overs. We model the frictional interaction between gripper, grasped object, and pusher, by discretizing complex surface/line contacts into arrays of hard frictional point contacts. The inverse dynamics problem of finding an instantaneous pusher motion that yields a desired instantaneous object motion takes the form of a mixed nonlinear complementarity problem. Building upon this dynamics solver, our planner generates a sequence of pushes that steers the object to a goal grasp. We evaluate the performance of the planner for the case of a parallel-jaw gripper manipulating different objects, both in simulation and with real experiments. Through these examples, we highlight the important properties of the planner: respecting and exploiting the hybrid dynamics of contact sticking/sliding/rolling and a sense of efficiency with respect to discrete contact switch-overs.
RO · Feb 6, 2017
Experimental Validation of Contact Dynamics for In-Hand Manipulation
Roman Kolbert, Nikhil Chavan-Dafle, Alberto Rodriguez
This paper evaluates state-of-the-art contact models at predicting the motions and forces involved in simple in-hand robotic manipulations. In particular, it focuses on three primitive actions --linear sliding, pivoting, and rolling-- that involve contacts between a gripper, a rigid object, and their environment. The evaluation is done through thousands of controlled experiments designed to capture the motion of object and gripper, and all contact forces and torques at 250 Hz. We demonstrate that a contact modeling approach based on Coulomb's friction law and maximum energy principle is effective at reasoning about interaction to first order, but limited for making accurate predictions. We attribute the major limitations to 1) the non-uniqueness of force resolution inherent to grasps with multiple hard contacts of complex geometries, 2) unmodeled dynamics due to contact compliance, and 3) unmodeled geometries due to manufacturing defects.
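Coulomb's friction law, on which the evaluated contact models are based, bounds the tangential force at a point contact by the friction coefficient times the normal force. The check below is a minimal sketch of that inequality only; it does not model the maximum energy principle or the paper's experimental setup, and the function name `can_stick` and the numeric values are hypothetical.

```python
import math


def can_stick(normal_force, tangential_force, mu):
    """Return True if a point contact can resist the tangential load without sliding.

    normal_force:     compressive normal force magnitude (>= 0)
    tangential_force: (tx, ty) tangential force the contact must transmit
    mu:               Coulomb friction coefficient
    Coulomb's law: |f_t| <= mu * f_n for a sticking contact.
    """
    if normal_force <= 0.0:
        return False  # no compression means no friction is available
    tx, ty = tangential_force
    return math.hypot(tx, ty) <= mu * normal_force


# A 5 N tangential load on a contact pressed with 10 N.
holds = can_stick(10.0, (3.0, 4.0), mu=0.6)  # friction limit 6 N
slips = can_stick(10.0, (3.0, 4.0), mu=0.4)  # friction limit 4 N
print(holds, slips)
```

When the inequality fails, the contact slides and the friction force saturates at the limit, which is the regime where the abstract reports that first-order reasoning is adequate but accurate prediction is harder.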
RO · Apr 13, 2016
A Summary of Team MIT's Approach to the Amazon Picking Challenge 2015
Kuan-Ting Yu, Nima Fazeli, Nikhil Chavan-Dafle et al.
The Amazon Picking Challenge (APC), held alongside the International Conference on Robotics and Automation in May 2015 in Seattle, challenged roboticists from academia and industry to demonstrate fully automated solutions to the problem of picking objects from shelves in a warehouse fulfillment scenario. Packing density, object variability, speed, and reliability are the main complexities of the task. The picking challenge serves both as a motivation and an instrument to focus research efforts on a specific manipulation problem. In this document, we describe Team MIT's approach to the competition, including design considerations, contributions, and performance, and we compile the lessons learned. We also describe what we think are the main remaining challenges.