Siyuan Dong

h-index62

23papers

2,467citations

Novelty48%

AI Score41

Ranked #90,905 of 201,326 authors (top 45%)#3,048 in RO (top 40%)

23 Papers

ROOct 17, 2022Code

Neural Contact Fields: Tracking Extrinsic Contact with Tactile Sensing

Carolina Higuera, Siyuan Dong, Byron Boots et al.

We present Neural Contact Fields, a method that brings together neural fields and tactile sensing to address the problem of tracking extrinsic contact between object and environment. Knowing where the external contact occurs is a first step towards methods that can actively control it in facilitating downstream manipulation tasks. Prior work for localizing environmental contacts typically assume a contact type (e.g. point or line), does not capture contact/no-contact transitions, and only works with basic geometric-shaped objects. Neural Contact Fields are the first method that can track arbitrary multi-modal extrinsic contacts without making any assumptions about the contact type. Our key insight is to estimate the probability of contact for any 3D point in the latent space of object shapes, given vision-based tactile inputs that sense the local motion resulting from the external contact. In experiments, we find that Neural Contact Fields are able to localize multiple contact patches without making any assumptions about the geometry of the contact, and capture contact/no-contact transitions for known categories of objects with unseen shapes in unseen environment configurations. In addition to Neural Contact Fields, we also release our YCB-Extrinsic-Contact dataset of simulated extrinsic contact interactions to enable further research in this area. Project page: https://github.com/carolinahiguera/NCF

CVJun 3, 2022

Incremental Learning Meets Transfer Learning: Application to Multi-site Prostate MRI Segmentation

Chenyu You, Jinlin Xiang, Kun Su et al.

Many medical datasets have recently been created for medical image segmentation tasks, and it is natural to question whether we can use them to sequentially train a single model that (1) performs better on all these datasets, and (2) generalizes well and transfers better to the unknown target site domain. Prior works have achieved this goal by jointly training one model on multi-site datasets, which achieve competitive performance on average but such methods rely on the assumption about the availability of all training data, thus limiting its effectiveness in practical deployment. In this paper, we propose a novel multi-site segmentation framework called incremental-transfer learning (ITL), which learns a model from multi-site datasets in an end-to-end sequential fashion. Specifically, "incremental" refers to training sequentially constructed datasets, and "transfer" is achieved by leveraging useful information from the linear combination of embedding features on each dataset. In addition, we introduce our ITL framework, where we train the network including a site-agnostic encoder with pre-trained weights and at most two segmentation decoder heads. We also design a novel site-level incremental loss in order to generalize well on the target domain. Second, we show for the first time that leveraging our ITL training scheme is able to alleviate challenging catastrophic forgetting problems in incremental learning. We conduct experiments using five challenging benchmark datasets to validate the effectiveness of our incremental-transfer learning approach. Our approach makes minimal assumptions on computation resources and domain-specific expertise, and hence constitutes a strong starting point in multi-site medical image segmentation.

IVJun 17, 2022

Multi-scale Super-resolution Magnetic Resonance Spectroscopic Imaging with Adjustable Sharpness

Siyuan Dong, Gilbert Hangel, Wolfgang Bogner et al.

Magnetic Resonance Spectroscopic Imaging (MRSI) is a valuable tool for studying metabolic activities in the human body, but the current applications are limited to low spatial resolutions. The existing deep learning-based MRSI super-resolution methods require training a separate network for each upscaling factor, which is time-consuming and memory inefficient. We tackle this multi-scale super-resolution problem using a Filter Scaling strategy that modulates the convolution filters based on the upscaling factor, such that a single network can be used for various upscaling factors. Observing that each metabolite has distinct spatial characteristics, we also modulate the network based on the specific metabolite. Furthermore, our network is conditioned on the weight of adversarial loss so that the perceptual sharpness of the super-resolved metabolic maps can be adjusted within a single network. We incorporate these network conditionings using a novel Multi-Conditional Module. The experiments were carried out on a 1H-MRSI dataset from 15 high-grade glioma patients. Results indicate that the proposed network achieves the best performance among several multi-scale super-resolution methods and can provide super-resolved metabolic maps with adjustable sharpness.

IVJul 20, 2022

Flow-based Visual Quality Enhancer for Super-resolution Magnetic Resonance Spectroscopic Imaging

Siyuan Dong, Gilbert Hangel, Eric Z. Chen et al.

Magnetic Resonance Spectroscopic Imaging (MRSI) is an essential tool for quantifying metabolites in the body, but the low spatial resolution limits its clinical applications. Deep learning-based super-resolution methods provided promising results for improving the spatial resolution of MRSI, but the super-resolved images are often blurry compared to the experimentally-acquired high-resolution images. Attempts have been made with the generative adversarial networks to improve the image visual quality. In this work, we consider another type of generative model, the flow-based model, of which the training is more stable and interpretable compared to the adversarial networks. Specifically, we propose a flow-based enhancer network to improve the visual quality of super-resolution MRSI. Different from previous flow-based models, our enhancer network incorporates anatomical information from additional image modalities (MRI) and uses a learnable base distribution. In addition, we impose a guide loss and a data-consistency loss to encourage the network to generate images with high visual quality while maintaining high fidelity. Experiments on a 1H-MRSI dataset acquired from 25 high-grade glioma patients indicate that our enhancer network outperforms the adversarial networks and the baseline flow-based methods. Our method also allows visual quality adjustment and uncertainty estimation.

IVJun 6, 2022

Invertible Sharpening Network for MRI Reconstruction Enhancement

Siyuan Dong, Eric Z. Chen, Lin Zhao et al.

High-quality MRI reconstruction plays a critical role in clinical applications. Deep learning-based methods have achieved promising results on MRI reconstruction. However, most state-of-the-art methods were designed to optimize the evaluation metrics commonly used for natural images, such as PSNR and SSIM, whereas the visual quality is not primarily pursued. Compared to the fully-sampled images, the reconstructed images are often blurry, where high-frequency features might not be sharp enough for confident clinical diagnosis. To this end, we propose an invertible sharpening network (InvSharpNet) to improve the visual quality of MRI reconstructions. During training, unlike the traditional methods that learn to map the input data to the ground truth, InvSharpNet adapts a backward training strategy that learns a blurring transform from the ground truth (fully-sampled image) to the input data (blurry reconstruction). During inference, the learned blurring transform can be inverted to a sharpening transform leveraging the network's invertibility. The experiments on various MRI datasets demonstrate that InvSharpNet can improve reconstruction sharpness with few artifacts. The results were also evaluated by radiologists, indicating better visual quality and diagnostic confidence of our proposed method.

IVSep 8, 2023

Preserved Edge Convolutional Neural Network for Sensitivity Enhancement of Deuterium Metabolic Imaging (DMI)

Siyuan Dong, Henk M. De Feyter, Monique A. Thomas et al.

Purpose: Common to most MRSI techniques, the spatial resolution and the minimal scan duration of Deuterium Metabolic Imaging (DMI) are limited by the achievable SNR. This work presents a deep learning method for sensitivity enhancement of DMI. Methods: A convolutional neural network (CNN) was designed to estimate the 2H-labeled metabolite concentrations from low SNR and distorted DMI FIDs. The CNN was trained with synthetic data that represent a range of SNR levels typically encountered in vivo. The estimation precision was further improved by fine-tuning the CNN with MRI-based edge-preserving regularization for each DMI dataset. The proposed processing method, PReserved Edge ConvolutIonal neural network for Sensitivity Enhanced DMI (PRECISE-DMI), was applied to simulation studies and in vivo experiments to evaluate the anticipated improvements in SNR and investigate the potential for inaccuracies. Results: PRECISE-DMI visually improved the metabolic maps of low SNR datasets, and quantitatively provided higher precision than the standard Fourier reconstruction. Processing of DMI data acquired in rat brain tumor models resulted in more precise determination of 2H-labeled lactate and glutamate + glutamine levels, at increased spatial resolution (from >8 to 2 $μ$L) or shortened scan time (from 32 to 4 min) compared to standard acquisitions. However, rigorous SD-bias analyses showed that overuse of the edge-preserving regularization can compromise the accuracy of the results. Conclusion: PRECISE-DMI allows a flexible trade-off between enhancing the sensitivity of DMI and minimizing the inaccuracies. With typical settings, the DMI sensitivity can be improved by 3-fold while retaining the capability to detect local signal variations.

ROMar 23, 2021Code

GelSlim3.0: High-Resolution Measurement of Shape, Force and Slip in a Compact Tactile-Sensing Finger

Ian Taylor, Siyuan Dong, Alberto Rodriguez

This work presents a new version of the tactile-sensing finger GelSlim 3.0, which integrates the ability to sense high-resolution shape, force, and slip in a compact form factor for use with small parallel jaw grippers in cluttered bin-picking scenarios. The novel design incorporates the capability to use real-time analytic methods to measure shape, estimate the contact 3D force distribution, and detect incipient slip. To achieve a compact integration, we optimize the optical path from illumination source to camera and other geometric variables in a optical simulation environment. In particular, we optimize the illumination sources and a light shaping lens around the constraints imposed by the photometric stereo algorithm used for depth reconstruction. The optimized optical configuration is integrated into a finger design composed of robust and easily replaceable snap-to-fit fingetip module that allow for ease of manufacture, assembly, use, and repair. To stimulate future research in tactile-sensing and provide the robotics community access to reliable and easily-reproducible tactile finger with a diversity of sensing modalities, we open-source the design and software at https://github.com/mcubelab/gelslim.

IVOct 25, 2024

A Flow-based Truncated Denoising Diffusion Model for Super-resolution Magnetic Resonance Spectroscopic Imaging

Siyuan Dong, Zhuotong Cai, Gilbert Hangel et al.

Magnetic Resonance Spectroscopic Imaging (MRSI) is a non-invasive imaging technique for studying metabolism and has become a crucial tool for understanding neurological diseases, cancers and diabetes. High spatial resolution MRSI is needed to characterize lesions, but in practice MRSI is acquired at low resolution due to time and sensitivity restrictions caused by the low metabolite concentrations. Therefore, there is an imperative need for a post-processing approach to generate high-resolution MRSI from low-resolution data that can be acquired fast and with high sensitivity. Deep learning-based super-resolution methods provided promising results for improving the spatial resolution of MRSI, but they still have limited capability to generate accurate and high-quality images. Recently, diffusion models have demonstrated superior learning capability than other generative models in various tasks, but sampling from diffusion models requires iterating through a large number of diffusion steps, which is time-consuming. This work introduces a Flow-based Truncated Denoising Diffusion Model (FTDDM) for super-resolution MRSI, which shortens the diffusion process by truncating the diffusion chain, and the truncated steps are estimated using a normalizing flow-based network. The network is conditioned on upscaling factors to enable multi-scale super-resolution. To train and evaluate the deep learning models, we developed a 1H-MRSI dataset acquired from 25 high-grade glioma patients. We demonstrate that FTDDM outperforms existing generative models while speeding up the sampling process by over 9-fold compared to the baseline diffusion model. Neuroradiologists' evaluations confirmed the clinical advantages of our method, which also supports uncertainty estimation and sharpness adjustment, extending its potential clinical applications.

CVNov 24, 2025

Percept-WAM: Perception-Enhanced World-Awareness-Action Model for Robust End-to-End Autonomous Driving

Jianhua Han, Meng Tian, Jiangtong Zhu et al.

Autonomous driving heavily relies on accurate and robust spatial perception. Many failures arise from inaccuracies and instability, especially in long-tail scenarios and complex interactions. However, current vision-language models are weak at spatial grounding and understanding, and VLA systems built on them therefore show limited perception and localization ability. To address these challenges, we introduce Percept-WAM, a perception-enhanced World-Awareness-Action Model that is the first to implicitly integrate 2D/3D scene understanding abilities within a single vision-language model (VLM). Instead of relying on QA-style spatial reasoning, Percept-WAM unifies 2D/3D perception tasks into World-PV and World-BEV tokens, which encode both spatial coordinates and confidence. We propose a grid-conditioned prediction mechanism for dense object perception, incorporating IoU-aware scoring and parallel autoregressive decoding, improving stability in long-tail, far-range, and small-object scenarios. Additionally, Percept-WAM leverages pretrained VLM parameters to retain general intelligence (e.g., logical reasoning) and can output perception results and trajectory control outputs directly. Experiments show that Percept-WAM matches or surpasses classical detectors and segmenters on downstream perception benchmarks, achieving 51.7/58.9 mAP on COCO 2D detection and nuScenes BEV 3D detection. When integrated with trajectory decoders, it further improves planning performance on nuScenes and NAVSIM, e.g., surpassing DiffusionDrive by 2.1 in PMDS on NAVSIM. Qualitative results further highlight its strong open-vocabulary and long-tail generalization.

ROMar 31, 2022

Visual-Tactile Multimodality for Following Deformable Linear Objects Using Reinforcement Learning

Leszek Pecyna, Siyuan Dong, Shan Luo

Manipulation of deformable objects is a challenging task for a robot. It will be problematic to use a single sensory input to track the behaviour of such objects: vision can be subjected to occlusions, whereas tactile inputs cannot capture the global information that is useful for the task. In this paper, we study the problem of using vision and tactile inputs together to complete the task of following deformable linear objects, for the first time. We create a Reinforcement Learning agent using different sensing modalities and investigate how its behaviour can be boosted using visual-tactile fusion, compared to using a single sensing modality. To this end, we developed a benchmark in simulation for manipulating the deformable linear objects using multimodal sensing inputs. The policy of the agent uses distilled information, e.g., the pose of the object in both visual and tactile perspectives, instead of the raw sensing signals, so that it can be directly transferred to real environments. In this way, we disentangle the perception system and the learned control policy. Our extensive experiments show that the use of both vision and tactile inputs, together with proprioception, allows the agent to complete the task in up to 92% of cases, compared to 77% when only one of the signals is given. Our results can provide valuable insights for the future design of tactile sensors and for deformable objects manipulation.

CVJan 26, 2022

Class-Aware Adversarial Transformers for Medical Image Segmentation

Chenyu You, Ruihan Zhao, Fenglin Liu et al.

Transformers have made remarkable progress towards modeling long-range dependencies within the medical image analysis domain. However, current transformer-based models suffer from several disadvantages: (1) existing methods fail to capture the important features of the images due to the naive tokenization scheme; (2) the models suffer from information loss because they only consider single-scale feature representations; and (3) the segmentation label maps generated by the models are not accurate enough without considering rich semantic contexts and anatomical textures. In this work, we present CASTformer, a novel type of adversarial transformers, for 2D medical image segmentation. First, we take advantage of the pyramid structure to construct multi-scale representations and handle multi-scale variations. We then design a novel class-aware transformer module to better learn the discriminative regions of objects with semantic structures. Lastly, we utilize an adversarial training strategy that boosts segmentation accuracy and correspondingly allows a transformer-based discriminator to capture high-level semantically correlated contents and low-level anatomical features. Our experiments demonstrate that CASTformer dramatically outperforms previous state-of-the-art transformer-based approaches on three benchmarks, obtaining 2.54%-5.88% absolute improvements in Dice over previous models. Further qualitative experiments provide a more detailed picture of the model's inner workings, shed light on the challenges in improved transparency, and demonstrate that transfer learning can greatly improve performance and reduce the size of medical image datasets in training, making CASTformer a strong starting point for downstream medical image analysis tasks.

CRAug 18, 2021

A Review on Cybersecurity in Smart Local Energy Systems: Requirements, Challenges, and Standards

Siyuan Dong, Jun Cao, Zhong Fan

Smart local energy system (SLES) is considered as a promising pathway facilitating a more effective and localised operation, benefited from the complex information and communication technology (ICT) infrastructures and Internet of things (IoT) technologies. As a part of the critical infrastructure, it is important to not only put effective detection and management to tackle potential cybersecurity issues, but also require considerable numbers of standards to ensure the security of the internet of things system to minimise the risks. This study aims to review the existing standards, investigate how the compatibility with SLES development, and identify the area to focus on in the future. Although existing standards and protocols are highly fragmented, our findings suggest that many of them can meet the requirements of the applications and infrastructures of SLES. Additionally, many standards have been introduced to protect information security and personal privacy due to their increasing importance. The research also suggests that the industry needs to produce more affordable and cyber-secured devices and services. For the government and regulators, relevant guidelines on the minimum function and security requirements for applications should be provided. Additionally, compliance testing and certifications should be in place and carried out by an independent third party to ensure the components of SLES ecosystem with a satisfied security level by design.

ROApr 2, 2021

Tactile-RL for Insertion: Generalization to Objects of Unknown Geometry

Siyuan Dong, Devesh K. Jha, Diego Romeres et al.

Object insertion is a classic contact-rich manipulation task. The task remains challenging, especially when considering general objects of unknown geometry, which significantly limits the ability to understand the contact configuration between the object and the environment. We study the problem of aligning the object and environment with a tactile-based feedback insertion policy. The insertion process is modeled as an episodic policy that iterates between insertion attempts followed by pose corrections. We explore different mechanisms to learn such a policy based on Reinforcement Learning. The key contribution of this paper is to demonstrate that it is possible to learn a tactile insertion policy that generalizes across different object geometries, and an ablation study of the key design choices for the learning agent: 1) the type of learning scheme: supervised vs. reinforcement learning; 2) the type of learning schedule: unguided vs. curriculum learning; 3) the type of sensing modality: force/torque (F/T) vs. tactile; and 4) the type of tactile representation: tactile RGB vs. tactile flow. We show that the optimal configuration of the learning agent (RL + curriculum + tactile flow) exposed to 4 training objects yields an insertion policy that inserts 4 novel objects with over 85.0% success rate and within 3~4 attempts. Comparisons between F/T and tactile sensing, shows that while an F/T-based policy learns more efficiently, a tactile-based policy provides better generalization.

ROMar 15, 2021

Extrinsic Contact Sensing with Relative-Motion Tracking from Distributed Tactile Measurements

Daolin Ma, Siyuan Dong, Alberto Rodriguez

This paper addresses the localization of contacts of an unknown grasped rigid object with its environment, i.e., extrinsic to the robot. We explore the key role that distributed tactile sensing plays in localizing contacts external to the robot, in contrast to the role that aggregated force/torque measurements play in localizing contacts on the robot. When in contact with the environment, an object will move in accordance with the kinematic and possibly frictional constraints imposed by that contact. Small motions of the object, which are observable with tactile sensors, indirectly encode those constraints and the geometry that defines them. We formulate the extrinsic contact sensing problem as a constraint-based estimation. The estimation is subject to the kinematic constraints imposed by the tactile measurements of object motion, as well as the kinematic (e.g., non-penetration) and possibly frictional (e.g., sticking) constraints imposed by rigid-body mechanics. We validate the approach in simulation and with real experiments on the case studies of fixed point and line contacts. This paper discusses the theoretical basis for the value of distributed tactile sensing in contrast to aggregated force/torque measurements. It also provides an estimation framework for localizing environmental contacts with potential impact in contact-rich manipulation scenarios such as assembling or packing.

ROFeb 8, 2020

Tactile Dexterity: Manipulation Primitives with Tactile Feedback

Francois R. Hogan, Jose Ballester, Siyuan Dong et al.

This paper develops closed-loop tactile controllers for dexterous robotic manipulation with a dual-palm robotic system. Tactile dexterity is an approach to dexterous manipulation that plans for robot/object interactions that render interpretable tactile information for control. We divide the role of tactile control into two goals: 1) control the contact state between the end-effector and the object (contact/no-contact, stick/slip) by regulating the stability of planned contact configurations and monitoring undesired slip events; and 2) control the object state by tactile-based tracking and iterative replanning of the object and robot trajectories. Key to this formulation is the decomposition of manipulation plans into sequences of manipulation primitives with simple mechanics and efficient planners. We consider the scenario of manipulating an object from an initial pose to a target pose on a flat surface while correcting for external perturbations and uncertainty in the initial pose of the object. We experimentally validate the approach with an ABB YuMi dual-arm robot and demonstrate the ability of the tactile controller to react to external perturbations.

ROOct 3, 2019

Cable Manipulation with a Tactile-Reactive Gripper

Yu She, Shaoxiong Wang, Siyuan Dong et al.

Cables are complex, high dimensional, and dynamic objects. Standard approaches to manipulate them often rely on conservative strategies that involve long series of very slow and incremental deformations, or various mechanical fixtures such as clamps, pins or rings. We are interested in manipulating freely moving cables, in real time, with a pair of robotic grippers, and with no added mechanical constraints. The main contribution of this paper is a perception and control framework that moves in that direction, and uses real-time tactile feedback to accomplish the task of following a dangling cable. The approach relies on a vision-based tactile sensor, GelSight, that estimates the pose of the cable in the grip, and the friction forces during cable sliding. We achieve the behavior by combining two tactile-based controllers: 1) Cable grip controller, where a PD controller combined with a leaky integrator regulates the gripping force to maintain the frictional sliding forces close to a suitable value; and 2) Cable pose controller, where an LQR controller based on a learned linear model of the cable sliding dynamics keeps the cable centered and aligned on the fingertips to prevent the cable from falling from the grip. This behavior is possible by a reactive gripper fitted with GelSight-based high-resolution tactile sensors. The robot can follow one meter of cable in random configurations within 2-3 hand regrasps, adapting to cables of different materials and thicknesses. We demonstrate a robot grasping a headphone cable, sliding the fingers to the jack connector, and inserting it. To the best of our knowledge, this is the first implementation of real-time cable following without the aid of mechanical fixtures.

ROSep 12, 2019

Tactile-Based Insertion for Dense Box-Packing

Siyuan Dong, Alberto Rodriguez

We study the problem of using high-resolution tactile sensors to control the insertion of objects in a box-packing scenario. We propose a new system based on a tactile sensor GelSlim for the dense packing task. In this paper, we propose an insertion strategy that leverages tactile sensing to: 1) safely probe the box with the grasped object while monitoring incipient slip to maintain a stable grasp on the object. 2) estimate and correct for residual position uncertainties to insert the object into a designated gap without disturbing the environment. Our proposed methodology is based on two neural networks that estimate the error direction and error magnitude, from a stream of tactile imprints, acquired by two GelSlim fingers, during the insertion process. The system is trained on four objects with basic geometric shapes, which we show generalizes to four other common objects. Based on the estimated positional errors, a heuristic controller iteratively adjusts the position of the object and eventually inserts it successfully without requiring prior knowledge of the geometry of the object. The key insight is that dense tactile feedback contains useful information with respect to the contact interaction between the grasped object and its environment. We achieve high success rate and show that unknown objects can be inserted with an average of 6 attempts of the probe-correct loop. The method's ability to generalize to novel objects makes it a good fit for box packing in warehouse automation.

ROOct 31, 2018

Maintaining Grasps within Slipping Bound by Monitoring Incipient Slip

Siyuan Dong, Daolin Ma, Elliott Donlon et al.

In this paper, we propose an approach to detect incipient slip, i.e. predict slip, by using a high-resolution vision-based tactile sensor, GelSlim. The sensor dynamically captures the tactile imprints of the contact object and their changes with a soft gel pad. The method assumes the object is mostly rigid and treats the motion of object's imprint on sensor surface as a 2D rigid-body motion. We use the deviation of the true motion field from that of a 2D planar rigid transformation as a measure of slip. The output is a dense slip field which we use to detect when small areas of the contact patch start to slip (incipient slip). The method can detect both translational and rotational incipient slip without any prior knowledge of the object at 24 Hz. We test the method on 10 objects 240 times and achieve 86.25% detection accuracy. We further show how the slip feedback can be used to monitor the gripping force to avoid slip with a closed-loop bottle-cap screwing and unscrewing experiment with incipient slip detection feedback. The method was demonstrated to be useful for the robot to apply proper gripping force and stop screwing at the right point before breaking objects. The method can be applied to many manipulation tasks in both structured and unstructured environments.

ROOct 10, 2018

Dense Tactile Force Distribution Estimation using GelSlim and inverse FEM

Daolin Ma, Elliott Donlon, Siyuan Dong et al.

In this paper, we present a new version of tactile sensor GelSlim 2.0 with the capability to estimate the contact force distribution in real time. The sensor is vision-based and uses an array of markers to track deformations on a gel pad due to contact. A new hardware design makes the sensor more rugged, parametrically adjustable and improves illumination. Leveraging the sensor's increased functionality, we propose to use inverse Finite Element Method (iFEM), a numerical method to reconstruct the contact force distribution based on marker displacements. The sensor is able to provide force distribution of contact with high spatial density. Experiments and comparison with ground truth show that the reconstructed force distribution is physically reasonable with good accuracy.

ROMar 1, 2018

GelSlim: A High-Resolution, Compact, Robust, and Calibrated Tactile-sensing Finger

Elliott Donlon, Siyuan Dong, Melody Liu et al.

This work describes the development of a high-resolution tactile-sensing finger for robot grasping. This finger, inspired by previous GelSight sensing techniques, features an integration that is slimmer, more robust, and with more homogeneous output than previous vision-based tactile sensors. To achieve a compact integration, we redesign the optical path from illumination source to camera by combining light guides and an arrangement of mirror reflections. We parameterize the optical path with geometric design variables and describe the tradeoffs between the finger thickness, the depth of field of the camera, and the size of the tactile sensing area. The sensor sustains the wear from continuous use -- and abuse -- in grasping tasks by combining tougher materials for the compliant soft gel, a textured fabric skin, a structurally rigid body, and a calibration process that maintains homogeneous illumination and contrast of the tactile images during use. Finally, we evaluate the sensor's durability along four metrics that track the signal quality during more than 3000 grasping experiments.

ROFeb 27, 2018

Slip Detection with Combined Tactile and Visual Information

Jianhua Li, Siyuan Dong, Edward Adelson

Slip detection plays a vital role in robotic manipulation and it has long been a challenging problem in the robotic community. In this paper, we propose a new method based on deep neural network (DNN) to detect slip. The training data is acquired by a GelSight tactile sensor and a camera mounted on a gripper when we use a robot arm to grasp and lift 94 daily objects with different grasping forces and grasping positions. The DNN is trained to classify whether a slip occurred or not. To evaluate the performance of the DNN, we test 10 unseen objects in 152 grasps. A detection accuracy as high as 88.03% is achieved. It is anticipated that the accuracy can be further improved with a larger dataset. This method is beneficial for robots to make stable grasps, which can be widely applied to automatic force control, grasping strategy selection and fine manipulation.

ROAug 2, 2017

Improved GelSight Tactile Sensor for Measuring Geometry and Slip

Siyuan Dong, Wenzhen Yuan, Edward Adelson

A GelSight sensor uses an elastomeric slab covered with a reflective membrane to measure tactile signals. It measures the 3D geometry and contact force information with high spacial resolution, and successfully helped many challenging robot tasks. A previous sensor, based on a semi-specular membrane, produces high resolution but with limited geometry accuracy. In this paper, we describe a new design of GelSight for robot gripper, using a Lambertian membrane and new illumination system, which gives greatly improved geometric accuracy while retaining the compact size. We demonstrate its use in measuring surface normals and reconstructing height maps using photometric stereo. We also use it for the task of slip detection, using a combination of information about relative motions on the membrane surface and the shear distortions. Using a robotic arm and a set of 37 everyday objects with varied properties, we find that the sensor can detect translational and rotational slip in general cases, and can be used to improve the stability of the grasp.

CVApr 12, 2017

Connecting Look and Feel: Associating the visual and tactile properties of physical materials

Wenzhen Yuan, Shaoxiong Wang, Siyuan Dong et al.

For machines to interact with the physical world, they must understand the physical properties of objects and materials they encounter. We use fabrics as an example of a deformable material with a rich set of mechanical properties. A thin flexible fabric, when draped, tends to look different from a heavy stiff fabric. It also feels different when touched. Using a collection of 118 fabric sample, we captured color and depth images of draped fabrics along with tactile data from a high resolution touch sensor. We then sought to associate the information from vision and touch by jointly training CNNs across the three modalities. Through the CNN, each input, regardless of the modality, generates an embedding vector that records the fabric's physical property. By comparing the embeddings, our system is able to look at a fabric image and predict how it will feel, and vice versa. We also show that a system jointly trained on vision and touch data can outperform a similar system trained only on visual data when tested purely with visual inputs.