Shan Luo

h-index26

26papers

1,307citations

Novelty45%

AI Score43

Ranked #56,880 of 194,257 authors (top 29%)#1,630 in RO (top 24%)

26 Papers

7.3CVSep 11, 2022Code

Inverse Image Frequency for Long-tailed Image Recognition

Konstantinos Panagiotis Alexandridis, Shan Luo, Anh Nguyen et al.

The long-tailed distribution is a common phenomenon in the real world. Extracted large scale image datasets inevitably demonstrate the long-tailed property and models trained with imbalanced data can obtain high performance for the over-represented categories, but struggle for the under-represented categories, leading to biased predictions and performance degradation. To address this challenge, we propose a novel de-biasing method named Inverse Image Frequency (IIF). IIF is a multiplicative margin adjustment transformation of the logits in the classification layer of a convolutional neural network. Our method achieves stronger performance than similar works and it is especially useful for downstream tasks such as long-tailed instance segmentation as it produces fewer false positive detections. Our extensive experiments show that IIF surpasses the state of the art on many long-tailed benchmarks such as ImageNet-LT, CIFAR-LT, Places-LT and LVIS, reaching 55.8% top-1 accuracy with ResNet50 on ImageNet-LT and 26.2% segmentation AP with MaskRCNN on LVIS. Code available at https://github.com/kostas1515/iif

7.6CVOct 15, 2024Code

Fractal Calibration for long-tailed object detection

Konstantinos Panagiotis Alexandridis, Ismail Elezi, Jiankang Deng et al.

Real-world datasets follow an imbalanced distribution, which poses significant challenges in rare-category object detection. Recent studies tackle this problem by developing re-weighting and re-sampling methods, that utilise the class frequencies of the dataset. However, these techniques focus solely on the frequency statistics and ignore the distribution of the classes in image space, missing important information. In contrast to them, we propose FRActal CALibration (FRACAL): a novel post-calibration method for long-tailed object detection. FRACAL devises a logit adjustment method that utilises the fractal dimension to estimate how uniformly classes are distributed in image space. During inference, it uses the fractal dimension to inversely downweight the probabilities of uniformly spaced class predictions achieving balance in two axes: between frequent and rare categories, and between uniformly spaced and sparsely spaced classes. FRACAL is a post-processing method and it does not require any training, also it can be combined with many off-the-shelf models such as one-stage sigmoid detectors and two-stage instance segmentation models. FRACAL boosts the rare class performance by up to 8.6% and surpasses all previous methods on LVIS dataset, while showing good generalisation to other datasets such as COCO, V3Det and OpenImages. We provide the code at https://github.com/kostas1515/FRACAL.

6.5CVFeb 6, 2024

Road Surface Defect Detection -- From Image-based to Non-image-based: A Survey

Jongmin Yu, Jiaqi Jiang, Sebastiano Fichera et al.

Ensuring traffic safety is crucial, which necessitates the detection and prevention of road surface defects. As a result, there has been a growing interest in the literature on the subject, leading to the development of various road surface defect detection methods. The methods for detecting road defects can be categorised in various ways depending on the input data types or training methodologies. The predominant approach involves image-based methods, which analyse pixel intensities and surface textures to identify defects. Despite their popularity, image-based methods share the distinct limitation of vulnerability to weather and lighting changes. To address this issue, researchers have explored the use of additional sensors, such as laser scanners or LiDARs, providing explicit depth information to enable the detection of defects in terms of scale and volume. However, the exploration of data beyond images has not been sufficiently investigated. In this survey paper, we provide a comprehensive review of road surface defect detection studies, categorising them based on input data types and methodologies used. Additionally, we review recently proposed non-image-based methods and discuss several challenges and open problems associated with these techniques.

13.5ROSep 29, 2025

CEDex: Cross-Embodiment Dexterous Grasp Generation at Scale from Human-like Contact Representations

Zhiyuan Wu, Rolandos Alexandros Potamias, Xuyang Zhang et al.

Cross-embodiment dexterous grasp synthesis refers to adaptively generating and optimizing grasps for various robotic hands with different morphologies. This capability is crucial for achieving versatile robotic manipulation in diverse environments and requires substantial amounts of reliable and diverse grasp data for effective model training and robust generalization. However, existing approaches either rely on physics-based optimization that lacks human-like kinematic understanding or require extensive manual data collection processes that are limited to anthropomorphic structures. In this paper, we propose CEDex, a novel cross-embodiment dexterous grasp synthesis method at scale that bridges human grasping kinematics and robot kinematics by aligning robot kinematic models with generated human-like contact representations. Given an object's point cloud and an arbitrary robotic hand model, CEDex first generates human-like contact representations using a Conditional Variational Auto-encoder pretrained on human contact data. It then performs kinematic human contact alignment through topological merging to consolidate multiple human hand parts into unified robot components, followed by a signed distance field-based grasp optimization with physics-aware constraints. Using CEDex, we construct the largest cross-embodiment grasp dataset to date, comprising 500K objects across four gripper types with 20M total grasps. Extensive experiments show that CEDex outperforms state-of-the-art approaches and our dataset benefits cross-embodiment grasp learning with high-quality diverse grasps.

11.8CVJun 25, 2025

ConViTac: Aligning Visual-Tactile Fusion with Contrastive Representations

Zhiyuan Wu, Yongqiang Zhao, Shan Luo

Vision and touch are two fundamental sensory modalities for robots, offering complementary information that enhances perception and manipulation tasks. Previous research has attempted to jointly learn visual-tactile representations to extract more meaningful information. However, these approaches often rely on direct combination, such as feature addition and concatenation, for modality fusion, which tend to result in poor feature integration. In this paper, we propose ConViTac, a visual-tactile representation learning network designed to enhance the alignment of features during fusion using contrastive representations. Our key contribution is a Contrastive Embedding Conditioning (CEC) mechanism that leverages a contrastive encoder pretrained through self-supervised contrastive learning to project visual and tactile inputs into unified latent embeddings. These embeddings are used to couple visual-tactile feature fusion through cross-modal attention, aiming at aligning the unified representations and enhancing performance on downstream tasks. We conduct extensive experiments to demonstrate the superiority of ConViTac in real world over current state-of-the-art methods and the effectiveness of our proposed CEC mechanism, which improves accuracy by up to 12.0% in material classification and grasping prediction tasks.

3.7CVFeb 6, 2024

Multi-class Road Defect Detection and Segmentation using Spatial and Channel-wise Attention for Autonomous Road Repairing

Jongmin Yu, Chen Bene Chi, Sebastiano Fichera et al.

Road pavement detection and segmentation are critical for developing autonomous road repair systems. However, developing an instance segmentation method that simultaneously performs multi-class defect detection and segmentation is challenging due to the textural simplicity of road pavement image, the diversity of defect geometries, and the morphological ambiguity between classes. We propose a novel end-to-end method for multi-class road defect detection and segmentation. The proposed method comprises multiple spatial and channel-wise attention blocks available to learn global representations across spatial and channel-wise dimensions. Through these attention blocks, more globally generalised representations of morphological information (spatial characteristics) of road defects and colour and depth information of images can be learned. To demonstrate the effectiveness of our framework, we conducted various ablation studies and comparisons with prior methods on a newly collected dataset annotated with nine road defect classes. The experiments show that our proposed method outperforms existing state-of-the-art methods for multi-class road defect detection and segmentation methods.

3.2ROApr 4, 2025

Point Cloud-based Grasping for Soft Hand Exoskeleton

Chen Hu, Enrica Tricomi, Eojin Rho et al.

Grasping is a fundamental skill for interacting with and manipulating objects in the environment. However, this ability can be challenging for individuals with hand impairments. Soft hand exoskeletons designed to assist grasping can enhance or restore essential hand functions, yet controlling these soft exoskeletons to support users effectively remains difficult due to the complexity of understanding the environment. This study presents a vision-based predictive control framework that leverages contextual awareness from depth perception to predict the grasping target and determine the next control state for activation. Unlike data-driven approaches that require extensive labelled datasets and struggle with generalizability, our method is grounded in geometric modelling, enabling robust adaptation across diverse grasping scenarios. The Grasping Ability Score (GAS) was used to evaluate performance, with our system achieving a state-of-the-art GAS of 91% across 15 objects and healthy participants, demonstrating its effectiveness across different object types. The proposed approach maintained reconstruction success for unseen objects, underscoring its enhanced generalizability compared to learning-based models.

2.2RODec 18, 2023

Generating Future Observations to Estimate Grasp Success in Cluttered Environments

Daniel Fernandes Gomes, Wenxuan Mou, Paolo Paoletti et al.

End-to-end self-supervised models have been proposed for estimating the success of future candidate grasps and video predictive models for generating future observations. However, none have yet studied these two strategies side-by-side for addressing the aforementioned grasping problem. We investigate and compare a model-free approach, to estimate the success of a candidate grasp, against a model-based alternative that exploits a self-supervised learnt predictive model that generates a future observation of the gripper about to grasp an object. Our experiments demonstrate that despite the end-to-end model-free model obtaining a best accuracy of 72%, the proposed model-based pipeline yields a significantly higher accuracy of 82%.

14.9ROMar 31, 2022

Visual-Tactile Multimodality for Following Deformable Linear Objects Using Reinforcement Learning

Leszek Pecyna, Siyuan Dong, Shan Luo

Manipulation of deformable objects is a challenging task for a robot. It will be problematic to use a single sensory input to track the behaviour of such objects: vision can be subjected to occlusions, whereas tactile inputs cannot capture the global information that is useful for the task. In this paper, we study the problem of using vision and tactile inputs together to complete the task of following deformable linear objects, for the first time. We create a Reinforcement Learning agent using different sensing modalities and investigate how its behaviour can be boosted using visual-tactile fusion, compared to using a single sensing modality. To this end, we developed a benchmark in simulation for manipulating the deformable linear objects using multimodal sensing inputs. The policy of the agent uses distilled information, e.g., the pose of the object in both visual and tactile perspectives, instead of the raw sensing signals, so that it can be directly transferred to real environments. In this way, we disentangle the perception system and the learned control policy. Our extensive experiments show that the use of both vision and tactile inputs, together with proprioception, allows the agent to complete the task in up to 92% of cases, compared to 77% when only one of the signals is given. Our results can provide valuable insights for the future design of tactile sensors and for deformable objects manipulation.

10.4RODec 28, 2021

Robotic Perception of Object Properties using Tactile Sensing

Jiaqi Jiang, Shan Luo

The sense of touch plays a key role in enabling humans to understand and interact with surrounding environments. For robots, tactile sensing is also irreplaceable. While interacting with objects, tactile sensing provides useful information for the robot to understand the object, such as distributed pressure, temperature, vibrations and texture. During robot grasping, vision is often occluded by its end-effectors, whereas tactile sensing can measure areas that are not accessible by vision. In the past decades, a number of tactile sensors have been developed for robots and used for different robotic tasks. In this chapter, we focus on the use of tactile sensing for robotic grasping and investigate the recent trends in tactile perception of object properties. We first discuss works on tactile perception of three important object properties in grasping, i.e., shape, pose and material properties. We then review the recent development in grasping stability prediction with tactile sensing. Among these works, we identify the requirement for coordinating vision and tactile sensing in the robotic grasping. To demonstrate the use of tactile sensing to improve the visual perception, our recent development of vision-guided tactile perception for crack reconstruction is presented. In the proposed framework, the large receptive field of camera vision is first leveraged to achieve a quick search of candidate regions containing cracks, a high-resolution optical tactile sensor is then used to examine these candidate regions and reconstruct a refined crack shape. The experiments show that our proposed method can achieve a significant reduction of mean distance error from 0.82 mm to 0.24 mm for crack reconstruction. Finally, we conclude this chapter with a discussion of open issues and future directions for applying tactile sensing in robotic tasks.

8.9RODec 3, 2021

Reducing Tactile Sim2Real Domain Gaps via Deep Texture Generation Networks

Tudor Jianu, Daniel Fernandes Gomes, Shan Luo

Recently simulation methods have been developed for optical tactile sensors to enable the Sim2Real learning, i.e., firstly training models in simulation before deploying them on the real robot. However, some artefacts in the real objects are unpredictable, such as imperfections caused by fabrication processes, or scratches by the natural wear and tear, and thus cannot be represented in the simulation, resulting in a significant gap between the simulated and real tactile images. To address this Sim2Real gap, we propose a novel texture generation network that maps the simulated images into photorealistic tactile images that resemble a real sensor contacting a real imperfect object. Each simulated tactile image is first divided into two types of regions: areas that are in contact with the object and areas that are not. The former is applied with generated textures learned from real textures in the real tactile images, whereas the latter maintains its appearance as when the sensor is not in contact with any object. This makes sure that the artefacts are only applied to the deformed regions of the sensor. Our extensive experiments show that the proposed texture generation network can generate these realistic artefacts on the deformed regions of the sensor, while avoiding leaking the textures into areas of no contact. Quantitative experiments further reveal that when using the adapted images generated by our proposed network for a Sim2Real classification task, the drop in accuracy caused by the Sim2Real gap is reduced from 38.43% to merely 0.81%. As such, this work has the potential to accelerate the Sim2Real learning for robotic tasks requiring tactile sensing.

8.6HCJul 12, 2021

Monoscopic vs. Stereoscopic Views and Display Types in the Teleoperation of Unmanned Ground Vehicles for Object Avoidance

Yiming Luo, Jialin Wang, Hai-Ning Liang et al.

Virtual reality (VR) head-mounted displays (HMD) have recently been used to provide an immersive, first-person vision/view in real-time for manipulating remotely-controlled unmanned ground vehicles (UGV). The teleoperation of UGV can be challenging for operators when it is done in real time. One big challenge is for operators to perceive quickly and rapidly the distance of objects that are around the UGV while it is moving. In this research, we explore the use of monoscopic and stereoscopic views and display types (immersive and non-immersive VR) for operating vehicles remotely. We conducted two user studies to explore their feasibility and advantages. Results show a significantly better performance when using an immersive display with stereoscopic view for dynamic, real-time navigation tasks that require avoiding both moving and static obstacles. The use of stereoscopic view in an immersive display in particular improved user performance and led to better usability.

12.8ROMay 13, 2021

Vision-Guided Active Tactile Perception for Crack Detection and Reconstruction

Jiaqi Jiang, Guanqun Cao, Daniel Fernandes Gomes et al.

Crack detection is of great significance for monitoring the integrity and well-being of the infrastructure such as bridges and underground pipelines, which are harsh environments for people to access. In recent years, computer vision techniques have been applied in detecting cracks in concrete structures. However, they suffer from variances in light conditions and shadows, lacking robustness and resulting in many false positives. To address the uncertainty in vision, human inspectors actively touch the surface of the structures, guided by vision, which has not been explored in autonomous crack detection. In this paper, we propose a novel approach to detect and reconstruct cracks in concrete structures using vision-guided active tactile perception. Given an RGB-D image of a structure, the rough profile of the crack in the structure surface will first be segmented with a fine-tuned Deep Convolutional Neural Networks, and a set of contact points are generated to guide the collection of tactile images by a camera-based optical tactile sensor. When contacts are made, a pixel-wise mask of the crack can be obtained from the tactile images and therefore the profile of the crack can be refined by aligning the RGB-D image and the tactile images. Extensive experiment results have shown that the proposed method improves the effectiveness and robustness of crack detection and reconstruction significantly, compared to crack detection with vision only, and has the potential to enable robots to help humans with the inspection and repair of the concrete infrastructure.

10.4ROFeb 28, 2021Code

TouchRoller: A Rolling Optical Tactile Sensor for Rapid Assessment of Large Surfaces

Guanqun Cao, Jiaqi Jiang, Chen Lu et al.

Tactile sensing is important for robots to perceive the world as it captures the texture and hardness of the object in contact and is robust to illumination and colour variances. However, due to the limited sensing area and the resistance of the fixed surface, current tactile sensors have to tap the tactile sensor on target object many times when assessing a large surface, i.e., pressing, lifting up and shifting to another region. This process is ineffective and time consuming. It is also undesirable to drag such sensors as this often damages the sensitive membrane of the sensor or the object. To address these problems, we propose a cylindrical optical tactile sensor named TouchRoller that can roll around its center axis. It maintains being in contact with the assessed surface throughout the entire motion, which allows for measuring the object continuously and effectively. Extensive experiments show that the TouchRoller sensor can cover a textured surface of 8cm*11cm in a short time of 10s, much more effectively than a flat optical tactile sensor (in 196s). The reconstructed map of the texture from the collected tactile images has a high Structural Similarity Index (SSIM) of 0.31 on average, when compared with the visual texture. In addition, the contacts on the sensor can be localised with a low localisation error, 2.63mm in the center regions and 7.66mm on average. The proposed sensor will enable the fast assessment of large surfaces with high-resolution tactile sensing, and also the effective collection of tactile images.

20.3ROJan 18, 2021Code

Generation of GelSight Tactile Images for Sim2Real Learning

Daniel Fernandes Gomes, Paolo Paoletti, Shan Luo

Most current works in Sim2Real learning for robotic manipulation tasks leverage camera vision that may be significantly occluded by robot hands during the manipulation. Tactile sensing offers complementary information to vision and can compensate for the information loss caused by the occlusion. However, the use of tactile sensing is restricted in the Sim2Real research due to no simulated tactile sensors being available. To mitigate the gap, we introduce a novel approach for simulating a GelSight tactile sensor in the commonly used Gazebo simulator. Similar to the real GelSight sensor, the simulated sensor can produce high-resolution images by an optical sensor from the interaction between the touched object and an opaque soft membrane. It can indirectly sense forces, geometry, texture and other properties of the object and enables Sim2Real learning with tactile sensing. Preliminary experimental results have shown that the simulated sensor could generate realistic outputs similar to the ones captured by a real GelSight sensor. All the materials used in this paper are available at https://danfergo.github.io/gelsight-simulation.

17.3ROAug 12, 2020

GelTip: A Finger-shaped Optical Tactile Sensor for Robotic Manipulation

Daniel Fernandes Gomes, Zhonglin Lin, Shan Luo

Sensing contacts throughout the fingers is an essential capability for a robot to perform manipulation tasks in cluttered environments. However, existing tactile sensors either only have a flat sensing surface or a compliant tip with a limited sensing area. In this paper, we propose a novel optical tactile sensor, the GelTip, that is shaped as a finger and can sense contacts on any location of its surface. The sensor captures high-resolution and color-invariant tactile images that can be exploited to extract detailed information about the end-effector's interactions against manipulated objects. Our extensive experiments show that the GelTip sensor can effectively localise the contacts on different locations of its finger-shaped body, with a small localisation error of approximately 5 mm, on average, and under 1 mm in the best cases. The obtained results show the potential of the GelTip sensor in facilitating dynamic manipulation tasks with its all-round tactile sensing capability. The sensor models and further information about the GelTip sensor can be found at http://danfergo.github.io/geltip.

13.7ROAug 10, 2020

Spatio-temporal Attention Model for Tactile Texture Recognition

Guanqun Cao, Yi Zhou, Danushka Bollegala et al.

Recently, tactile sensing has attracted great interest in robotics, especially for facilitating exploration of unstructured environments and effective manipulation. A detailed understanding of the surface textures via tactile sensing is essential for many of these tasks. Previous works on texture recognition using camera based tactile sensors have been limited to treating all regions in one tactile image or all samples in one tactile sequence equally, which includes much irrelevant or redundant information. In this paper, we propose a novel Spatio-Temporal Attention Model (STAM) for tactile texture recognition, which is the very first of its kind to our best knowledge. The proposed STAM pays attention to both spatial focus of each single tactile texture and the temporal correlation of a tactile sequence. In the experiments to discriminate 100 different fabric textures, the spatially and temporally selective attention has resulted in a significant improvement of the recognition accuracy, by up to 18.8%, compared to the non-attention based models. Specifically, after introducing noisy data that is collected before the contact happens, our proposed STAM can learn the salient features efficiently and the accuracy can increase by 15.23% on average compared with the CNN based baseline approach. The improved tactile texture perception can be applied to facilitate robot tasks like grasping and manipulation.

8.3RODec 2, 2019

Surface Following using Deep Reinforcement Learning and a GelSightTactile Sensor

Chen Lu, Jing Wang, Shan Luo

Tactile sensors can provide detailed contact in-formation that can facilitate robots to perform dexterous, in-hand manipulation tasks. One of the primitive but important tasks is surface following that is a key feature for robots while exploring unknown environments or workspace of inaccurate modeling. In this paper, we propose a novel end-to-end learning strategy, by directly mapping the raw tactile data acquired from a GelSight tactile sensor to the motion of the robot end-effector.Experiments on a KUKA youBot platform equipped with theGelSight sensor show that 80% of the actions generated by a fully trained SFDQN model are proper surface following actions; the autonomous surface following test also indicates that the proposed solution works well on a test surface.

7.3ROJul 17, 2019

Fly Safe: Aerial Swarm Robotics using Force Field Particle Swarm Optimisation

Lauren Parker, James Butterworth, Shan Luo

Particle Swarm Optimisation (PSO) is a powerful optimisation algorithm that can be used to locate global maxima in a search space. Recent interest in swarms of Micro Aerial Vehicles (MAVs) begs the question as to whether PSO can be used as a method to enable real robotic swarms to locate a target goal point. However, the original PSO algorithm does not take into account collisions between particles during search. In this paper we propose a novel algorithm called Force Field Particle Swarm Optimisation (FFPSO) that designates repellent force fields to particles such that these fields provide an additional velocity component into the original PSO equations. We compare the performance of FFPSO with PSO and show that it has the ability to reduce the number of particle collisions during search to 0 whilst also being able to locate a target of interest in a similar amount of time. The scalability of the algorithm is also demonstrated via a set of experiments that considers how the number of crashes and the time taken to find the goal varies according to swarm size. Finally, we demonstrate the algorithms applicability on a swarm of real MAVs.

19.0LGApr 24, 2019Code

Neural Logic Reinforcement Learning

Zhengyao Jiang, Shan Luo

Deep reinforcement learning (DRL) has achieved significant breakthroughs in various tasks. However, most DRL algorithms suffer a problem of generalizing the learned policy which makes the learning performance largely affected even by minor modifications of the training environment. Except that, the use of deep neural networks makes the learned policies hard to be interpretable. To address these two challenges, we propose a novel algorithm named Neural Logic Reinforcement Learning (NLRL) to represent the policies in reinforcement learning by first-order logic. NLRL is based on policy gradient methods and differentiable inductive logic programming that have demonstrated significant advantages in terms of interpretability and generalisability in supervised tasks. Extensive experiments conducted on cliff-walking and blocks manipulation tasks demonstrate that NLRL can induce interpretable policies achieving near-optimal performance while demonstrating good generalisability to environments of different initial states and problem sizes.

7.3ROMar 2, 2019

Fully Convolutional One-Shot Object Segmentation for Industrial Robotics

Benjamin Schnieders, Shan Luo, Gregory Palmer et al.

The ability to identify and localize new objects robustly and effectively is vital for robotic grasping and manipulation in warehouses or smart factories. Deep convolutional neural networks (DCNNs) have achieved the state-of-the-art performance on established image datasets for object detection and segmentation. However, applying DCNNs in dynamic industrial scenarios, e.g., warehouses and autonomous production, remains a challenging problem. DCNNs quickly become ineffective when tasked with detecting objects that they have not been trained on. Given that re-training using the latest data is time consuming, DCNNs cannot meet the requirement of the Factory of the Future (FoF) regarding rapid development and production cycles. To address this problem, we propose a novel one-shot object segmentation framework, using a fully convolutional Siamese network architecture, to detect previously unknown objects based on a single prototype image. We turn to multi-task learning to reduce training time and improve classification accuracy. Furthermore, we introduce a novel approach to automatically cluster the learnt feature space representation in a weakly supervised manner. We test the proposed framework on the RoboCup@Work dataset, simulating requirements for the FoF. Results show that the trained network on average identifies 73% of previously unseen objects correctly from a single example image. Correctly identified objects are estimated to have a 87.53% successful pick-up rate. Finally, multi-task learning lowers the convergence time by up to 33%, and increases accuracy by 2.99%.

12.7ROJun 19, 2018

iCLAP: Shape Recognition by Combining Proprioception and Touch Sensing

Shan Luo, Wenxuan Mou, Kaspar Althoefer et al.

For humans, both the proprioception and touch sensing are highly utilized when performing haptic perception. However, most approaches in robotics use only either proprioceptive data or touch data in haptic object recognition. In this paper, we present a novel method named Iterative Closest Labeled Point (iCLAP) to link the kinesthetic cues and tactile patterns fundamentally and also introduce its extensions to recognize object shapes. In the training phase, the iCLAP first clusters the features of tactile readings into a codebook and assigns these features with distinct label numbers. A 4D point cloud of the object is then formed by taking the label numbers of the tactile features as an additional dimension to the 3D sensor positions; hence, the two sensing modalities are merged to achieve a synthesized perception of the touched object. Furthermore, we developed and validated hybrid fusion strategies, product based and weighted sum based, to combine decisions obtained from iCLAP and single sensing modalities. Extensive experimentation demonstrates a dramatic improvement of object recognition using the proposed methods and it shows great potential to enhance robot perception ability.

21.0ROFeb 21, 2018

ViTac: Feature Sharing between Vision and Tactile Sensing for Cloth Texture Recognition

Shan Luo, Wenzhen Yuan, Edward Adelson et al.

Vision and touch are two of the important sensing modalities for humans and they offer complementary information for sensing the environment. Robots could also benefit from such multi-modal sensing ability. In this paper, addressing for the first time (to the best of our knowledge) texture recognition from tactile images and vision, we propose a new fusion method named Deep Maximum Covariance Analysis (DMCA) to learn a joint latent space for sharing features through vision and tactile sensing. The features of camera images and tactile data acquired from a GelSight sensor are learned by deep neural networks. But the learned features are of a high dimensionality and are redundant due to the differences between the two sensing modalities, which deteriorates the perception performance. To address this, the learned features are paired using maximum covariance analysis. Results of the algorithm on a newly collected dataset of paired visual and tactile data relating to cloth textures show that a good recognition performance of greater than 90\% can be achieved by using the proposed DMCA framework. In addition, we find that the perception performance of either vision or tactile sensing can be improved by employing the shared representation space, compared to learning from unimodal data.

29.5RONov 10, 2017

Robotic Tactile Perception of Object Properties: A Review

Shan Luo, Joao Bimbo, Ravinder Dahiya et al.

Touch sensing can help robots understand their sur- rounding environment, and in particular the objects they interact with. To this end, roboticists have, in the last few decades, developed several tactile sensing solutions, extensively reported in the literature. Research into interpreting the conveyed tactile information has also started to attract increasing attention in recent years. However, a comprehensive study on this topic is yet to be reported. In an effort to collect and summarize the major scientific achievements in the area, this survey extensively reviews current trends in robot tactile perception of object properties. Available tactile sensing technologies are briefly presented before an extensive review on tactile recognition of object properties. The object properties that are targeted by this review are shape, surface material and object pose. The role of touch sensing in combination with other sensing sources is also discussed. In this review, open issues are identified and future directions for applying tactile sensing in different tasks are suggested.

17.8ROAug 15, 2017

Iterative Closest Labeled Point for Tactile Object Shape Recognition

Shan Luo, Wenxuan Mou, Kaspar Althoefer et al.

Tactile data and kinesthetic cues are two important sensing sources in robot object recognition and are complementary to each other. In this paper, we propose a novel algorithm named Iterative Closest Labeled Point (iCLAP) to recognize objects using both tactile and kinesthetic information.The iCLAP first assigns different local tactile features with distinct label numbers. The label numbers of the tactile features together with their associated 3D positions form a 4D point cloud of the object. In this manner, the two sensing modalities are merged to form a synthesized perception of the touched object. To recognize an object, the partial 4D point cloud obtained from a number of touches iteratively matches with all the reference cloud models to identify the best fit. An extensive evaluation study with 20 real objects shows that our proposed iCLAP approach outperforms those using either of the separate sensing modalities, with a substantial recognition rate improvement of up to 18%.

2.4CVAug 15, 2017

Knock-Knock: Acoustic Object Recognition by using Stacked Denoising Autoencoders

Shan Luo, Leqi Zhu, Kaspar Althoefer et al.

This paper presents a successful application of deep learning for object recognition based on acoustic data. The shortcomings of previously employed approaches where handcrafted features describing the acoustic data are being used, include limiting the capability of the found representation to be widely applicable and facing the risk of capturing only insignificant characteristics for a task. In contrast, there is no need to define the feature representation format when using multilayer/deep learning architecture methods: features can be learned from raw sensor data without defining discriminative characteristics a-priori. In this paper, stacked denoising autoencoders are applied to train a deep learning model. Knocking each object in our test set 120 times with a marker pen to obtain the auditory data, thirty different objects were successfully classified in our experiment and each object was knocked 120 times by a marker pen to obtain the auditory data. By employing the proposed deep learning framework, a high accuracy of 91.50% was achieved. A traditional method using handcrafted features with a shallow classifier was taken as a benchmark and the attained recognition rate was only 58.22%. Interestingly, a recognition rate of 82.00% was achieved when using a shallow classifier with raw acoustic data as input. In addition, we could show that the time taken to classify one object using deep learning was far less (by a factor of more than 6) than utilizing the traditional method. It was also explored how different model parameters in our deep architecture affect the recognition performance.