CV · Jul 4, 2022
Open-world Semantic Segmentation for LIDAR Point Clouds
Jun Cen, Peng Yun, Shiwei Zhang et al. · gatech
Current methods for LIDAR semantic segmentation are not robust enough for real-world applications, e.g., autonomous driving, because they are closed-set and static. The closed-set assumption makes the network only able to output labels of trained classes, even for objects never seen before, while a static network cannot update its knowledge base according to what it has seen. Therefore, in this work, we propose the open-world semantic segmentation task for LIDAR point clouds, which aims to 1) identify both old and novel classes using open-set semantic segmentation, and 2) gradually incorporate novel objects into the existing knowledge base using incremental learning without forgetting old classes. For this purpose, we propose a REdundAncy cLassifier (REAL) framework to provide a general architecture for both the open-set semantic segmentation and incremental learning problems. The experimental results show that REAL simultaneously achieves state-of-the-art performance in the open-set semantic segmentation task on the SemanticKITTI and nuScenes datasets and alleviates the catastrophic forgetting problem by a large margin during incremental learning.
RO · May 9
Safe and Real-Time Consistent Planning for Autonomous Vehicles in Partially Observed Environments via Parallel Consensus Optimization
Lei Zheng, Rui Yang, Minzhe Zheng et al.
Ensuring safety and driving consistency is a significant challenge for autonomous vehicles operating in partially observed environments. This work introduces a consistent parallel trajectory optimization (CPTO) approach to enable safe and consistent driving in dense obstacle environments with perception uncertainties. Utilizing discrete-time barrier function theory, we develop a consensus safety barrier module that ensures reliable safety coverage within the spatiotemporal trajectory space across potential obstacle configurations. Following this, a bi-convex parallel trajectory optimization problem is derived that facilitates decomposition into a series of low-dimensional quadratic programming problems to accelerate computation. By leveraging the consensus alternating direction method of multipliers (ADMM) for parallel optimization, each generated candidate trajectory corresponds to a possible environment configuration while sharing a common consensus trajectory segment. This ensures driving safety and consistency when executing the consensus trajectory segment for the ego vehicle in real time. We validate our CPTO framework through extensive comparisons with state-of-the-art baselines across multiple driving tasks in partially observable environments. Our results demonstrate improved safety and consistency using both synthetic and real-world traffic datasets.
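The consensus ADMM idea mentioned in the abstract can be illustrated with a scalar toy problem. This is a minimal sketch, not the CPTO formulation: the quadratic local costs, the function name `consensus_admm`, and the parameter values are all assumptions made for illustration. Each local variable stands in for one candidate trajectory, and the shared variable `z` stands in for the common consensus segment.

```python
def consensus_admm(targets, rho=1.0, iterations=100):
    """Scaled-form consensus ADMM for a toy problem (illustrative only):

        minimize   sum_i 0.5 * (x_i - a_i)^2
        subject to x_i = z for every i

    `targets` holds the a_i values. The optimum is z = mean(a_i).
    """
    n = len(targets)
    z = 0.0            # shared consensus variable
    u = [0.0] * n      # scaled dual variables, one per local problem
    x = [0.0] * n      # local copies
    for _ in range(iterations):
        # Local updates are independent of each other, so they could run in parallel.
        x = [(a + rho * (z - ui)) / (1.0 + rho) for a, ui in zip(targets, u)]
        # Consensus update: average of the local copies plus their duals.
        z = sum(xi + ui for xi, ui in zip(x, u)) / n
        # Dual update: accumulate each local disagreement with the consensus.
        u = [ui + xi - z for ui, xi in zip(u, x)]
    return z, x


if __name__ == "__main__":
    consensus, local_copies = consensus_admm([1.0, 2.0, 6.0])
    print(consensus, local_copies)
```

In this toy setting every local copy is pulled toward the same value while still being solved separately, which is the structural property the abstract relies on for parallel computation.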
RO · Apr 19, 2022
A Thin Format Vision-Based Tactile Sensor with A Micro Lens Array (MLA)
Xia Chen, Guanlan Zhang, Michael Yu Wang et al.
Vision-based tactile sensors have been widely studied in the robotics field for their high spatial resolution and compatibility with machine learning algorithms. However, the imaging system of currently employed sensors is bulky, limiting further application. Here we present a micro lens array (MLA) based vision system to achieve a low-thickness sensor package with high tactile sensing performance. Multiple micromachined micro lens units cover the whole elastic touching layer and provide a stitched, clear tactile image, enabling high spatial resolution at a thickness of 5 mm. The thermal reflow and soft lithography method ensures a uniform spherical profile and smooth surface for the micro lenses. Both optical and mechanical characterization demonstrate the sensor's stable imaging and excellent tactile sensing, enabling precise 3D tactile information, such as displacement mapping and force distribution, with an ultra-compact, thin structure.
RO · May 18
TacSE3: Equivariant SE(3) Motion Estimation from Low-Texture Visuotactile Images for In-Gripper Tracking and Compensation
Zhongyuan Liao, Junzhe Wang, Qingyang Liu et al.
Robotic in-hand manipulation requires reliable object-motion tracking under frequent visual occlusion, yet low-texture visuotactile images provide few stable correspondences for conventional image- or geometry-matching methods. This paper presents TacSE3, a tactile motion-estimation pipeline that converts low-texture visuotactile observations into a decoupled three-dimensional force field and estimates incremental rigid-body motion on SE(3). The method derives planar translation from contact-centroid motion and estimates rotation primarily from shear-related tactile responses, yielding a physically interpretable signal for in-gripper tracking and compensation. Experiments with paired DM-Tac fingertip sensors show that dual-sensor sensing reduces translation-rotation ambiguity, supports rotation tracking across axes and object geometries, and provides a lightweight compensation signal that improves disturbance tolerance in downstream manipulation tasks without retraining the base policy.
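The step "planar translation from contact-centroid motion" can be sketched as follows. This is a hedged illustration, not the TacSE3 pipeline: it assumes the tactile observation is available as a 2D grid of non-negative contact intensities, and the helper names `contact_centroid` and `planar_translation` are hypothetical. The result is in grid cells, not metric units.

```python
def contact_centroid(pressure):
    """Intensity-weighted centroid (x, y) of a 2D contact map, or None if no contact."""
    total = sum(sum(row) for row in pressure)
    if total == 0:
        return None
    cy = sum(r * v for r, row in enumerate(pressure) for v in row) / total
    cx = sum(c * v for row in pressure for c, v in enumerate(row)) / total
    return (cx, cy)


def planar_translation(prev_map, curr_map):
    """Centroid displacement (dx, dy) between two consecutive contact maps."""
    a = contact_centroid(prev_map)
    b = contact_centroid(curr_map)
    if a is None or b is None:
        return None  # contact lost in one frame, so no estimate is produced
    return (b[0] - a[0], b[1] - a[1])
```

Rotation is not covered here; the abstract attributes it primarily to shear-related responses, which a centroid alone cannot capture.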
RO · Mar 13, 2025
Spatial-Temporal Graph Diffusion Policy with Kinematic Modeling for Bimanual Robotic Manipulation
Qi Lv, Hao Li, Xiang Deng et al.
Despite the significant success of imitation learning in robotic manipulation, its application to bimanual tasks remains highly challenging. Existing approaches mainly learn a policy to predict a distant next-best end-effector pose (NBP) and then compute the corresponding joint rotation angles for motion using inverse kinematics. However, they suffer from two important issues: (1) rarely considering the physical robotic structure, which may cause self-collisions or interferences, and (2) overlooking the kinematics constraint, which may result in the predicted poses not conforming to the actual limitations of the robot joints. In this paper, we propose Kinematics enhanced Spatial-TemporAl gRaph Diffuser (KStar Diffuser). Specifically, (1) to incorporate the physical robot structure information into action prediction, KStar Diffuser maintains a dynamic spatial-temporal graph according to the physical bimanual joint motions at continuous timesteps. This dynamic graph serves as the robot-structure condition for denoising the actions; (2) to make the NBP learning objective consistent with kinematics, we introduce the differentiable kinematics to provide the reference for optimizing KStar Diffuser. This module regularizes the policy to predict more reliable and kinematics-aware next end-effector poses. Experimental results show that our method effectively leverages the physical structural information and generates kinematics-aware actions in both simulation and real-world
LG · Jun 8, 2024
Decision Mamba: A Multi-Grained State Space Model with Self-Evolution Regularization for Offline RL
Qi Lv, Xiang Deng, Gongwei Chen et al.
While conditional sequence modeling with the transformer architecture has demonstrated its effectiveness in dealing with offline reinforcement learning (RL) tasks, it struggles to handle out-of-distribution states and actions. Existing work attempts to address this issue by data augmentation with the learned policy or by adding extra constraints with a value-based RL algorithm. However, these studies still fail to overcome the following challenges: (1) insufficiently utilizing the historical temporal information across steps, (2) overlooking the local intra-step relationships among returns-to-go (RTGs), states, and actions, and (3) overfitting suboptimal trajectories with noisy labels. To address these challenges, we propose Decision Mamba (DM), a novel multi-grained state space model (SSM) with a self-evolving policy learning strategy. DM explicitly models the historical hidden state to extract the temporal information by using the mamba architecture. To capture the relationship among RTG-state-action triplets, a fine-grained SSM module is designed and integrated into the original coarse-grained SSM in mamba, resulting in a novel mamba architecture tailored for offline RL. Finally, to mitigate the overfitting issue on noisy trajectories, a self-evolving policy is proposed by using progressive regularization. The policy evolves by using its own past knowledge to refine the suboptimal actions, thus enhancing its robustness on noisy demonstrations. Extensive experiments on various tasks show that DM outperforms other baselines substantially.
CV · Jun 4, 2024
RoomTex: Texturing Compositional Indoor Scenes via Iterative Inpainting
Qi Wang, Ruijie Lu, Xudong Xu et al.
The advancement of diffusion models has pushed the boundary of text-to-3D object generation. While it is straightforward to composite objects into a scene with reasonable geometry, it is nontrivial to texture such a scene perfectly due to style inconsistency and occlusions between objects. To tackle these problems, we propose a coarse-to-fine 3D scene texturing framework, referred to as RoomTex, to generate high-fidelity and style-consistent textures for untextured compositional scene meshes. In the coarse stage, RoomTex first unwraps the scene mesh to a panoramic depth map and leverages ControlNet to generate a room panorama, which is regarded as the coarse reference to ensure the global texture consistency. In the fine stage, based on the panoramic image and perspective depth maps, RoomTex will refine and texture every single object in the room iteratively along a series of selected camera views, until this object is completely painted. Moreover, we propose to maintain superior alignment between RGB and depth spaces via subtle edge detection methods. Extensive experiments show our method is capable of generating high-quality and diverse room textures, and more importantly, supporting interactive fine-grained texture control and flexible scene editing thanks to our inpainting-based framework and compositional mesh input. Our project page is available at https://qwang666.github.io/RoomTex/.
RO · Feb 4, 2022
DelTact: A Vision-based Tactile Sensor Using Dense Color Pattern
Guanlan Zhang, Yipai Du, Hongyu Yu et al.
Tactile sensing is an essential perception for robots to complete dexterous tasks. As a promising tactile sensing technique, vision-based tactile sensors have been developed to improve robot performance in manipulation and grasping. Here we propose a new design of a vision-based tactile sensor, DelTact. The sensor uses a modular hardware architecture for compactness whilst maintaining a contact measurement of full resolution (798×586) and large area (675 mm²). Moreover, it adopts an improved dense random color pattern based on the previous version to achieve high accuracy of contact deformation tracking. In particular, we optimize the color pattern generation process and select the appropriate pattern for coordinating with a dense optical flow algorithm under a real-world experimental sensory setting. The optical flow obtained from the raw image is processed to determine shape and force distribution on the contact surface. We also demonstrate the method to extract contact shape and force distribution from the raw images. Experimental results demonstrate that the sensor is capable of providing tactile measurements with low error and high frequency (40 Hz).
CV · Dec 2, 2021
Open-set 3D Object Detection
Jun Cen, Peng Yun, Junhao Cai et al.
3D object detection has been widely studied in recent years, especially for robot perception systems. However, existing 3D object detection is under a closed-set condition, meaning that the network can only output boxes of trained classes. Unfortunately, this closed-set condition is not robust enough for practical use, as it will identify unknown objects as known by mistake. Therefore, in this paper, we propose an open-set 3D object detector, which aims to (1) identify known objects, like the closed-set detection, and (2) identify unknown objects and give their accurate bounding boxes. Specifically, we divide the open-set 3D object detection problem into two steps: (1) finding out the regions containing the unknown objects with high probability and (2) enclosing the points of these regions with proper bounding boxes. The first step is solved by the finding that unknown objects are often classified as known objects with low confidence, and we show that the Euclidean distance sum based on metric learning is a better confidence score than the naive softmax probability to differentiate unknown objects from known objects. On this basis, unsupervised clustering is used to refine the bounding boxes of unknown objects. The proposed method combining metric learning and unsupervised clustering is called the MLUC network. Our experiments show that our MLUC network achieves state-of-the-art performance and can identify both known and unknown objects as expected.
CV · Aug 10, 2021
Deep Metric Learning for Open World Semantic Segmentation
Jun Cen, Peng Yun, Junhao Cai et al.
Classical closed-set semantic segmentation networks have limited ability to detect out-of-distribution (OOD) objects, which is important for safety-critical applications such as autonomous driving. Incrementally learning these OOD objects with few annotations is an ideal way to enlarge the knowledge base of the deep learning models. In this paper, we propose an open world semantic segmentation system that includes two modules: (1) an open-set semantic segmentation module to detect both in-distribution and OOD objects, and (2) an incremental few-shot learning module to gradually incorporate those OOD objects into its existing knowledge base. This open world semantic segmentation system behaves like a human being, which is able to identify OOD objects and gradually learn them with corresponding supervision. We adopt the Deep Metric Learning Network (DMLNet) with contrastive clustering to implement open-set semantic segmentation. Compared to other open-set semantic segmentation methods, our DMLNet achieves state-of-the-art performance on three challenging open-set semantic segmentation datasets without using additional data or generative models. On this basis, two incremental few-shot learning methods are further proposed to progressively improve the DMLNet with the annotations of OOD objects.
CV · Aug 5, 2021
MFuseNet: Robust Depth Estimation with Learned Multiscopic Fusion
Weihao Yuan, Rui Fan, Michael Yu Wang et al.
We design a multiscopic vision system that utilizes a low-cost monocular RGB camera to acquire accurate depth estimation. Unlike multi-view stereo with images captured at unconstrained camera poses, the proposed system controls the motion of a camera to capture a sequence of images in horizontally or vertically aligned positions with the same parallax. In this system, we propose a new heuristic method and a robust learning-based method to fuse multiple cost volumes between the reference image and its surrounding images. To obtain training data, we build a synthetic dataset with multiscopic images. The experiments on the real-world Middlebury dataset and real robot demonstration show that our multiscopic vision system outperforms traditional two-frame stereo matching methods in depth estimation. Our code and dataset are available at https://sites.google.com/view/multiscopic.
RO · May 3, 2021
Viko: An Adaptive Gecko Gripper with Vision-based Tactile Sensor
Chohei Pang, Kinwing Mak, Yazhan Zhang et al.
Monitoring the state of contact is essential for robotic devices, especially grippers that implement gecko-inspired adhesives where intimate contact is crucial for a firm attachment. However, due to the lack of deformable sensors, few have demonstrated tactile sensing for gecko grippers. We present Viko, an adaptive gecko gripper that utilizes vision-based tactile sensors to monitor contact state. The sensor provides high-resolution real-time measurements of contact area and shear force. Moreover, the sensor is adaptive, low-cost, and compact. We integrated gecko-inspired adhesives into the sensor surface without impeding its adaptiveness and performance. Using a robotic arm, we evaluate the performance of the gripper through a series of grasping tests. The gripper has a maximum payload of 8 N even at a low fingertip pitch angle of 30 degrees. We also showcase the gripper's ability to adjust fingertip pose for better contact using sensor feedback. Further, everyday object picking is presented as a demonstration of the gripper's adaptiveness.
CV · Apr 9, 2021
Stereo Matching by Self-supervision of Multiscopic Vision
Weihao Yuan, Yazhan Zhang, Bingkun Wu et al.
Self-supervised learning for depth estimation possesses several advantages over supervised learning. The benefits of no need for ground-truth depth, online fine-tuning, and better generalization with unlimited data attract researchers to seek self-supervised solutions. In this work, we propose a new self-supervised framework for stereo matching utilizing multiple images captured at aligned camera positions. A cross photometric loss, an uncertainty-aware mutual-supervision loss, and a new smoothness loss are introduced to optimize the network in learning disparity maps end-to-end without ground-truth depth information. To train this framework, we build a new multiscopic dataset consisting of synthetic images rendered by 3D engines and real images captured by real cameras. After being trained with only the synthetic images, our network can perform well in unseen outdoor scenes. Our experiment shows that our model obtains better disparity maps than previous unsupervised methods on the KITTI dataset and is comparable to supervised methods when generalized to unseen data. Our source code and dataset are available at https://sites.google.com/view/multiscopic.
RO · Mar 26, 2021
A Tactile Sensing Foot for Single Robot Leg Stabilization
Guanlan Zhang, Yipai Du, Yazhan Zhan et al.
Tactile sensing on human feet is crucial for motion control but has not been explored in robotic counterparts. This work is dedicated to endowing legged robots' feet with tactile sensing and showing that a single-legged robot can be stabilized with only tactile sensing signals from its foot. We propose a robot leg with a novel vision-based tactile sensing foot system and implement a processing algorithm to extract contact information for feedback control in stabilizing tasks. A pipeline to convert images of the foot skin into high-level contact information using a deep learning framework is presented. The leg was quantitatively evaluated in a stabilization task on a tilting surface to show that the tactile foot was able to estimate both the surface tilting angle and the foot poses. Feasibility and effectiveness of the tactile system were investigated qualitatively in comparison with conventional single-legged robotic systems using inertial measurement units (IMUs). Experiments demonstrate the capability of vision-based tactile sensors in assisting legged robots to maintain stability on unknown terrains and the potential for regulating more complex motions for humanoid robots.
CV · Mar 6, 2021
Learning to Predict Vehicle Trajectories with Model-based Planning
Haoran Song, Di Luan, Wenchao Ding et al.
Predicting the future trajectories of on-road vehicles is critical for autonomous driving. In this paper, we introduce a novel prediction framework called PRIME, which stands for Prediction with Model-based Planning. Unlike recent prediction works that utilize neural networks to model scene context and produce unconstrained trajectories, PRIME is designed to generate accurate and feasibility-guaranteed future trajectory predictions. PRIME guarantees the trajectory feasibility by exploiting a model-based generator to produce future trajectories under explicit constraints and enables accurate multimodal prediction by utilizing a learning-based evaluator to select future trajectories. We conduct experiments on the large-scale Argoverse Motion Forecasting Benchmark, where PRIME outperforms the state-of-the-art methods in prediction accuracy, feasibility, and robustness under imperfect tracking.
RO · Oct 10, 2020
Origami-based Shape Morphing Fingertip to Enhance Grasping Stability and Dexterity
Zicheng Kan, Yazhan Zhang, Chohei Pang et al.
Adaptation to various scene configurations and object properties, as well as stability and dexterity, in robotic grasping and manipulation remains far from fully explored. This work presents an origami-based shape morphing fingertip design to actively tackle the grasping stability and dexterity problems. The proposed fingertip utilizes origami as its skeleton, providing degrees of freedom at desired positions, and motor-driven four-bar linkages as its transmission components to achieve a compact fingertip size. Three morphing types that are commonly observed and essential in robotic grasping are studied and validated with geometrical modeling. Experiments including grasping an object with convex point contact to pivot or do pinch grasping, grasped object reorientation, and enveloping grasping with concave fingertip surfaces are implemented to demonstrate the advantages of our fingertip compared to conventional parallel grippers. Multi-functionality in enhancing grasping stability and dexterity via active adaptation to different grasped objects and manipulation tasks is demonstrated. Video is available at youtu.be/jJoJ3xnDdVk/.
RO · Aug 28, 2020
Vacuum Driven Auxetic Switching Structure and Its Application on a Gripper and Quadruped
Shuai Liu, Sheeraz Athar, Michael Yu Wang
The properties and applications of auxetics have been widely explored in recent years. Through proper utilization of auxetic structures, designs with unprecedented mechanical and structural behaviors can be produced. Taking advantage of this, we present the development of novel and low-cost 3D structures inspired by a simple auxetic unit. The core part, which we call the body in this paper, is a 3D realization of 2D rotating squares. This body structure was formed by joining four similar structures through softer material at the vertices. A monolithic structure of this kind is accomplished through a custom-built multi-material 3D printer. When torque is applied along the face of the rotational squares, they tend to bend at the vertices made of the softer material, and due to the connectedness of the design, a proper opening and closing motion is achieved. To demonstrate the potential of this part as an important component for robots, two applications are presented: a soft gripper and a crawling robot. Both applications are driven by vacuum actuators. The proposed gripper combines the benefits of two types of grippers whose fingers are placed parallel and equally spaced to each other, in a single design. This gripper is adaptable to the size of the object and can grasp objects with large and small cross-sections alike. A novel bending actuator, which is made of soft material and curls when vacuumed, provides the grasping action of the gripper. Crawling robots, in addition to their versatile nature, provide a better interaction with humans. The designed crawling robot employs negative-pressure-driven actuators to demonstrate linear and turning locomotion.
CV · Aug 3, 2020
Self-supervised Object Tracking with Cycle-consistent Siamese Networks
Weihao Yuan, Michael Yu Wang, Qifeng Chen
Self-supervised learning for visual object tracking possesses valuable advantages compared to supervised learning, such as the non-necessity of laborious human annotations and online training. In this work, we exploit an end-to-end Siamese network in a cycle-consistent self-supervised framework for object tracking. Self-supervision can be performed by taking advantage of the cycle consistency in the forward and backward tracking. To better leverage the end-to-end learning of deep networks, we propose to integrate a Siamese region proposal and mask regression network in our tracking framework so that a fast and more accurate tracker can be learned without the annotation of each frame. The experiments on the VOT dataset for visual object tracking and on the DAVIS dataset for video object segmentation propagation show that our method outperforms prior approaches on both tasks.
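The forward-backward cycle consistency that provides the self-supervision signal can be sketched with a toy tracker. This is only an illustration under stated assumptions: a box is reduced to its center `(x, y)`, a tracker step is reduced to a predicted displacement, and the squared-distance error and the function names are hypothetical rather than the loss used in the paper.

```python
def apply_motion(box_center, motion):
    """Move a box center (x, y) by a predicted displacement (dx, dy)."""
    return tuple(b + m for b, m in zip(box_center, motion))


def cycle_consistency_error(start_center, forward_motions, backward_motions):
    """Track forward through the frames, then backward, and measure the drift.

    A tracker that is consistent in both directions returns to the starting
    position, so the squared distance to the start can serve as a training
    signal that needs no per-frame annotation.
    """
    center = start_center
    for motion in forward_motions:
        center = apply_motion(center, motion)
    for motion in backward_motions:
        center = apply_motion(center, motion)
    return sum((a - b) ** 2 for a, b in zip(start_center, center))
```

Only the first frame needs a known position in this setup; every later frame is supervised implicitly through the return trip.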
RO · Apr 10, 2020
A Flexible Connector for Soft Modular Robots Based on Micropatterned Intersurface Jamming
Yu Alexander Tse, Shuai Liu, Yang Yang et al.
Soft modular robots enable more flexibility and safer interaction with the changing environment than traditional robots. However, it has remained challenging to create deformable connectors that can be integrated into soft machines. In this work, we propose a flexible connector for soft modular robots based on micropatterned intersurface jamming. The connector is composed of micropatterned dry adhesives made of silicone rubber and a flexible main body with inflatable chambers for active engagement and disengagement. Through connection force tests, we evaluate the characteristics of the connector both in the linear direction and under rotational disruptions. The connector can stably support an average maximum load of 22 N (83 times the connector's body weight) linearly and 10.86 N under planar rotation. The proposed connector demonstrates the potential to create a robust connection between soft modular robots without raising the system's overall stiffness, thus guaranteeing high flexibility of the robotic system.
CV · Mar 25, 2020
PiP: Planning-informed Trajectory Prediction for Autonomous Driving
Haoran Song, Wenchao Ding, Yuxuan Chen et al.
It is critical to predict the motion of surrounding vehicles for self-driving planning, especially in a socially compliant and flexible way. However, future prediction is challenging due to the interaction and uncertainty in driving behaviors. We propose planning-informed trajectory prediction (PiP) to tackle the prediction problem in the multi-agent setting. Our approach differs from the traditional manner of prediction, which is based only on historical information and is decoupled from planning. By informing the prediction process with the planning of the ego vehicle, our method achieves state-of-the-art performance in multi-agent forecasting on highway datasets. Moreover, our approach enables a novel pipeline that couples prediction and planning by conditioning PiP on multiple candidate trajectories of the ego vehicle, which is highly beneficial for autonomous driving in interactive scenarios.
CV · Jan 22, 2020
Active Perception with A Monocular Camera for Multiscopic Vision
Weihao Yuan, Rui Fan, Michael Yu Wang et al.
We design a multiscopic vision system that utilizes a low-cost monocular RGB camera to acquire accurate depth estimation for robotic applications. Unlike multi-view stereo with images captured at unconstrained camera poses, the proposed system actively controls a robot arm with a mounted camera to capture a sequence of images in horizontally or vertically aligned positions with the same parallax. In this system, we combine the cost volumes for stereo matching between the reference image and the surrounding images to form a fused cost volume that is robust to outliers. Experiments on the Middlebury dataset and real robot experiments show that our obtained disparity maps are more accurate than two-frame stereo matching: the average absolute error is reduced by 50.2% in our experiments.
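One way to see why fusing several cost volumes can be robust to outliers is a single-pixel toy example. The abstract does not state the fusion rule, so the per-disparity median used here is an assumption, and `COSTS`, `fuse_cost_volumes`, and `winner_take_all` are hypothetical names with made-up numbers.

```python
from statistics import mean, median

# Matching cost of one pixel: COSTS[view][disparity]. Three surrounding views
# agree that disparity 2 is best; the fourth (e.g. occluded) view is an outlier.
COSTS = [
    [9, 5, 1, 5, 9],
    [8, 4, 1, 4, 8],
    [9, 6, 2, 6, 9],
    [9, 9, 30, 9, 0],
]


def fuse_cost_volumes(per_view_costs, reducer=median):
    """Combine the per-view costs of each disparity candidate into one value."""
    n_disparities = len(per_view_costs[0])
    return [
        reducer([view[d] for view in per_view_costs])
        for d in range(n_disparities)
    ]


def winner_take_all(fused_costs):
    """Index of the disparity with the lowest fused cost."""
    return min(range(len(fused_costs)), key=fused_costs.__getitem__)


if __name__ == "__main__":
    print("median fusion picks", winner_take_all(fuse_cost_volumes(COSTS)))
    print("mean fusion picks", winner_take_all(fuse_cost_volumes(COSTS, reducer=mean)))
```

With these numbers the median keeps the disparity supported by the majority of views, while a plain average is pulled away by the single outlier view.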
RO · Dec 15, 2019
Multi-Object Rearrangement with Monte Carlo Tree Search: A Case Study on Planar Nonprehensile Sorting
Haoran Song, Joshua A. Haustein, Weihao Yuan et al.
In this work, we address a planar non-prehensile sorting task. Here, a robot needs to push many densely packed objects belonging to different classes into a configuration where these classes are clearly separated from each other. To achieve this, we propose to employ Monte Carlo tree search equipped with a task-specific heuristic function. We evaluate the algorithm on various simulated and real-world sorting tasks. We observe that the algorithm is capable of reliably sorting large numbers of convex and non-convex objects, as well as convex objects in the presence of immovable obstacles.
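For readers unfamiliar with Monte Carlo tree search, the selection step is commonly implemented with the UCB1 rule, sketched below. The abstract does not say which selection rule or heuristic the authors use, so this is the textbook variant only; the function name `ucb1_select`, the action labels, and the statistics are hypothetical.

```python
import math


def ucb1_select(children, parent_visits, c=math.sqrt(2)):
    """Pick the child action with the highest UCB1 score.

    `children` maps an action (e.g. a push) to (visit_count, total_value).
    Unvisited actions are tried first; otherwise the score trades off the
    average value (exploitation) against a visit-count bonus (exploration).
    """
    def score(stats):
        visits, total_value = stats
        if visits == 0:
            return math.inf
        exploration = c * math.sqrt(math.log(parent_visits) / visits)
        return total_value / visits + exploration

    return max(children, key=lambda action: score(children[action]))


if __name__ == "__main__":
    stats = {"push_left": (10, 6.0), "push_right": (2, 1.0)}
    print(ucb1_select(stats, parent_visits=12))
```

A task-specific heuristic such as the one mentioned in the abstract would typically enter through the value estimates or the rollout policy, which this sketch leaves out.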
RO · Oct 9, 2019
Towards Learning to Detect and Predict Contact Events on Vision-based Tactile Sensors
Yazhan Zhang, Weihao Yuan, Zicheng Kan et al.
In essence, a successful grasp boils down to correct responses to multiple contact events between fingertips and objects. In most scenarios, tactile sensing is adequate to distinguish contact events. Because tactile information is high-dimensional, classifying spatiotemporal tactile signals using conventional model-based methods is difficult. In this work, we propose to predict and classify tactile signals using deep learning methods, seeking to enhance the adaptability of the robotic grasp system to external event changes that may lead to grasping failure. We develop a deep learning framework and collect 6650 tactile image sequences with a vision-based tactile sensor, and the neural network is integrated into a contact-event-based robotic grasping system. In grasping experiments, we achieved a 52% increase in object lifting success rate with contact detection and significantly higher robustness under unexpected loads with slip prediction, compared with open-loop grasps, demonstrating that integrating the proposed framework into a robotic grasping system substantially improves picking success rate and the capability to withstand external disturbances.
RO · Jun 22, 2019
Effective Estimation of Contact Force and Torque for Vision-based Tactile Sensor with Helmholtz-Hodge Decomposition
Yazhan Zhang, Zicheng Kan, Yang Yang et al.
Retrieving rich contact information from robotic tactile sensing has been a challenging yet significant task for the effective perception of the properties of objects that the robot interacts with. This work is dedicated to developing an algorithm to estimate contact force and torque for vision-based tactile sensors. We first introduce the observation of the contact deformation patterns of hyperelastic materials under ideal single-axial loads in simulation. Then, based on the observation, we propose a method of estimating surface forces and torque from the contact deformation vector field with the Helmholtz-Hodge Decomposition (HHD) algorithm. Extensive calibration and baseline-comparison experiments follow to verify the effectiveness of the proposed method in terms of prediction error and variance. The proposed algorithm is further integrated into a contact force visualization module as well as a closed-loop adaptive grasp force control framework and is shown to be useful in both visualization of contact stability and a minimum-force grasping task.
RO · Oct 5, 2018
FingerVision Tactile Sensor Design and Slip Detection Using Convolutional LSTM Network
Yazhan Zhang, Zicheng Kan, Yu Alexander Tse et al.
Tactile sensing is essential to the human perception system, and likewise to robots. In this paper, we develop a novel optical-based tactile sensor "FingerVision" with effective signal processing algorithms. This sensor is composed of a soft skin with an embedded marker array bonded to a rigid frame, and a web camera with a fisheye lens. When the skin is excited by contact force, the camera tracks the movements of the markers and a deformation field is obtained. Compared to existing tactile sensors, our sensor features a compact footprint, high resolution, and ease of fabrication. In addition, utilizing the deformation field estimation, we propose a slip classification framework based on convolutional Long Short-Term Memory (convolutional LSTM) networks. The data collection process takes advantage of the human sense of slip: a human hand holds 12 daily objects, interacts with the sensor skin, and labels the data as slip or non-slip based on the human feeling of slip. Our slip classification framework achieves a high accuracy of 97.62% on the test dataset. It is expected to be capable of enhancing the stability of robot grasping significantly, leading to better contact force control, finer object interaction, and more active sensing manipulation.