Chongkun Xia

RO
h-index13
10papers
170citations
Novelty56%
AI Score45

10 Papers

CVAug 25, 2022Code
Polarimetric Inverse Rendering for Transparent Shapes Reconstruction

Mingqi Shao, Chongkun Xia, Dongxu Duan et al.

In this work, we propose a novel method for the detailed reconstruction of transparent objects by exploiting polarimetric cues. Most of the existing methods usually lack sufficient constraints and suffer from the over-smooth problem. Hence, we introduce polarization information as a complementary cue. We implicitly represent the object's geometry as a neural network, while the polarization render is capable of rendering the object's polarization images from the given shape and illumination configuration. Direct comparison of the rendered polarization images to the real-world captured images will have additional errors due to the transmission in the transparent object. To address this issue, the concept of reflection percentage which represents the proportion of the reflection component is introduced. The reflection percentage is calculated by a ray tracer and then used for weighting the polarization loss. We build a polarization dataset for multi-view transparent shapes reconstruction to verify our method. The experimental results show that our method is capable of recovering detailed shapes and improving the reconstruction quality of transparent objects. Our dataset and code will be publicly available at https://github.com/shaomq2187/TransPIR.

ROJan 8, 2023
Foldsformer: Learning Sequential Multi-Step Cloth Manipulation With Space-Time Attention

Kai Mo, Chongkun Xia, Xueqian Wang et al.

Sequential multi-step cloth manipulation is a challenging problem in robotic manipulation, requiring a robot to perceive the cloth state and plan a sequence of chained actions leading to the desired state. Most previous works address this problem in a goal-conditioned way, and goal observation must be given for each specific task and cloth configuration, which is not practical and efficient. Thus, we present a novel multi-step cloth manipulation planning framework named Foldformer. Foldformer can complete similar tasks with only a general demonstration and utilize a space-time attention mechanism to capture the instruction information behind this demonstration. We experimentally evaluate Foldsformer on four representative sequential multi-step manipulation tasks and show that Foldsformer significantly outperforms state-of-the-art approaches in simulation. Foldformer can complete multi-step cloth manipulation tasks even when configurations of the cloth (e.g., size and pose) vary from configurations in the general demonstrations. Furthermore, our approach can be transferred from simulation to the real world without additional training or domain randomization. Despite training on rectangular clothes, we also show that our approach can generalize to unseen cloth shapes (T-shirts and shorts). Videos and source code are available at: https://sites.google.com/view/foldsformer.

CVMar 18Code
Transparent Fragments Contour Estimation via Visual-Tactile Fusion for Autonomous Reassembly

Qihao Lin, Borui Chen, Yuping Zhou et al.

The contour estimation of transparent fragments is very important for autonomous reassembly, especially in the fields of precision optical instrument repair, cultural relic restoration, and identification of other precious device broken accidents. Different from general intact transparent objects, the contour estimation of transparent fragments face greater challenges due to strict optical properties, irregular shapes and edges. To address this issue, a general transparent fragments contour estimation framework based on visual-tactile fusion is proposed in this paper. First, we construct the transparent fragment dataset named TransFrag27K, which includes a multiscene synthetic data of broken fragments from multiple types of transparent objects, and a scalable synthetic data generation pipeline. Secondly, we propose a visual grasping position detection network named TransFragNet to identify, locate and segment the sampling grasping position. And, we use a two-finger gripper with Gelsight Mini sensors to obtain reconstructed tactile information of the lateral edge of the fragments. By fusing this tactile information with visual cues, a visual-tactile fusion material classifier is proposed. Inspired by the way humans estimate a fragment's contour combining vision and touch, we introduce a general transparent fragment contour estimation framework based on visual-tactile fusion, demonstrates strong performance in real-world validation. Finally, a multi-dimensional similarity metrics based contour matching and reassembly algorithm is proposed, providing a reproducible benchmark for evaluating visual-tactile contour estimation and fragment reassembly. The experimental results demonstrate the validity of the proposed framework. The dataset and codes are available at https://github.com/Keithllin/Transparent-Fragments-Contour-Estimation.

RONov 30, 2022
Visual-tactile Fusion for Transparent Object Grasping in Complex Backgrounds

Shoujie Li, Haixin Yu, Wenbo Ding et al.

The accurate detection and grasping of transparent objects are challenging but of significance to robots. Here, a visual-tactile fusion framework for transparent object grasping under complex backgrounds and variant light conditions is proposed, including the grasping position detection, tactile calibration, and visual-tactile fusion based classification. First, a multi-scene synthetic grasping dataset generation method with a Gaussian distribution based data annotation is proposed. Besides, a novel grasping network named TGCNN is proposed for grasping position detection, showing good results in both synthetic and real scenes. In tactile calibration, inspired by human grasping, a fully convolutional network based tactile feature extraction method and a central location based adaptive grasping strategy are designed, improving the success rate by 36.7% compared to direct grasping. Furthermore, a visual-tactile fusion method is proposed for transparent objects classification, which improves the classification accuracy by 34%. The proposed framework synergizes the advantages of vision and touch, and greatly improves the grasping efficiency of transparent objects.

ROAug 21, 2024Code
ViIK: Flow-based Vision Inverse Kinematics Solver with Fusing Collision Checking

Qinglong Meng, Chongkun Xia, Xueqian Wang

Inverse Kinematics (IK) is to find the robot's configurations that satisfy the target pose of the end effector. In motion planning, diverse configurations were required in case a feasible trajectory was not found. Meanwhile, collision checking (CC), e.g. Oriented bounding box (OBB), Discrete Oriented Polytope (DOP), and Quickhull \cite{quickhull}, needs to be done for each configuration provided by the IK solver to ensure every goal configuration for motion planning is available. This means the classical IK solver and CC algorithm should be executed repeatedly for every configuration. Thus, the preparation time is long when the required number of goal configurations is large, e.g. motion planning in cluster environments. Moreover, structured maps, which might be difficult to obtain, were required by classical collision-checking algorithms. To sidestep such two issues, we propose a flow-based vision method that can output diverse available configurations by fusing inverse kinematics and collision checking, named Vision Inverse Kinematics solver (ViIK). Moreover, ViIK uses RGB images as the perception of environments. ViIK can output 1000 configurations within 40 ms, and the accuracy is about 3 millimeters and 1.5 degrees. The higher accuracy can be obtained by being refined by the classical IK solver within a few iterations. The self-collision rates can be lower than 2%. The collision-with-env rates can be lower than 10% in most scenes. The code is available at: https://github.com/AdamQLMeng/ViIK.

ROFeb 21, 2023
Deep Reinforcement Learning Based on Local GNN for Goal-conditioned Deformable Object Rearranging

Yuhong Deng, Chongkun Xia, Xueqian Wang et al.

Object rearranging is one of the most common deformable manipulation tasks, where the robot needs to rearrange a deformable object into a goal configuration. Previous studies focus on designing an expert system for each specific task by model-based or data-driven approaches and the application scenarios are therefore limited. Some research has been attempting to design a general framework to obtain more advanced manipulation capabilities for deformable rearranging tasks, with lots of progress achieved in simulation. However, transferring from simulation to reality is difficult due to the limitation of the end-to-end CNN architecture. To address these challenges, we design a local GNN (Graph Neural Network) based learning method, which utilizes two representation graphs to encode keypoints detected from images. Self-attention is applied for graph updating and cross-attention is applied for generating manipulation actions. Extensive experiments have been conducted to demonstrate that our framework is effective in multiple 1-D (rope, rope ring) and 2-D (cloth) rearranging tasks in simulation and can be easily transferred to a real robot by fine-tuning a keypoint detector.

CVApr 13, 2022
Transparent Shape from a Single View Polarization Image

Mingqi Shao, Chongkun Xia, Zhendong Yang et al.

This paper presents a learning-based method for transparent surface estimation from a single view polarization image. Existing shape from polarization(SfP) methods have the difficulty in estimating transparent shape since the inherent transmission interference heavily reduces the reliability of physics-based prior. To address this challenge, we propose the concept of physics-based prior, which is inspired by the characteristic that the transmission component in the polarization image has more noise than reflection. The confidence is used to determine the contribution of the interfered physics-based prior. Then, we build a network(TransSfP) with multi-branch architecture to avoid the destruction of relationships between different hierarchical inputs. To train and test our method, we construct a dataset for transparent shape from polarization with paired polarization images and ground-truth normal maps. Extensive experiments and comparisons demonstrate the superior accuracy of our method.

ROFeb 21, 2023
Graph-Transporter: A Graph-based Learning Method for Goal-Conditioned Deformable Object Rearranging Task

Yuhong Deng, Chongkun Xia, Xueqian Wang et al.

Rearranging deformable objects is a long-standing challenge in robotic manipulation for the high dimensionality of configuration space and the complex dynamics of deformable objects. We present a novel framework, Graph-Transporter, for goal-conditioned deformable object rearranging tasks. To tackle the challenge of complex configuration space and dynamics, we represent the configuration space of a deformable object with a graph structure and the graph features are encoded by a graph convolution network. Our framework adopts an architecture based on Fully Convolutional Network (FCN) to output pixel-wise pick-and-place actions from only visual input. Extensive experiments have been conducted to validate the effectiveness of the graph representation of deformable object configuration. The experimental results also demonstrate that our framework is effective and general in handling goal-conditioned deformable object rearranging tasks.

LGMar 13, 2024Code
PaddingFlow: Improving Normalizing Flows with Padding-Dimensional Noise

Qinglong Meng, Chongkun Xia, Xueqian Wang

Normalizing flow is a generative modeling approach with efficient sampling. However, Flow-based models suffer two issues: 1) If the target distribution is manifold, due to the unmatch between the dimensions of the latent target distribution and the data distribution, flow-based models might perform badly. 2) Discrete data might make flow-based models collapse into a degenerate mixture of point masses. To sidestep such two issues, we propose PaddingFlow, a novel dequantization method, which improves normalizing flows with padding-dimensional noise. To implement PaddingFlow, only the dimension of normalizing flows needs to be modified. Thus, our method is easy to implement and computationally cheap. Moreover, the padding-dimensional noise is only added to the padding dimension, which means PaddingFlow can dequantize without changing data distributions. Implementing existing dequantization methods needs to change data distributions, which might degrade performance. We validate our method on the main benchmarks of unconditional density estimation, including five tabular datasets and four image datasets for Variational Autoencoder (VAE) models, and the Inverse Kinematics (IK) experiments which are conditional density estimation. The results show that PaddingFlow can perform better in all experiments in this paper, which means PaddingFlow is widely suitable for various tasks. The code is available at: https://github.com/AdamQLMeng/PaddingFlow.

ROJan 18, 2024
PPNet: A Two-Stage Neural Network for End-to-end Path Planning

Qinglong Meng, Chongkun Xia, Xueqian Wang et al.

The classical path planners, such as sampling-based path planners, can provide probabilistic completeness guarantees in the sense that the probability that the planner fails to return a solution if one exists, decays to zero as the number of samples approaches infinity. However, finding a near-optimal feasible solution in a given period is challenging in many applications such as the autonomous vehicle. To achieve an end-to-end near-optimal path planner, we first divide the path planning problem into two subproblems, which are path space segmentation and waypoints generation in the given path's space. We further propose a two-stage neural network named Path Planning Network (PPNet) each stage solves one of the subproblems abovementioned. Moreover, we propose a novel efficient data generation method for path planning named EDaGe-PP. EDaGe-PP can generate data with continuous-curvature paths with analytical expression while satisfying the clearance requirement. The results show the total computation time of generating random 2D path planning data is less than 1/33 and the success rate of PPNet trained by the dataset that is generated by EDaGe-PP is about 2 times compared to other methods. We validate PPNet against state-of-the-art path planning methods. The results show that PPNet can find a near-optimal solution in 15.3ms, which is much shorter than the state-of-the-art path planners.