Xiaotian Wu

CV
h-index10
6papers
9citations
Novelty46%
AI Score43

6 Papers

CVDec 19, 2024Code
TDCNet: Transparent Objects Depth Completion with CNN-Transformer Dual-Branch Parallel Network

Xianghui Fan, Chao Ye, Anping Deng et al.

The sensing and manipulation of transparent objects present a critical challenge in industrial and laboratory robotics. Conventional sensors face challenges in obtaining the full depth of transparent objects due to the refraction and reflection of light on their surfaces and their lack of visible texture. Previous research has attempted to obtain complete depth maps of transparent objects from RGB and damaged depth maps (collected by depth sensor) using deep learning models. However, existing methods fail to fully utilize the original depth map, resulting in limited accuracy for deep completion. To solve this problem, we propose TDCNet, a novel dual-branch CNN-Transformer parallel network for transparent object depth completion. The proposed framework consists of two different branches: one extracts features from partial depth maps, while the other processes RGB-D images. Experimental results demonstrate that our model achieves state-of-the-art performance across multiple public datasets. Our code and the pre-trained model are publicly available at https://github.com/XianghuiFan/TDCNet.

CVMay 9
Cross-Modal RGB-D Fusion Transformer for 6D Pose Estimation of Non-Cooperative Spacecraft with Stereo-Derived Depth

Yongliang Zhen, Bo LÜ, Hang Yang et al.

On-orbit servicing and active debris removal involving non-cooperative spacecraft require reliable pose estimation to supply accurate position and orientation data for autonomous visual navigation. Learning-based monocular methods have seen widespread adoption in spacecraft pose estimation, yet they suffer from an intrinsic depth ambiguity problem and tend to fail under the harsh illumination conditions routinely encountered in orbit. Active depth sensors could in principle address the geometric ambiguity, but their power and mass requirements make them poorly suited to most spacecraft platforms. This work addresses these issues through a passive stereo vision framework for six-degree-of-freedom (6-DOF) pose estimation of non-cooperative spacecraft. A binocular stereo matching network called TSCA-Stereo is developed to cope with weak-texture surfaces, specular highlights, and severe lighting variations typical of space imagery. A cross-modal fusion Transformer is introduced to combine RGB appearance information with stereo depth features in an adaptive manner, supporting reliable pose recovery. A synthetic binocular multimodal dataset is also built for the experiments, covering stereo disparity maps and 6-DOF pose annotations across a range of illumination scenarios, attitude configurations, and noise levels. Experimental results show that TSCA-Stereo outperforms the baseline across every evaluated metric on this space-specific dataset. The full pose estimation pipeline achieves a mean translation error of 0.0419 m and a mean orientation error of 0.8632° under varied imaging conditions, confirming that the passive stereo approach is both effective and resilient when operating under the demanding visual conditions of the space environment.

CHEM-PHMay 7
Development of embedded target detection system based on FPGA and YOLOv3-Tiny

Zihan Jiang, Fanghao Liu, Huawei Wang et al.

Computational complexity and storage requirements are crucial factors influencing the performance and efficiency of convolutional neural networks (CNNs) in resource-constrained environments. This paper presents a high-performance embedded target detection system based on FPGA and YOLOv3-Tiny, specifically designed for embedded artificial intelligence applications. By integrating lightweight CNN optimization techniques with hardware accelerator design, significant improvements are made in both computational efficiency and resource utilization. Key optimizations, including low-bit quantization, batch normalization fusion, and table lookup mapping, reduce model parameters and computational complexity. Additionally, an FPGA hardware accelerator with a pipelined architecture is developed to enhance the efficiency of convolution operations while minimizing off-chip data transmission through modular design and on-chip cache optimization. On the ZYNQ-XC7Z035 platform, the system achieves an inference latency of 0.211 seconds, outperforming comparable designs by 75.58% in speed. The system achieves an power efficiency of 10.11 GOPS/W, surpassing comparable designs by at least 29.45%. Furthermore, hardware resource utilization is reduced by up to 51.94% compared to similar systems. This study offers innovative design methodologies and practical application examples for the efficient deployment of deep learning models on embedded platforms.

LGMay 11, 2024
A De-singularity Subgradient Approach for the Extended Weber Location Problem

Zhao-Rong Lai, Xiaotian Wu, Liangda Fang et al.

The extended Weber location problem is a classical optimization problem that has inspired some new works in several machine learning scenarios recently. However, most existing algorithms may get stuck due to the singularity at the data points when the power of the cost function $1\leqslant q<2$, such as the widely-used iterative Weiszfeld approach. In this paper, we establish a de-singularity subgradient approach for this problem. We also provide a complete proof of convergence which has fixed some incomplete statements of the proofs for some previous Weiszfeld algorithms. Moreover, we deduce a new theoretical result of superlinear convergence for the iteration sequence in a special case where the minimum point is a singular point. We conduct extensive experiments in a real-world machine learning scenario to show that the proposed approach solves the singularity problem, produces the same results as in the non-singularity cases, and shows a reasonable rate of linear convergence. The results also indicate that the $q$-th power case ($1<q<2$) is more advantageous than the $1$-st power case and the $2$-nd power case in some situations. Hence the de-singularity subgradient approach is beneficial to advancing both theory and practice for the extended Weber location problem.

OCDec 20, 2024
De-singularity Subgradient for the $q$-th-Powered $\ell_p$-Norm Weber Location Problem

Zhao-Rong Lai, Xiaotian Wu, Liangda Fang et al.

The Weber location problem is widely used in several artificial intelligence scenarios. However, the gradient of the objective does not exist at a considerable set of singular points. Recently, a de-singularity subgradient method has been proposed to fix this problem, but it can only handle the $q$-th-powered $\ell_2$-norm case ($1\leqslant q<2$), which has only finite singular points. In this paper, we further establish the de-singularity subgradient for the $q$-th-powered $\ell_p$-norm case with $1\leqslant q\leqslant p$ and $1\leqslant p<2$, which includes all the rest unsolved situations in this problem. This is a challenging task because the singular set is a continuum. The geometry of the objective function is also complicated so that the characterizations of the subgradients, minimum and descent direction are very difficult. We develop a $q$-th-powered $\ell_p$-norm Weiszfeld Algorithm without Singularity ($q$P$p$NWAWS) for this problem, which ensures convergence and the descent property of the objective function. Extensive experiments on six real-world data sets demonstrate that $q$P$p$NWAWS successfully solves the singularity problem and achieves a linear computational convergence rate in practical scenarios.

CVAug 12, 2020
Select Good Regions for Deblurring based on Convolutional Neural Networks

Hang Yang, Xiaotian Wu, Xinglong Sun

The goal of blind image deblurring is to recover sharp image from one input blurred image with an unknown blur kernel. Most of image deblurring approaches focus on developing image priors, however, there is not enough attention to the influence of image details and structures on the blur kernel estimation. What is the useful image structure and how to choose a good deblurring region? In this work, we propose a deep neural network model method for selecting good regions to estimate blur kernel. First we construct image patches with labels and train a deep neural networks, then the learned model is applied to determine which region of the image is most suitable to deblur. Experimental results illustrate that the proposed approach is effective, and could be able to select good regions for image deblurring.