Zheng Dang

CV
h-index10
15papers
310citations
Novelty62%
AI Score33

15 Papers

CVApr 4, 2023
Robust Outlier Rejection for 3D Registration with Variational Bayes

Haobo Jiang, Zheng Dang, Zhen Wei et al.

Learning-based outlier (mismatched correspondence) rejection for robust 3D registration generally formulates the outlier removal as an inlier/outlier classification problem. The core for this to be successful is to learn the discriminative inlier/outlier feature representations. In this paper, we develop a novel variational non-local network-based outlier rejection framework for robust alignment. By reformulating the non-local feature learning with variational Bayesian inference, the Bayesian-driven long-range dependencies can be modeled to aggregate discriminative geometric context information for inlier/outlier distinction. Specifically, to achieve such Bayesian-driven contextual dependencies, each query/key/value component in our non-local network predicts a prior feature distribution and a posterior one. Embedded with the inlier/outlier label, the posterior feature distribution is label-dependent and discriminative. Thus, pushing the prior to be close to the discriminative posterior in the training step enables the features sampled from this prior at test time to model high-quality long-range dependencies. Notably, to achieve effective posterior feature guidance, a specific probabilistic graphical model is designed over our non-local model, which lets us derive a variational low bound as our optimization objective for model training. Finally, we propose a voting-based inlier searching strategy to cluster the high-quality hypothetical inliers for transformation estimation. Extensive experiments on 3DMatch, 3DLoMatch, and KITTI datasets verify the effectiveness of our method.

CVMar 29, 2022
MatchNorm: Learning-based Point Cloud Registration for 6D Object Pose Estimation in the Real World

Zheng Dang, Lizhou Wang, Yu Guo et al.

In this work, we tackle the task of estimating the 6D pose of an object from point cloud data. While recent learning-based approaches to addressing this task have shown great success on synthetic datasets, we have observed them to fail in the presence of real-world data. We thus analyze the causes of these failures, which we trace back to the difference between the feature distributions of the source and target point clouds, and the sensitivity of the widely-used SVD-based loss function to the range of rotation between the two point clouds. We address the first challenge by introducing a new normalization strategy, Match Normalization, and the second via the use of a loss function based on the negative log likelihood of point correspondences. Our two contributions are general and can be applied to many existing learning-based 3D object registration frameworks, which we illustrate by implementing them in two of them, DCP and IDAM. Our experiments on the real-scene TUD-L, LINEMOD and Occluded-LINEMOD datasets evidence the benefits of our strategies. They allow for the first time learning-based 3D object registration methods to achieve meaningful results on real-world data. We therefore expect them to be key to the future development of point cloud registration methods.

CVOct 26, 2023
SE(3) Diffusion Model-based Point Cloud Registration for Robust 6D Object Pose Estimation

Haobo Jiang, Mathieu Salzmann, Zheng Dang et al.

In this paper, we introduce an SE(3) diffusion model-based point cloud registration framework for 6D object pose estimation in real-world scenarios. Our approach formulates the 3D registration task as a denoising diffusion process, which progressively refines the pose of the source point cloud to obtain a precise alignment with the model point cloud. Training our framework involves two operations: An SE(3) diffusion process and an SE(3) reverse process. The SE(3) diffusion process gradually perturbs the optimal rigid transformation of a pair of point clouds by continuously injecting noise (perturbation transformation). By contrast, the SE(3) reverse process focuses on learning a denoising network that refines the noisy transformation step-by-step, bringing it closer to the optimal transformation for accurate pose estimation. Unlike standard diffusion models used in linear Euclidean spaces, our diffusion model operates on the SE(3) manifold. This requires exploiting the linear Lie algebra $\mathfrak{se}(3)$ associated with SE(3) to constrain the transformation transitions during the diffusion and reverse processes. Additionally, to effectively train our denoising network, we derive a registration-specific variational lower bound as the optimization objective for model learning. Furthermore, we show that our denoising network can be constructed with a surrogate registration model, making our approach applicable to different deep registration networks. Extensive experiments demonstrate that our diffusion registration framework presents outstanding pose estimation performance on the real-world TUD-L, LINEMOD, and Occluded-LINEMOD datasets.

CVSep 20, 2023
AutoSynth: Learning to Generate 3D Training Data for Object Point Cloud Registration

Zheng Dang, Mathieu Salzmann

In the current deep learning paradigm, the amount and quality of training data are as critical as the network architecture and its training details. However, collecting, processing, and annotating real data at scale is difficult, expensive, and time-consuming, particularly for tasks such as 3D object registration. While synthetic datasets can be created, they require expertise to design and include a limited number of categories. In this paper, we introduce a new approach called AutoSynth, which automatically generates 3D training data for point cloud registration. Specifically, AutoSynth automatically curates an optimal dataset by exploring a search space encompassing millions of potential datasets with diverse 3D shapes at a low cost.To achieve this, we generate synthetic 3D datasets by assembling shape primitives, and develop a meta-learning strategy to search for the best training data for 3D registration on real point clouds. For this search to remain tractable, we replace the point cloud registration network with a much smaller surrogate network, leading to a $4056.43$ times speedup. We demonstrate the generality of our approach by implementing it with two different point cloud registration networks, BPNet and IDAM. Our results on TUD-L, LINEMOD and Occluded-LINEMOD evidence that a neural network trained on our searched dataset yields consistently better performance than the same one trained on the widely used ModelNet40 dataset.

CVMar 20, 2024Code
DVMNet++: Rethinking Relative Pose Estimation for Unseen Objects

Chen Zhao, Tong Zhang, Zheng Dang et al.

Determining the relative pose of a previously unseen object between two images is pivotal to the success of generalizable object pose estimation. Existing approaches typically predict 3D translation utilizing the ground-truth object bounding box and approximate 3D rotation with a large number of discrete hypotheses. This strategy makes unrealistic assumptions about the availability of ground truth and incurs a computationally expensive process of scoring each hypothesis at test time. By contrast, we rethink the problem of relative pose estimation for unseen objects by presenting a Deep Voxel Matching Network (DVMNet++). Our method computes the relative object pose in a single pass, eliminating the need for ground-truth object bounding boxes and rotation hypotheses. We achieve open-set object detection by leveraging image feature embedding and natural language understanding as reference. The detection result is then employed to approximate the translation parameters and crop the object from the query image. For rotation estimation, we map the two RGB images, i.e., reference and cropped query, to their respective voxelized 3D representations. The resulting voxels are passed through a rotation estimation module, which aligns the voxels and computes the rotation in an end-to-end fashion by solving a least-squares problem. To enhance robustness, we introduce a weighted closest voxel algorithm capable of mitigating the impact of noisy voxels. We conduct extensive experiments on the CO3D, Objaverse, LINEMOD, and LINEMOD-O datasets, demonstrating that our approach delivers more accurate relative pose estimates for novel objects at a lower computational cost compared to state-of-the-art methods. Our code is released at https://github.com/sailor-z/DVMNet/.

CVDec 5, 2023
Adaptive Multi-step Refinement Network for Robust Point Cloud Registration

Zhi Chen, Yufan Ren, Tong Zhang et al.

Point Cloud Registration (PCR) estimates the relative rigid transformation between two point clouds of the same scene. Despite significant progress with learning-based approaches, existing methods still face challenges when the overlapping region between the two point clouds is small. In this paper, we propose an adaptive multi-step refinement network that refines the registration quality at each step by leveraging the information from the preceding step. To achieve this, we introduce a training procedure and a refinement network. Firstly, to adapt the network to the current step, we utilize a generalized one-way attention mechanism, which prioritizes the last step's estimated overlapping region, and we condition the network on step indices. Secondly, instead of training the network to map either random transformations or a fixed pre-trained model's estimations to the ground truth, we train it on transformations with varying registration qualities, ranging from accurate to inaccurate, thereby enhancing the network's adaptiveness and robustness. Despite its conceptual simplicity, our method achieves state-of-the-art performance on both the 3DMatch/3DLoMatch and KITTI benchmarks. Notably, on 3DLoMatch, our method reaches 80.4% recall rate, with an absolute improvement of 1.2%.

CVJun 13, 2024
OpenMaterial: A Large-scale Dataset of Complex Materials for 3D Reconstruction

Zheng Dang, Jialu Huang, Fei Wang et al.

Recent advances in deep learning, such as neural radiance fields and implicit neural representations, have significantly advanced 3D reconstruction. However, accurately reconstructing objects with complex optical properties, such as metals, glass, and plastics, remains challenging due to the breakdown of multi-view color consistency in the presence of specular reflections, refractions, and transparency. This limitation is further exacerbated by the lack of benchmark datasets that explicitly model material-dependent light transport. To address this, we introduce OpenMaterial, a large-scale semi-synthetic dataset for benchmarking material-aware 3D reconstruction. It comprises 1,001 objects spanning 295 distinct materials, including conductors, dielectrics, plastics, and their roughened variants, captured under 714 diverse lighting conditions. By integrating lab-measured Index of Refraction (IOR) spectra, OpenMaterial enables the generation of high-fidelity multi-view images that accurately simulate complex light-matter interactions. It provides multi-view images, 3D shape models, camera poses, depth maps, and object masks, establishing the first extensive benchmark for evaluating 3D reconstruction on challenging materials. We evaluate 11 state-of-the-art methods for 3D reconstruction and novel view synthesis, conducting ablation studies to assess the impact of material type, shape complexity, and illumination on reconstruction performance. Our results indicate that OpenMaterial provides a strong and fair basis for developing more robust, physically-informed 3D reconstruction techniques to better handle real-world optical complexities.

CVNov 19, 2021
What Stops Learning-based 3D Registration from Working in the Real World?

Zheng Dang, Lizhou Wang, Junning Qiu et al.

Much progress has been made on the task of learning-based 3D point cloud registration, with existing methods yielding outstanding results on standard benchmarks, such as ModelNet40, even in the partial-to-partial matching scenario. Unfortunately, these methods still struggle in the presence of real data. In this work, we identify the sources of these failures, analyze the reasons behind them, and propose solutions to tackle them. We summarise our findings into a set of guidelines and demonstrate their effectiveness by applying them to different baseline methods, DCP and IDAM. In short, our guidelines improve both their training convergence and testing accuracy. Ultimately, this translates to a best-practice 3D registration network (BPNet), constituting the first learning-based method able to handle previously-unseen objects in real-world data. Despite being trained only on synthetic data, our model generalizes to real data without any fine-tuning, reaching an accuracy of up to 67% on point clouds of unseen objects obtained with a commercial sensor.

CVApr 8, 2021
Robust Differentiable SVD

Wei Wang, Zheng Dang, Yinlin Hu et al.

Eigendecomposition of symmetric matrices is at the heart of many computer vision algorithms. However, the derivatives of the eigenvectors tend to be numerically unstable, whether using the SVD to compute them analytically or using the Power Iteration (PI) method to approximate them. This instability arises in the presence of eigenvalues that are close to each other. This makes integrating eigendecomposition into deep networks difficult and often results in poor convergence, particularly when dealing with large matrices. While this can be mitigated by partitioning the data into small arbitrary groups, doing so has no theoretical basis and makes it impossible to exploit the full power of eigendecomposition. In previous work, we mitigated this using SVD during the forward pass and PI to compute the gradients during the backward pass. However, the iterative deflation procedure required to compute multiple eigenvectors using PI tends to accumulate errors and yield inaccurate gradients. Here, we show that the Taylor expansion of the SVD gradient is theoretically equivalent to the gradient obtained using PI without relying in practice on an iterative process and thus yields more accurate gradients. We demonstrate the benefits of this increased accuracy for image classification and style transfer.

CVNov 23, 2020
3D Registration for Self-Occluded Objects in Context

Zheng Dang, Fei Wang, Mathieu Salzmann

While much progress has been made on the task of 3D point cloud registration, there still exists no learning-based method able to estimate the 6D pose of an object observed by a 2.5D sensor in a scene. The challenges of this scenario include the fact that most measurements are outliers depicting the object's surrounding context, and the mismatch between the complete 3D object model and its self-occluded observations. We introduce the first deep learning framework capable of effectively handling this scenario. Our method consists of an instance segmentation module followed by a pose estimation one. It allows us to perform 3D registration in a one-shot manner, without requiring an expensive iterative procedure. We further develop an on-the-fly rendering-based training strategy that is both time- and memory-efficient. Our experiments evidence the superiority of our approach over the state-of-the-art traditional and learning-based 3D registration methods.

CVNov 17, 2020
A Method to Generate High Precision Mesh Model and RGB-D Datasetfor 6D Pose Estimation Task

Minglei Lu, Yu Guo, Fei Wang et al.

Recently, 3D version has been improved greatly due to the development of deep neural networks. A high quality dataset is important to the deep learning method. Existing datasets for 3D vision has been constructed, such as Bigbird and YCB. However, the depth sensors used to make these datasets are out of date, which made the resolution and accuracy of the datasets cannot full fill the higher standards of demand. Although the equipment and technology got better, but no one was trying to collect new and better dataset. Here we are trying to fill that gap. To this end, we propose a new method for object reconstruction, which takes into account the speed, accuracy and robustness. Our method could be used to produce large dataset with better and more accurate annotation. More importantly, our data is more close to the rendering data, which shrinking the gap between the real data and synthetic data further.

CVJun 8, 2020
Learning 3D-3D Correspondences for One-shot Partial-to-partial Registration

Zheng Dang, Fei Wang, Mathieu Salzmann

While 3D-3D registration is traditionally tacked by optimization-based methods, recent work has shown that learning-based techniques could achieve faster and more robust results. In this context, however, only PRNet can handle the partial-to-partial registration scenario. Unfortunately, this is achieved at the cost of relying on an iterative procedure, with a complex network architecture. Here, we show that learning-based partial-to-partial registration can be achieved in a one-shot manner, jointly reducing network complexity and increasing registration accuracy. To this end, we propose an Optimal Transport layer able to account for occluded points thanks to the use of outlier bins. The resulting OPRNet framework outperforms the state of the art on standard benchmarks, demonstrating better robustness and generalization ability than existing techniques.

CVApr 15, 2020
Eigendecomposition-Free Training of Deep Networks for Linear Least-Square Problems

Zheng Dang, Kwang Moo Yi, Yinlin Hu et al.

Many classical Computer Vision problems, such as essential matrix computation and pose estimation from 3D to 2D correspondences, can be tackled by solving a linear least-square problem, which can be done by finding the eigenvector corresponding to the smallest, or zero, eigenvalue of a matrix representing a linear system. Incorporating this in deep learning frameworks would allow us to explicitly encode known notions of geometry, instead of having the network implicitly learn them from data. However, performing eigendecomposition within a network requires the ability to differentiate this operation. While theoretically doable, this introduces numerical instability in the optimization process in practice. In this paper, we introduce an eigendecomposition-free approach to training a deep network whose loss depends on the eigenvector corresponding to a zero eigenvalue of a matrix predicted by the network. We demonstrate that our approach is much more robust than explicit differentiation of the eigendecomposition using two general tasks, outlier rejection and denoising, with several practical examples including wide-baseline stereo, the perspective-n-point problem, and ellipse fitting. Empirically, our method has better convergence properties and yields state-of-the-art results.

LGJun 21, 2019
Backpropagation-Friendly Eigendecomposition

Wei Wang, Zheng Dang, Yinlin Hu et al.

Eigendecomposition (ED) is widely used in deep networks. However, the backpropagation of its results tends to be numerically unstable, whether using ED directly or approximating it with the Power Iteration method, particularly when dealing with large matrices. While this can be mitigated by partitioning the data in small and arbitrary groups, doing so has no theoretical basis and makes its impossible to exploit the power of ED to the full. In this paper, we introduce a numerically stable and differentiable approach to leveraging eigenvectors in deep networks. It can handle large matrices without requiring to split them. We demonstrate the better robustness of our approach over standard ED and PI for ZCA whitening, an alternative to batch normalization, and for PCA denoising, which we introduce as a new normalization strategy for deep networks, aiming to further denoise the network's features.

CVMar 21, 2018
Eigendecomposition-free Training of Deep Networks with Zero Eigenvalue-based Losses

Zheng Dang, Kwang Moo Yi, Yinlin Hu et al.

Many classical Computer Vision problems, such as essential matrix computation and pose estimation from 3D to 2D correspondences, can be solved by finding the eigenvector corresponding to the smallest, or zero, eigenvalue of a matrix representing a linear system. Incorporating this in deep learning frameworks would allow us to explicitly encode known notions of geometry, instead of having the network implicitly learn them from data. However, performing eigendecomposition within a network requires the ability to differentiate this operation. Unfortunately, while theoretically doable, this introduces numerical instability in the optimization process in practice. In this paper, we introduce an eigendecomposition-free approach to training a deep network whose loss depends on the eigenvector corresponding to a zero eigenvalue of a matrix predicted by the network. We demonstrate on several tasks, including keypoint matching and 3D pose estimation, that our approach is much more robust than explicit differentiation of the eigendecomposition, It has better convergence properties and yields state-of-the-art results on both tasks.