CV · Jul 14, 2023
Quantity-Aware Coarse-to-Fine Correspondence for Image-to-Point Cloud Registration
Gongxin Yao, Yixin Xuan, Yiwei Chen et al.
Image-to-point cloud registration aims to determine the relative camera pose between an RGB image and a reference point cloud, serving as a general solution for locating 3D objects from 2D observations. Matching individual points with pixels can be inherently ambiguous due to modality gaps. To address this challenge, we propose a framework that captures quantity-aware correspondences between local point sets and pixel patches and refines the results at both the point and pixel levels. This framework aligns the high-level semantics of point sets and pixel patches to improve matching accuracy. On a coarse scale, the set-to-patch correspondence is expected to be influenced by the quantity of 3D points. To achieve this, a novel supervision strategy is proposed to adaptively quantify the degrees of correlation as continuous values. On a finer scale, point-to-pixel correspondences are refined from a smaller search space through a well-designed scheme that incorporates both resampling and quantity-aware priors. In particular, a confidence sorting strategy is proposed to proportionally select better correspondences at the final stage. Leveraging these high-quality correspondences, the camera pose is then estimated with an efficient Perspective-n-Point (PnP) solver within a random sample consensus (RANSAC) framework. Extensive experiments on the KITTI Odometry and NuScenes datasets demonstrate that our method outperforms state-of-the-art methods.
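To make the two quantity-aware ideas above more concrete, here is a minimal Python sketch under stated assumptions; it is not the authors' implementation. `quantity_aware_targets` (a hypothetical helper) turns the projected pixel coordinates of one local point set into continuous set-to-patch targets by counting how many of its points fall into each patch, and `select_by_confidence` (also hypothetical) keeps a fixed proportion of the highest-confidence point-to-pixel matches before they would be handed to a PnP-RANSAC solver. The square patch grid and the normalization by the total number of points in the set are assumptions.

```python
from collections import Counter
from typing import Dict, List, Tuple

Pixel = Tuple[float, float]
Match = Tuple[int, Pixel, float]  # (point_id, pixel, confidence): assumed layout


def quantity_aware_targets(projected: List[Pixel], patch_size: int,
                           grid_w: int, grid_h: int) -> Dict[Tuple[int, int], float]:
    """Hypothetical soft set-to-patch targets: the fraction of one point set's
    projected points that land in each (row, col) patch."""
    if not projected:
        return {}
    counts: Counter = Counter()
    for u, v in projected:
        col, row = int(u // patch_size), int(v // patch_size)
        # Points projecting outside the image grid contribute to no patch.
        if 0 <= col < grid_w and 0 <= row < grid_h:
            counts[(row, col)] += 1
    # Normalize by the full set size (an assumption), giving continuous values.
    return {patch: n / len(projected) for patch, n in counts.items()}


def select_by_confidence(matches: List[Match], ratio: float) -> List[Match]:
    """Sort matches by confidence and keep the top `ratio` fraction (at least one)."""
    ranked = sorted(matches, key=lambda m: m[2], reverse=True)
    k = max(1, int(len(ranked) * ratio)) if ranked else 0
    return ranked[:k]
```

For example, four points projected to (1, 1), (2, 3), (9, 1), and (20, 20) on a 2 × 2 grid of 8-pixel patches yield targets of 0.5 for patch (0, 0) and 0.25 for patch (0, 1); the last point falls outside the grid and contributes nothing.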
CV · Aug 5, 2024
CMR-Agent: Learning a Cross-Modal Agent for Iterative Image-to-Point Cloud Registration
Gongxin Yao, Yixin Xuan, Xinyang Li et al.
Image-to-point cloud registration aims to determine the relative camera pose of an RGB image with respect to a point cloud. It plays an important role in camera localization within pre-built LiDAR maps. Despite the modality gaps, most learning-based methods establish 2D-3D point correspondences in feature space without any feedback mechanism for iterative optimization, resulting in poor accuracy and interpretability. In this paper, we propose to reformulate the registration procedure as an iterative Markov decision process, allowing incremental adjustments to the camera pose based on each intermediate state. To achieve this, we employ reinforcement learning to develop a cross-modal registration agent (CMR-Agent) and use imitation learning to initialize its registration policy, which stabilizes training and gives it a quick start. For the cross-modal observations, we propose a 2D-3D hybrid state representation that fully exploits the fine-grained features of RGB images while reducing the useless neutral states caused by the spatial truncation of the camera frustum. Additionally, the overall framework is designed to efficiently reuse one-shot cross-modal embeddings, avoiding repetitive and time-consuming feature extraction. Extensive experiments on the KITTI-Odometry and NuScenes datasets demonstrate that CMR-Agent achieves competitive accuracy and efficiency in registration. Once the one-shot embeddings are computed, each iteration takes only a few milliseconds.
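The Markov decision process formulation can be illustrated with a toy Python sketch, assuming a simplified 4-DoF pose (x, y, z, yaw) and a hypothetical discrete action set of ±0.1 steps along each dimension; CMR-Agent's actual state, action space, and networks are not reproduced here. `expert_policy` is a greedy oracle that sees the ground-truth pose, the kind of teacher whose actions could supply imitation-learning labels, and `rollout` applies any policy iteratively until it chooses to stop. A trained agent would instead map the hybrid 2D-3D state to an action without access to the ground truth.

```python
import math
from typing import Callable, List, Optional, Sequence, Tuple

# Simplified 4-DoF pose (x, y, z, yaw); an assumption, not full SE(3).
Pose = Tuple[float, float, float, float]
# Hypothetical action: (pose dimension index, signed step size).
Action = Tuple[int, float]

STEP = 0.1
ACTIONS: List[Action] = [(dim, s) for dim in range(4) for s in (STEP, -STEP)]


def apply_action(pose: Pose, action: Action) -> Pose:
    """Incrementally adjust one pose dimension (one MDP transition)."""
    dim, delta = action
    new = list(pose)
    new[dim] += delta
    return (new[0], new[1], new[2], new[3])


def pose_error(pose: Sequence[float], target: Sequence[float]) -> float:
    """Euclidean distance between two pose vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(pose, target)))


def expert_policy(pose: Pose, gt: Pose) -> Optional[Action]:
    """Greedy oracle with ground-truth access, usable as an imitation teacher.
    Returns None (stop) when no action reduces the pose error."""
    best = min(ACTIONS, key=lambda a: pose_error(apply_action(pose, a), gt))
    if pose_error(apply_action(pose, best), gt) >= pose_error(pose, gt):
        return None
    return best


def rollout(pose: Pose, policy: Callable[[Pose], Optional[Action]],
            max_steps: int = 100) -> List[Pose]:
    """Apply a policy step by step from an initial pose until it stops."""
    trajectory = [pose]
    for _ in range(max_steps):
        action = policy(pose)
        if action is None:
            break
        pose = apply_action(pose, action)
        trajectory.append(pose)
    return trajectory


gt: Pose = (0.3, -0.2, 0.0, 0.1)
trajectory = rollout((0.0, 0.0, 0.0, 0.0), lambda p: expert_policy(p, gt))
```

Starting from the origin with ground truth (0.3, -0.2, 0.0, 0.1), the expert takes six incremental steps (three in x, two in y, one in yaw) and then stops.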
CV · Aug 5, 2024
MaFreeI2P: A Matching-Free Image-to-Point Cloud Registration Paradigm with Active Camera Pose Retrieval
Gongxin Yao, Xinyang Li, Yixin Xuan et al.
Image-to-point cloud registration seeks to estimate the relative camera pose between an image and a point cloud, which remains an open question due to the gaps between the two data modalities. Recent matching-based methods tend to tackle this by building 2D-3D correspondences. In this paper, we reveal the information loss inherent in these methods and propose a matching-free paradigm, named MaFreeI2P. Our key insight is to actively retrieve the camera pose in SE(3) space by contrasting the geometric features of the point cloud and the query image. To achieve this, we first sample a set of candidate camera poses and construct their cost volume using the cross-modal features. Compared with matching, the cost volume preserves more information, and its feature similarity implicitly reflects the confidence level of each sampled pose. Next, we employ a convolutional network to adaptively formulate a similarity assessment function, where the input cost volume is further improved by filtering and pose-based weighting. Finally, we update the camera pose based on the similarity scores and adopt a heuristic strategy that iteratively shrinks the pose sampling space until convergence. MaFreeI2P achieves competitive registration accuracy and recall on the KITTI-Odometry and Apollo-DaoxiangLake datasets.
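The sample, score, and shrink loop can be sketched in Python as follows, with loud assumptions: the pose is a flat parameter vector rather than a proper SE(3) parameterization, `score_fn` stands in for the learned similarity assessment over the cost volume, and the update simply keeps the best-scoring candidate. `retrieve_pose` and its parameters (`n_samples`, `n_iters`, `shrink`) are hypothetical names, not MaFreeI2P's API.

```python
import random
from typing import Callable, List, Sequence


def retrieve_pose(score_fn: Callable[[Sequence[float]], float],
                  init_pose: Sequence[float], init_radius: float,
                  n_samples: int = 64, n_iters: int = 10,
                  shrink: float = 0.5, seed: int = 0) -> List[float]:
    """Hypothetical coarse-to-fine pose retrieval by sampling and scoring."""
    rng = random.Random(seed)
    pose = list(init_pose)
    radius = init_radius
    for _ in range(n_iters):
        # Keep the current pose as a candidate so the score never decreases.
        candidates = [pose] + [
            [p + rng.uniform(-radius, radius) for p in pose]
            for _ in range(n_samples)
        ]
        # Stand-in for the similarity assessment: keep the best-scoring pose.
        pose = max(candidates, key=score_fn)
        # Heuristic: shrink the sampling space around the current estimate.
        radius *= shrink
    return pose


target = [1.0, -0.5, 0.25]


def toy_score(p: Sequence[float]) -> float:
    """Toy similarity: negative squared distance to a known target pose."""
    return -sum((a - b) ** 2 for a, b in zip(p, target))


estimate = retrieve_pose(toy_score, [0.0, 0.0, 0.0], init_radius=2.0)
```

Keeping the current pose among the candidates makes the score non-decreasing across iterations, and halving the radius each round mimics the heuristic shrinking of the sampling space; sampling rotations would additionally require a proper parameterization on SO(3).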
CV · Jun 27, 2024 · Code
FAGhead: Fully Animate Gaussian Head from Monocular Videos
Yixin Xuan, Xinyang Li, Gongxin Yao et al.
High-fidelity reconstruction of 3D human avatars has wide applications in virtual reality. In this paper, we introduce FAGhead, a method that enables fully controllable human portraits from monocular videos. We make use of explicit traditional 3D morphable model (3DMM) meshes and optimize neutral 3D Gaussians to reconstruct complex expressions. Furthermore, we employ a novel Point-based Learnable Representation Field (PLRF) with learnable Gaussian point positions to enhance reconstruction performance. Meanwhile, to handle the edges of avatars effectively, we introduce alpha rendering to supervise the alpha value of each pixel. Extensive experimental results on open-source datasets and our own captured datasets demonstrate that our approach generates high-fidelity 3D head avatars, fully controls the expression and pose of the virtual avatars, and outperforms existing works.
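Alpha supervision builds on the standard front-to-back compositing used in 3D Gaussian splatting, where a pixel's accumulated alpha is one minus the remaining transmittance. The Python sketch below composites depth-sorted splats for a single pixel and compares rendered alphas against a foreground mask with an L1 loss. The L1 choice and the helper names `composite_pixel` and `alpha_l1_loss` are assumptions for illustration; FAGhead's exact loss is not specified here.

```python
from typing import List, Sequence, Tuple

RGB = Tuple[float, float, float]


def composite_pixel(colors: Sequence[RGB],
                    alphas: Sequence[float]) -> Tuple[List[float], float]:
    """Front-to-back compositing of depth-sorted splats covering one pixel.
    Returns (rgb, accumulated_alpha)."""
    rgb = [0.0, 0.0, 0.0]
    transmittance = 1.0
    for color, alpha in zip(colors, alphas):
        # Each splat contributes in proportion to the light not yet absorbed.
        weight = alpha * transmittance
        rgb = [acc + weight * c for acc, c in zip(rgb, color)]
        transmittance *= 1.0 - alpha
    return rgb, 1.0 - transmittance


def alpha_l1_loss(rendered: Sequence[float], mask: Sequence[float]) -> float:
    """Hypothetical per-pixel L1 penalty between rendered alpha and a mask."""
    return sum(abs(a - m) for a, m in zip(rendered, mask)) / len(mask)
```

Two splats with alpha 0.5 each give an accumulated alpha of 0.75, so a pixel whose mask value is 1 still incurs a penalty that pushes the avatar's edge toward full opacity.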
CV · May 20, 2024
GGAvatar: Geometric Adjustment of Gaussian Head Avatar
Xinyang Li, Jiaxin Wang, Yixin Xuan et al.
We propose GGAvatar, a novel 3D avatar representation designed to robustly model dynamic head avatars with complex identities and deformations. GGAvatar employs a coarse-to-fine structure featuring two core modules: the Neutral Gaussian Initialization Module and the Geometry Morph Adjuster. The Neutral Gaussian Initialization Module pairs Gaussian primitives with deformable triangular meshes and employs an adaptive density control strategy to model the geometric structure of the target subject with neutral expressions. The Geometry Morph Adjuster introduces deformation bases for each Gaussian in global space, creating fine-grained, low-dimensional representations of deformation behaviors that effectively address the limitations of the Linear Blend Skinning (LBS) formula. Extensive experiments show that GGAvatar produces high-fidelity renderings, outperforming state-of-the-art methods in visual quality and quantitative metrics.
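To show the kind of limitation the Geometry Morph Adjuster targets, the Python sketch below computes a Linear Blend Skinning position, x' = Σ_j w_j (R_j x + t_j), and then adds a per-Gaussian offset formed from a small set of deformation bases. How GGAvatar obtains the coefficients (here assumed to come from expression parameters) and how its bases are learned are not reproduced; `lbs` and `deform_gaussian` are hypothetical helpers.

```python
from typing import List, Sequence

Vec3 = Sequence[float]
Mat3 = Sequence[Sequence[float]]


def lbs(point: Vec3, weights: Sequence[float], rotations: Sequence[Mat3],
        translations: Sequence[Vec3]) -> List[float]:
    """Linear Blend Skinning: x' = sum_j w_j * (R_j @ x + t_j)."""
    out = [0.0, 0.0, 0.0]
    for w, R, t in zip(weights, rotations, translations):
        moved = [sum(R[i][k] * point[k] for k in range(3)) + t[i] for i in range(3)]
        out = [o + w * m for o, m in zip(out, moved)]
    return out


def deform_gaussian(point: Vec3, weights: Sequence[float],
                    rotations: Sequence[Mat3], translations: Sequence[Vec3],
                    bases: Sequence[Vec3], coeffs: Sequence[float]) -> List[float]:
    """Hypothetical: LBS position plus a low-dimensional correction
    sum_k c_k * B_k, with one 3D basis vector B_k per coefficient c_k."""
    skinned = lbs(point, weights, rotations, translations)
    offset = [sum(c * b[i] for c, b in zip(coeffs, bases)) for i in range(3)]
    return [s + o for s, o in zip(skinned, offset)]


IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
```

Because LBS only blends rigid transforms, the additive basis term is one way to give each Gaussian extra degrees of freedom that LBS alone cannot express, while the representation stays compact with one coefficient per basis.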