CVAug 17, 2022
SO(3)-Pose: SO(3)-Equivariance Learning for 6D Object Pose EstimationHaoran Pan, Jun Zhou, Yuanpeng Liu et al.
6D pose estimation of rigid objects from RGB-D images is crucial for object grasping and manipulation in robotics. Although RGB channels and the depth (D) channel are often complementary, providing respectively the appearance and geometry information, it is still non-trivial how to fully benefit from the two cross-modal data. From the simple yet new observation, when an object rotates, its semantic label is invariant to the pose while its keypoint offset direction is variant to the pose. To this end, we present SO(3)-Pose, a new representation learning network to explore SO(3)-equivariant and SO(3)-invariant features from the depth channel for pose estimation. The SO(3)-invariant features facilitate to learn more distinctive representations for segmenting objects with similar appearance from RGB channels. The SO(3)-equivariant features communicate with RGB features to deduce the (missed) geometry for detecting keypoints of an object with the reflective surface from the depth channel. Unlike most of existing pose estimation methods, our SO(3)-Pose not only implements the information communication between the RGB and depth channels, but also naturally absorbs the SO(3)-equivariance geometry knowledge from depth images, leading to better appearance and geometry representation learning. Comprehensive experiments show that our method achieves the state-of-the-art performance on three benchmarks.
GRNov 24, 2025
Inverse Rendering for High-Genus Surface Meshes from Multi-View ImagesXiang Gao, Xinmu Wang, Xiaolong Wu et al.
We present a topology-informed inverse rendering approach for reconstructing high-genus surface meshes from multi-view images. Compared to 3D representations like voxels and point clouds, mesh-based representations are preferred as they enable the application of differential geometry theory and are optimized for modern graphics pipelines. However, existing inverse rendering methods often fail catastrophically on high-genus surfaces, leading to the loss of key topological features, and tend to oversmooth low-genus surfaces, resulting in the loss of surface details. This failure stems from their overreliance on Adam-based optimizers, which can lead to vanishing and exploding gradients. To overcome these challenges, we introduce an adaptive V-cycle remeshing scheme in conjunction with a re-parametrized Adam optimizer to enhance topological and geometric awareness. By periodically coarsening and refining the deforming mesh, our method informs mesh vertices of their current topology and geometry before optimization, mitigating gradient issues while preserving essential topological features. Additionally, we enforce topological consistency by constructing topological primitives with genus numbers that match those of ground truth using Gauss-Bonnet theorem. Experimental results demonstrate that our inverse rendering approach outperforms the current state-of-the-art method, achieving significant improvements in Chamfer Distance and Volume IoU, particularly for high-genus surfaces, while also enhancing surface details for low-genus surfaces.
CVNov 24, 2025
Neural Geometry Image-Based Representations with Optimal Transport (OT)Xiang Gao, Yuanpeng Liu, Xinmu Wang et al.
Neural representations for 3D meshes are emerging as an effective solution for compact storage and efficient processing. Existing methods often rely on neural overfitting, where a coarse mesh is stored and progressively refined through multiple decoder networks. While this can restore high-quality surfaces, it is computationally expensive due to successive decoding passes and the irregular structure of mesh data. In contrast, images have a regular structure that enables powerful super-resolution and restoration frameworks, but applying these advantages to meshes is difficult because their irregular connectivity demands complex encoder-decoder architectures. Our key insight is that a geometry image-based representation transforms irregular meshes into a regular image grid, making efficient image-based neural processing directly applicable. Building on this idea, we introduce our neural geometry image-based representation, which is decoder-free, storage-efficient, and naturally suited for neural processing. It stores a low-resolution geometry-image mipmap of the surface, from which high-quality meshes are restored in a single forward pass. To construct geometry images, we leverage Optimal Transport (OT), which resolves oversampling in flat regions and undersampling in feature-rich regions, and enables continuous levels of detail (LoD) through geometry-image mipmapping. Experimental results demonstrate state-of-the-art storage efficiency and restoration accuracy, measured by compression ratio (CR), Chamfer distance (CD), and Hausdorff distance (HD).
CVMay 31, 2021
Deep learning for prediction of hepatocellular carcinoma recurrence after resection or liver transplantation: a discovery and validation studyZhikun Liu, Yuanpeng Liu, Yuan Hong et al.
This study aimed to develop a classifier of prognosis after resection or liver transplantation (LT) for HCC by directly analysing the ubiquitously available histological images using deep learning based neural networks. Nucleus map set was used to train U-net to capture the nuclear architectural information. Train set included the patients with HCC treated by resection and has a distinct outcome. LT set contained patients with HCC treated by LT. Train set and its nuclear architectural information extracted by U-net were used to train MobileNet V2 based classifier (MobileNetV2_HCC_Class), purpose-built for classifying supersized heterogeneous images. The MobileNetV2_HCC_Class maintained relative higher discriminatory power than the other factors after HCC resection or LT in the independent validation set. Pathological review showed that the tumoral areas most predictive of recurrence were characterized by presence of stroma, high degree of cytological atypia, nuclear hyperchomasia, and a lack of immune infiltration. A clinically useful prognostic classifier was developed using deep learning allied to histological slides. The classifier has been extensively evaluated in independent patient populations with different treatment, and gives consistent excellent results across the classical clinical, biological and pathological features. The classifier assists in refining the prognostic prediction of HCC patients and identifying patients who would benefit from more intensive management.
CVMay 29, 2021
Automatic CT Segmentation from Bounding Box Annotations using Convolutional Neural NetworksYuanpeng Liu, Qinglei Hui, Zhiyi Peng et al.
Accurate segmentation for medical images is important for clinical diagnosis. Existing automatic segmentation methods are mainly based on fully supervised learning and have an extremely high demand for precise annotations, which are very costly and time-consuming to obtain. To address this problem, we proposed an automatic CT segmentation method based on weakly supervised learning, by which one could train an accurate segmentation model only with weak annotations in the form of bounding boxes. The proposed method is composed of two steps: 1) generating pseudo masks with bounding box annotations by k-means clustering, and 2) iteratively training a 3D U-Net convolutional neural network as a segmentation model. Some data pre-processing methods are used to improve performance. The method was validated on four datasets containing three types of organs with a total of 627 CT volumes. For liver, spleen and kidney segmentation, it achieved an accuracy of 95.19%, 92.11%, and 91.45%, respectively. Experimental results demonstrate that our method is accurate, efficient, and suitable for clinical use.
CVSep 15, 2020
3DPVNet: Patch-level 3D Hough Voting Network for 6D Pose EstimationYuanpeng Liu, Jun Zhou, Yuqi Zhang et al.
In this paper, we focus on estimating the 6D pose of objects in point clouds. Although the topic has been widely studied, pose estimation in point clouds remains a challenging problem due to the noise and occlusion. To address the problem, a novel 3DPVNet is presented in this work, which utilizes 3D local patches to vote for the object 6D poses. 3DPVNet is comprised of three modules. In particular, a Patch Unification (\textbf{PU}) module is first introduced to normalize the input patch, and also create a standard local coordinate frame on it to generate a reliable vote. We then devise a Weight-guided Neighboring Feature Fusion (\textbf{WNFF}) module in the network, which fuses the neighboring features to yield a semi-global feature for the center patch. WNFF module mines the neighboring information of a local patch, such that the representation capability to local geometric characteristics is significantly enhanced, making the method robust to a certain level of noise. Moreover, we present a Patch-level Voting (\textbf{PV}) module to regress transformations and generates pose votes. After the aggregation of all votes from patches and a refinement step, the final pose of the object can be obtained. Compared to recent voting-based methods, 3DPVNet is patch-level, and directly carried out on point clouds. Therefore, 3DPVNet achieves less computation than point/pixel-level voting scheme, and has robustness to partial data. Experiments on several datasets demonstrate that 3DPVNet achieves the state-of-the-art performance, and is also robust against noise and occlusions.