Zhiyong Su

Semantic Scholar Profile

h-index9

11papers

26citations

Novelty48%

AI Score45

Ranked #67,927 of 205,806 authors (top 33%)#24,013 in CV (top 41%)

11 Papers

CVNov 2, 2022Code

Hypergraph Convolutional Network based Weakly Supervised Point Cloud Semantic Segmentation with Scene-Level Annotations

Zhuheng Lu, Peng Zhang, Yuewei Dai et al.

Point cloud segmentation with scene-level annotations is a promising but challenging task. Currently, the most popular way is to employ the class activation map (CAM) to locate discriminative regions and then generate point-level pseudo labels from scene-level annotations. However, these methods always suffer from the point imbalance among categories, as well as the sparse and incomplete supervision from CAM. In this paper, we propose a novel weighted hypergraph convolutional network-based method, called WHCN, to confront the challenges of learning point-wise labels from scene-level annotations. Firstly, in order to simultaneously overcome the point imbalance among different categories and reduce the model complexity, superpoints of a training point cloud are generated by exploiting the geometrically homogeneous partition. Then, a hypergraph is constructed based on the high-confidence superpoint-level seeds which are converted from scene-level annotations. Secondly, the WHCN takes the hypergraph as input and learns to predict high-precision point-level pseudo labels by label propagation. Besides the backbone network consisting of spectral hypergraph convolution blocks, a hyperedge attention module is learned to adjust the weights of hyperedges in the WHCN. Finally, a segmentation network is trained by these pseudo point cloud labels. We comprehensively conduct experiments on the ScanNet and S3DIS segmentation datasets. Experimental results demonstrate that the proposed WHCN is effective to predict the point labels with scene annotations, and yields state-of-the-art results in the community. The source code is available at http://zhiyongsu.github.io/Project/WHCN.html.

CVNov 2, 2022

AS-PD: An Arbitrary-Size Downsampling Framework for Point Clouds

Peng Zhang, Ruoyin Xie, Jinsheng Sun et al.

Point cloud downsampling is a crucial pre-processing operation to downsample points in order to unify data size and reduce computational cost, to name a few. Recent research on point cloud downsampling has achieved great success which concentrates on learning to sample in a task-aware way. However, existing learnable samplers can not directly perform arbitrary-size downsampling, and assume the input size is fixed. In this paper, we introduce the AS-PD, a novel task-aware sampling framework that directly downsamples point clouds to any smaller size based on a sample-to-refine strategy. Given an input point cloud of arbitrary size, we first perform a task-agnostic pre-sampling on the input point cloud to a specified sample size. Then, we obtain the sampled set by refining the pre-sampled set to make it task-aware, driven by downstream task losses. The refinement is realized by adding each pre-sampled point with a small offset predicted by point-wise multi-layer perceptrons (MLPs). With the density encoding and proper training scheme, the framework can learn to adaptively downsample point clouds of different input sizes to arbitrary sample sizes. We evaluate sampled results for classification and registration tasks, respectively. The proposed AS-PD surpasses the state-of-the-art method in terms of downstream performance. Further experiments also show that our AS-PD exhibits better generality to unseen task models, implying that the proposed sampler is optimized to the task rather than a specified task model.

IVFeb 12Code

UPDA: Unsupervised Progressive Domain Adaptation for No-Reference Point Cloud Quality Assessment

Bingxu Xie, Fang Zhou, Jincan Wu et al.

While no-reference point cloud quality assessment (NR-PCQA) approaches have achieved significant progress over the past decade, their performance often degrades substantially when a distribution gap exists between the training (source domain) and testing (target domain) data. However, to date, limited attention has been paid to transferring NR-PCQA models across domains. To address this challenge, we propose the first unsupervised progressive domain adaptation (UPDA) framework for NR-PCQA, which introduces a two-stage coarse-to-fine alignment paradigm to address domain shifts. At the coarse-grained stage, a discrepancy-aware coarse-grained alignment method is designed to capture relative quality relationships between cross-domain samples through a novel quality-discrepancy-aware hybrid loss, circumventing the challenges of direct absolute feature alignment. At the fine-grained stage, a perception fusion fine-grained alignment approach with symmetric feature fusion is developed to identify domain-invariant features, while a conditional discriminator selectively enhances the transfer of quality-relevant features. Extensive experiments demonstrate that the proposed UPDA effectively enhances the performance of NR-PCQA methods in cross-domain scenarios, validating its practical applicability. The code is available at https://github.com/yokeno1/UPDA-main.

CVNov 2, 2022

Joint Data and Feature Augmentation for Self-Supervised Representation Learning on Point Clouds

Zhuheng Lu, Yuewei Dai, Weiqing Li et al.

To deal with the exhausting annotations, self-supervised representation learning from unlabeled point clouds has drawn much attention, especially centered on augmentation-based contrastive methods. However, specific augmentations hardly produce sufficient transferability to high-level tasks on different datasets. Besides, augmentations on point clouds may also change underlying semantics. To address the issues, we propose a simple but efficient augmentation fusion contrastive learning framework to combine data augmentations in Euclidean space and feature augmentations in feature space. In particular, we propose a data augmentation method based on sampling and graph generation. Meanwhile, we design a data augmentation network to enable a correspondence of representations by maximizing consistency between augmented graph pairs. We further design a feature augmentation network that encourages the model to learn representations invariant to the perturbations using an encoder perturbation. We comprehensively conduct extensive object classification experiments and object part segmentation experiments to validate the transferability of the proposed framework. Experimental results demonstrate that the proposed framework is effective to learn the point cloud representation in a self-supervised manner, and yields state-of-the-art results in the community. The source code is publicly available at: https://zhiyongsu.github.io/Project/AFSRL.html.

23.0CVApr 18

UGD: An Unsupervised Geometric Distance for Evaluating Real-world Noisy Point Cloud Denoising

Zhiyong Su, Jincan Wu, Yonghui Liu et al.

Point cloud denoising is a fundamental and crucial challenge in real-world point cloud applications. Existing quantitative evaluation metrics for point cloud denoising methods are implemented in a supervised manner, which requires both the denoised point cloud and the corresponding ground-truth clean point cloud to compute a representative geometric distance. This requirement is highly problematic in real-world scenarios, where ground-truth clean point clouds are often unavailable. In this paper, we propose a simple yet effective unsupervised geometric distance (UGD) for real-world noisy point cloud denoising, calculated solely from noisy point clouds. The core idea of UGD is to learn a patch-wise prior model from a set of clean point clouds and then employ this prior model as the ground-truth to quantify the degradation by measuring the geometric variations of the denoised point cloud. To this end, we first learn a pristine Gaussian Mixture Model (GMM) with extracted patch-wise quality-aware features from a set of pristine clean point clouds by a patch-wise feature extraction network, which serves as the ground-truth for the quantitative evaluation. Then, the UGD is defined as the weighted sum of distances between each patch of the denoised point cloud and the learned pristine GMM model in the patch space. To train the employed patch-wise feature extraction network, we propose a self-supervised training framework through multi-task learning, which includes pair-wise quality ranking, distortion classification, and distortion distribution prediction. Quantitative experiments with synthetic noise confirm that the proposed UGD achieves comparable performance to supervised full-reference metrics. Moreover, experimental results on real-world data demonstrate that the proposed UGD enables unsupervised evaluation of point cloud denoising methods based exclusively on noisy point clouds.

CVAug 10, 2024

Mesh deformation-based single-view 3D reconstruction of thin eyeglasses frames with differentiable rendering

Fan Zhang, Ziyue Ji, Weiguang Kang et al.

With the support of Virtual Reality (VR) and Augmented Reality (AR) technologies, the 3D virtual eyeglasses try-on application is well on its way to becoming a new trending solution that offers a "try on" option to select the perfect pair of eyeglasses at the comfort of your own home. Reconstructing eyeglasses frames from a single image with traditional depth and image-based methods is extremely difficult due to their unique characteristics such as lack of sufficient texture features, thin elements, and severe self-occlusions. In this paper, we propose the first mesh deformation-based reconstruction framework for recovering high-precision 3D full-frame eyeglasses models from a single RGB image, leveraging prior and domain-specific knowledge. Specifically, based on the construction of a synthetic eyeglasses frame dataset, we first define a class-specific eyeglasses frame template with pre-defined keypoints. Then, given an input eyeglasses frame image with thin structure and few texture features, we design a keypoint detector and refiner to detect predefined keypoints in a coarse-to-fine manner to estimate the camera pose accurately. After that, using differentiable rendering, we propose a novel optimization approach for producing correct geometry by progressively performing free-form deformation (FFD) on the template mesh. We define a series of loss functions to enforce consistency between the rendered result and the corresponding RGB input, utilizing constraints from inherent structure, silhouettes, keypoints, per-pixel shading information, and so on. Experimental results on both the synthetic dataset and real images demonstrate the effectiveness of the proposed algorithm.

CVJul 31, 2024

Fine-grained Metrics for Point Cloud Semantic Segmentation

Zhuheng Lu, Ting Wu, Yuewei Dai et al.

Two forms of imbalances are commonly observed in point cloud semantic segmentation datasets: (1) category imbalances, where certain objects are more prevalent than others; and (2) size imbalances, where certain objects occupy more points than others. Because of this, the majority of categories and large objects are favored in the existing evaluation metrics. This paper suggests fine-grained mIoU and mAcc for a more thorough assessment of point cloud segmentation algorithms in order to address these issues. Richer statistical information is provided for models and datasets by these fine-grained metrics, which also lessen the bias of current semantic segmentation metrics towards large objects. The proposed metrics are used to train and assess various semantic segmentation algorithms on three distinct indoor and outdoor semantic segmentation datasets.

CVMar 11, 2024

Refining Segmentation On-the-Fly: An Interactive Framework for Point Cloud Semantic Segmentation

Peng Zhang, Ting Wu, Jinsheng Sun et al.

Existing interactive point cloud segmentation approaches primarily focus on the object segmentation, which aim to determine which points belong to the object of interest guided by user interactions. This paper concentrates on an unexplored yet meaningful task, i.e., interactive point cloud semantic segmentation, which assigns high-quality semantic labels to all points in a scene with user corrective clicks. Concretely, we presents the first interactive framework for point cloud semantic segmentation, named InterPCSeg, which seamlessly integrates with off-the-shelf semantic segmentation networks without offline re-training, enabling it to run in an on-the-fly manner. To achieve online refinement, we treat user interactions as sparse training examples during the test-time. To address the instability caused by the sparse supervision, we design a stabilization energy to regulate the test-time training process. For objective and reproducible evaluation, we develop an interaction simulation scheme tailored for the interactive point cloud semantic segmentation task. We evaluate our framework on the S3DIS and ScanNet datasets with off-the-shelf segmentation networks, incorporating interactions from both the proposed interaction simulator and real users. Quantitative and qualitative experimental results demonstrate the efficacy of our framework in refining the semantic segmentation results with user interactions. The source code will be publicly available.

CVAug 7, 2025

Open-world Point Cloud Semantic Segmentation: A Human-in-the-loop Framework

Peng Zhang, Songru Yang, Jinsheng Sun et al.

Open-world point cloud semantic segmentation (OW-Seg) aims to predict point labels of both base and novel classes in real-world scenarios. However, existing methods rely on resource-intensive offline incremental learning or densely annotated support data, limiting their practicality. To address these limitations, we propose HOW-Seg, the first human-in-the-loop framework for OW-Seg. Specifically, we construct class prototypes, the fundamental segmentation units, directly on the query data, avoiding the prototype bias caused by intra-class distribution shifts between the support and query data. By leveraging sparse human annotations as guidance, HOW-Seg enables prototype-based segmentation for both base and novel classes. Considering the lack of granularity of initial prototypes, we introduce a hierarchical prototype disambiguation mechanism to refine ambiguous prototypes, which correspond to annotations of different classes. To further enrich contextual awareness, we employ a dense conditional random field (CRF) upon the refined prototypes to optimize their label assignments. Through iterative human feedback, HOW-Seg dynamically improves its predictions, achieving high-quality segmentation for both base and novel classes. Experiments demonstrate that with sparse annotations (e.g., one-novel-class-one-click), HOW-Seg matches or surpasses the state-of-the-art generalized few-shot segmentation (GFS-Seg) method under the 5-shot setting. When using advanced backbones (e.g., Stratified Transformer) and denser annotations (e.g., 10 clicks per sub-scene), HOW-Seg achieves 85.27% mIoU on S3DIS and 66.37% mIoU on ScanNetv2, significantly outperforming alternatives.

CVFeb 17, 2025

No-reference geometry quality assessment for colorless point clouds via list-wise rank learning

Zheng Li, Bingxu Xie, Chao Chu et al.

Geometry quality assessment (GQA) of colorless point clouds is crucial for evaluating the performance of emerging point cloud-based solutions (e.g., watermarking, compression, and 3-Dimensional (3D) reconstruction). Unfortunately, existing objective GQA approaches are traditional full-reference metrics, whereas state-of-the-art learning-based point cloud quality assessment (PCQA) methods target both color and geometry distortions, neither of which are qualified for the no-reference GQA task. In addition, the lack of large-scale GQA datasets with subjective scores, which are always imprecise, biased, and inconsistent, also hinders the development of learning-based GQA metrics. Driven by these limitations, this paper proposes a no-reference geometry-only quality assessment approach based on list-wise rank learning, termed LRL-GQA, which comprises of a geometry quality assessment network (GQANet) and a list-wise rank learning network (LRLNet). The proposed LRL-GQA formulates the no-reference GQA as a list-wise rank problem, with the objective of directly optimizing the entire quality ordering. Specifically, a large dataset containing a variety of geometry-only distortions is constructed first, named LRL dataset, in which each sample is label-free but coupled with quality ranking information. Then, the GQANet is designed to capture intrinsic multi-scale patch-wise geometric features in order to predict a quality index for each point cloud. After that, the LRLNet leverages the LRL dataset and a likelihood loss to train the GQANet and ranks the input list of degraded point clouds according to their distortion levels. In addition, the pre-trained GQANet can be fine-tuned further to obtain absolute quality scores. Experimental results demonstrate the superior performance of the proposed no-reference LRL-GQA method compared with existing full-reference GQA metrics.

CVFeb 17, 2025

The Worse The Better: Content-Aware Viewpoint Generation Network for Projection-related Point Cloud Quality Assessment

Zhiyong Su, Bingxu Xie, Zheng Li et al.

Through experimental studies, however, we observed the instability of final predicted quality scores, which change significantly over different viewpoint settings. Inspired by the "wooden barrel theory", given the default content-independent viewpoints of existing projection-related PCQA approaches, this paper presents a novel content-aware viewpoint generation network (CAVGN) to learn better viewpoints by taking the distribution of geometric and attribute features of degraded point clouds into consideration. Firstly, the proposed CAVGN extracts multi-scale geometric and texture features of the entire input point cloud, respectively. Then, for each default content-independent viewpoint, the extracted geometric and texture features are refined to focus on its corresponding visible part of the input point cloud. Finally, the refined geometric and texture features are concatenated to generate an optimized viewpoint. To train the proposed CAVGN, we present a self-supervised viewpoint ranking network (SSVRN) to select the viewpoint with the worst quality projected image to construct a default-optimized viewpoint dataset, which consists of thousands of paired default viewpoints and corresponding optimized viewpoints. Experimental results show that the projection-related PCQA methods can achieve higher performance using the viewpoints generated by the proposed CAVGN.