YangHui Xu

CV
h-index: 1
3 papers
30 citations
Novelty: 50%
AI Score: 37

3 Papers

CV Feb 9
TIBR4D: Tracing-Guided Iterative Boundary Refinement for Efficient 4D Gaussian Segmentation

He Wu, Xia Yan, Yanghui Xu et al.

Object-level segmentation in dynamic 4D Gaussian scenes remains challenging due to complex motion, occlusions, and ambiguous boundaries. In this paper, we present an efficient, learning-free 4D Gaussian segmentation framework that lifts video segmentation masks into 4D space. Its core is a two-stage iterative boundary refinement, TIBR4D. The first stage is Iterative Gaussian Instance Tracing (IGIT) at the temporal segment level. It progressively refines Gaussian-to-instance probabilities through iterative tracing and extracts the corresponding Gaussian point clouds, which handle occlusions better and preserve the completeness of object structures better than existing one-shot threshold-based methods. The second stage is a frame-wise Gaussian Rendering Range Control (RCC) that suppresses highly uncertain Gaussians near object boundaries while retaining their core contributions, yielding more accurate boundaries. Furthermore, a temporal segmentation merging strategy is proposed for IGIT to balance identity consistency and dynamic awareness: longer segments enforce stronger multi-frame constraints for stable identities, while shorter segments allow identity changes to be captured promptly. Experiments on HyperNeRF and Neu3D demonstrate that our method produces accurate object Gaussian point clouds with clearer boundaries and higher efficiency than SOTA methods.

CV Oct 28, 2024
Transformer-Based Tooth Alignment Prediction With Occlusion And Collision Constraints

ZhenXing Dong, JiaZhou Chen, YangHui Xu

Planning digital orthodontic treatment requires a target tooth alignment, which not only takes considerable time and labor to determine manually but also relies heavily on clinical experience. In this work, we propose a lightweight tooth alignment neural network based on the Swin Transformer. We first reorganize 3D point clouds along virtual arch lines and convert them into order-sorted multi-channel textures, which improves both accuracy and efficiency. We then design two new occlusal loss functions that quantitatively evaluate the occlusal relationship between the upper and lower jaws. These are important clinical constraints, introduced here for the first time to the best of our knowledge, and they lead to cutting-edge prediction accuracy. To train our network, we collected a large digital orthodontic dataset of 591 clinical cases, including a variety of complex ones. This dataset will benefit the community after its release, since no open dataset exists so far. Furthermore, we propose two new orthodontic dataset augmentation methods that take tooth spatial distribution and occlusion into account. We evaluate our method on this dataset with extensive experiments, including comparisons with state-of-the-art (SOTA) methods and ablation studies, and demonstrate its high prediction accuracy.

CV Dec 18, 2021
3D Instance Segmentation of MVS Buildings

Jiazhou Chen, Yanghui Xu, Shufang Lu et al.

We present a novel 3D instance segmentation framework for Multi-View Stereo (MVS) buildings in urban scenes. Unlike existing works that focus on semantic segmentation of urban scenes, this work emphasizes detecting and segmenting 3D building instances even if they are attached to one another and embedded in a large and imprecise 3D surface model. Multi-view RGB images are first enhanced to RGBH images by adding a heightmap and are then segmented to obtain all roof instances using a fine-tuned 2D instance segmentation neural network. Instance masks from the different views are then clustered into global masks. Our mask clustering accounts for spatial occlusion and overlap, which can eliminate segmentation ambiguities among multi-view images. Based on these global masks, 3D roof instances are segmented out by mask back-projection and extended to entire building instances through a Markov random field optimization. A new dataset that contains instance-level annotations for both 3D urban scenes (roofs and buildings) and drone images (roofs) is provided. To the best of our knowledge, it is the first outdoor dataset dedicated to 3D instance segmentation, with many more annotations of attached 3D buildings than existing datasets. Quantitative evaluations and ablation studies show the effectiveness of all major steps and the advantages of our multi-view framework over the orthophoto-based method.
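
As a rough illustration of the RGBH step mentioned in the abstract above, the sketch below appends a height channel to each RGB pixel. It is only a minimal sketch under stated assumptions, not the authors' implementation: the function name `to_rgbh`, the nested-list image layout, and the min-max rescaling of heights to 0..255 are all hypothetical choices made for this example.

```python
from typing import List, Tuple

RGB = Tuple[int, int, int]
RGBH = Tuple[int, int, int, int]


def to_rgbh(rgb: List[List[RGB]], height: List[List[float]]) -> List[List[RGBH]]:
    """Append a height channel to every RGB pixel.

    Assumption (not from the paper): heights are min-max rescaled to 0..255
    so the fourth channel shares the value range of the color channels.
    """
    if len(rgb) != len(height) or any(len(r) != len(h) for r, h in zip(rgb, height)):
        raise ValueError("image and heightmap must have the same shape")

    flat = [h for row in height for h in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0  # avoid division by zero on perfectly flat terrain

    return [
        [(r, g, b, round(255 * (h - lo) / span)) for (r, g, b), h in zip(rgb_row, h_row)]
        for rgb_row, h_row in zip(rgb, height)
    ]


# Hypothetical 1x2 image: the lowest pixel maps to 0, the highest to 255.
image = [[(10, 20, 30), (40, 50, 60)]]
heights = [[0.0, 10.0]]
rgbh = to_rgbh(image, heights)
```

A four-channel image like this could then be passed to a 2D instance segmentation network whose first layer accepts four input channels. How the paper actually encodes and normalizes the height channel is not stated in the abstract.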