Yixin Zhuang

CV
h-index: 2
7 papers
349 citations
Novelty: 54%
AI Score: 43

7 Papers

CV · Aug 14, 2022 · Code
Visual Localization via Few-Shot Scene Region Classification

Siyan Dong, Shuzhe Wang, Yixin Zhuang et al.

Visual (re)localization addresses the problem of estimating the 6-DoF (Degree of Freedom) camera pose of a query image captured in a known scene, which is a key building block of many computer vision and robotics applications. Recent advances in structure-based localization solve this problem by memorizing the mapping from image pixels to scene coordinates with neural networks, building 2D-3D correspondences for camera pose optimization. However, such memorization requires training with large amounts of posed images in each scene, which is computationally heavy and inefficient. In contrast, a few images are usually sufficient to cover the main regions of a scene for a human operator to perform visual localization. In this paper, we propose a scene region classification approach to achieve fast and effective scene memorization with few-shot images. Our insight is to leverage a) a pre-learned feature extractor, b) a scene region classifier, and c) a meta-learning strategy to accelerate training while mitigating overfitting. We evaluate our method on both indoor and outdoor benchmarks. The experiments validate the effectiveness of our method in the few-shot setting, and the training time is significantly reduced to only a few minutes. Code available at: https://github.com/siyandong/SRC
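To make the step from region labels to 2D-3D correspondences concrete, here is a minimal Python sketch under stated assumptions: the scene coordinates are partitioned into K regions with flat k-means (the abstract does not specify the partitioning scheme), a classifier (not shown; `region_ids` stands in for its per-pixel predictions) assigns each query pixel a region, and that region's center serves as the pixel's 3D coordinate. All function names are hypothetical.

```python
import numpy as np

def farthest_point_init(points, k):
    """Pick k well-spread initial centers: start at point 0, then repeatedly take the farthest point."""
    idx = [0]
    dist = np.linalg.norm(points - points[0], axis=1)
    for _ in range(k - 1):
        idx.append(int(dist.argmax()))
        dist = np.minimum(dist, np.linalg.norm(points - points[idx[-1]], axis=1))
    return points[idx].copy()

def partition_scene(points, k, iters=20):
    """Lloyd's k-means over (N, 3) scene coordinates; returns (centers, labels)."""
    centers = farthest_point_init(points, k)
    for _ in range(iters):
        # Assign each 3D point to its nearest region center.
        d = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Move each center to the mean of its members (empty regions keep their center).
        for j in range(k):
            members = points[labels == j]
            if len(members):
                centers[j] = members.mean(axis=0)
    return centers, labels

def region_correspondences(pixels, region_ids, centers):
    """Pair each (u, v) pixel with the 3D center of its predicted region (hypothetical helper)."""
    return np.asarray(pixels, dtype=float), centers[np.asarray(region_ids)]
```

The resulting 2D-3D pairs would then be handed to a PnP solver. Presumably, finer regions give more precise 3D coordinates at the cost of a harder classification problem.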

CV · Jun 6, 2021 · Code
Neural Implicit 3D Shapes from Single Images with Spatial Patterns

Yixin Zhuang, Yunzhe Liu, Yujie Wang et al.

Neural implicit functions have achieved impressive results in reconstructing 3D shapes from single images. However, the image features used to describe the 3D point samples of implicit functions become less effective when the image exhibits significant variations in occlusion, viewpoint, and appearance. To better encode image features, we study a geometry-aware convolutional kernel that leverages geometric relationships among point samples through the proposed spatial pattern, i.e., a structured point set. Specifically, the kernel operates on the 2D projections of the 3D points in the spatial pattern. Supported by the spatial pattern, the 2D kernel encodes geometric information that is crucial for 3D reconstruction, whereas traditional kernels mainly consider appearance information. Furthermore, to enable the network to discover more adaptive spatial patterns that capture non-local contextual information, the kernel is made deformable and is manipulated by a spatial pattern generator. Experimental results on both synthetic and real datasets demonstrate the superiority of the proposed method. Pre-trained models, code, and data are available at https://github.com/yixin26/SVR-SP.
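As a rough illustration of how a spatial pattern turns one 3D sample into several 2D lookups, the Python sketch below uses a fixed, hypothetical pattern (the query point plus six axis-aligned neighbors at distance `radius`), a pinhole camera with intrinsics `K` and points already in camera coordinates, and bilinear sampling of an image feature map. In the paper the pattern comes from a learned generator and is consumed by a convolutional kernel; both are omitted here, and all names are assumptions.

```python
import numpy as np

def spatial_pattern(center, radius):
    """Hypothetical structured point set: the center plus 6 axis-aligned neighbors."""
    offsets = np.array([[0, 0, 0], [1, 0, 0], [-1, 0, 0], [0, 1, 0],
                        [0, -1, 0], [0, 0, 1], [0, 0, -1]], dtype=float)
    return center[None, :] + radius * offsets

def project(points, K):
    """Pinhole projection of camera-space 3D points (z > 0) to (u, v) pixel coordinates."""
    uvw = points @ K.T
    return uvw[:, :2] / uvw[:, 2:3]

def bilinear_sample(feat, uv):
    """Sample an (H, W, C) feature map at continuous (u, v) = (column, row) positions."""
    H, W, _ = feat.shape
    u = np.clip(uv[:, 0], 0, W - 1.001)  # keep u0 + 1 inside the map
    v = np.clip(uv[:, 1], 0, H - 1.001)
    u0, v0 = np.floor(u).astype(int), np.floor(v).astype(int)
    du, dv = (u - u0)[:, None], (v - v0)[:, None]
    return (feat[v0, u0] * (1 - du) * (1 - dv) + feat[v0, u0 + 1] * du * (1 - dv)
            + feat[v0 + 1, u0] * (1 - du) * dv + feat[v0 + 1, u0 + 1] * du * dv)

def pattern_features(center, radius, K, feat):
    """One feature vector per pattern point; a 2D kernel would then combine these."""
    return bilinear_sample(feat, project(spatial_pattern(center, radius), K))
```

Replacing the fixed offsets with network-predicted ones is, roughly, what would make such a kernel deformable.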

LG · Dec 17, 2025
Understanding NTK Variance in Implicit Neural Representations

Chengguang Ou, Yixin Zhuang

Implicit Neural Representations (INRs) often converge slowly and struggle to recover high-frequency details due to spectral bias. While prior work links this behavior to the Neural Tangent Kernel (NTK), how specific architectural choices affect NTK conditioning remains unclear. We show that many INR mechanisms can be understood through their impact on a small set of pairwise similarity factors and scaling terms that jointly determine NTK eigenvalue variance. For standard coordinate MLPs, limited input-feature interactions induce large eigenvalue dispersion and poor conditioning. We derive closed-form variance decompositions for common INR components and show that positional encoding reshapes input similarity, spherical normalization reduces variance via layerwise scaling, and Hadamard modulation introduces additional similarity factors strictly below one, yielding multiplicative variance reduction. This unified view explains how diverse INR architectures mitigate spectral bias by improving NTK conditioning. Experiments across multiple tasks confirm the predicted variance reductions and demonstrate faster, more stable convergence with improved reconstruction quality.
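To make the NTK quantities tangible, the Python sketch below computes the empirical NTK of a one-hidden-layer ReLU MLP, f(x) = v · relu(Wx + b) / sqrt(m), on 1D coordinates, with and without a Fourier positional encoding, and reports the variance of the mean-normalized eigenvalues. The architecture, the encoding, and this particular dispersion measure are illustrative assumptions, not the paper's closed-form decompositions, and no particular output is claimed.

```python
import numpy as np

def positional_encoding(x, n_freqs):
    """Fourier features [sin(2^k pi x), cos(2^k pi x)], k = 0..n_freqs-1, for x of shape (N,)."""
    ang = x[:, None] * (2.0 ** np.arange(n_freqs) * np.pi)[None, :]
    return np.concatenate([np.sin(ang), np.cos(ang)], axis=1)

def empirical_ntk(X, W, b, v):
    """NTK of f(x) = v . relu(W x + b) / sqrt(m) w.r.t. all parameters (W, b, v)."""
    m = W.shape[0]
    pre = X @ W.T + b                 # (N, m) pre-activations
    act = np.maximum(pre, 0.0)        # ReLU outputs
    g = (pre > 0) * v[None, :]        # df/dpre scaled by sqrt(m)
    K_v = act @ act.T / m             # contribution of the output weights v
    K_b = g @ g.T / m                 # contribution of the biases b
    K_W = K_b * (X @ X.T)             # hidden weights: bias term times input similarity
    return K_v + K_b + K_W

def eig_dispersion(K):
    """Variance of the eigenvalues divided by their mean (a scale-free spread measure)."""
    lam = np.linalg.eigvalsh(K)
    return float(np.var(lam / lam.mean()))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = np.linspace(-1.0, 1.0, 64)
    for name, X in [("raw", x[:, None]), ("PE", positional_encoding(x, 4))]:
        m = 512
        W = rng.normal(size=(m, X.shape[1]))
        b = rng.normal(size=m)
        v = rng.normal(size=m)
        print(name, eig_dispersion(empirical_ntk(X, W, b, v)))
```

The `X @ X.T` factor in the hidden-weight term is one place where input similarity, which positional encoding reshapes, enters the kernel multiplicatively.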

CV · Jan 31, 2022
A Simple And Effective Filtering Scheme For Improving Neural Fields

Yixin Zhuang

Recently, neural fields, also known as coordinate-based MLPs, have achieved impressive results in representing low-dimensional data. Unlike CNNs, MLPs are globally connected and lack local control: adjusting a local region leads to global changes. Therefore, locally improving neural fields usually leads to a dilemma: filtering out local artifacts can simultaneously smooth away desired details. Our solution is a new filtering technique that consists of two counteractive operators: a smoothing operator that provides global smoothing for better generalization, and conversely a recovering operator that provides better controllability for local adjustments. We have found that using either operator alone can lead to an increase in noisy artifacts or oversmoothed regions. By combining the two operators, smoothing and sharpening can be balanced so that the filter first smooths the entire region and then recovers fine-grained details in regions that were overly smoothed. In this way, our filter helps neural fields remove much of the noise while enhancing details. We demonstrate the benefits of our filter on various tasks and show significant improvements over state-of-the-art methods. Moreover, our filter also improves convergence speed and network stability.
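The smooth-then-recover idea can be illustrated with a 1D signal-processing analogy in Python. This is not the paper's filter, which operates on neural fields; the box filter, the residual threshold, and the function names are assumptions, chosen only to show why the two operators counteract each other: smoothing alone also blurs sharp edges, while recovering everything would keep the noise.

```python
import numpy as np

def box_smooth(y, radius):
    """Global smoothing operator: moving average with edge padding."""
    kernel = np.ones(2 * radius + 1) / (2 * radius + 1)
    padded = np.pad(y, radius, mode="edge")
    return np.convolve(padded, kernel, mode="valid")

def smooth_then_recover(y, radius, threshold):
    """Smooth everywhere, then restore the original where smoothing removed too much."""
    smoothed = box_smooth(y, radius)
    residual = y - smoothed
    # Large residuals mark over-smoothed regions (e.g. sharp edges) and are recovered;
    # small residuals are treated as noise and stay removed.
    recover = np.abs(residual) > threshold
    return np.where(recover, y, smoothed)
```

On a clean step the edge survives intact, while low-amplitude oscillation is attenuated; the threshold is the knob that trades one behavior against the other.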

CV · Mar 17, 2020
Multimodal Shape Completion via Conditional Generative Adversarial Networks

Rundi Wu, Xuelin Chen, Yixin Zhuang et al.

Several deep learning methods have been proposed for completing partial data from shape acquisition setups, i.e., filling in the regions that are missing from the shape. These methods, however, complete the partial shape with only a single output, ignoring the ambiguity in reasoning about the missing geometry. Hence, we pose a multimodal shape completion problem, in which we seek to complete the partial shape with multiple outputs by learning a one-to-many mapping. We develop the first multimodal shape completion method that completes the partial shape via conditional generative modeling, without requiring paired training data. Our approach distills the ambiguity by conditioning the completion on a learned multimodal distribution of possible results. We extensively evaluate the approach on several datasets that contain varying forms of shape incompleteness, and compare it qualitatively and quantitatively against several baseline methods and variants of our method, demonstrating its merit in completing partial shapes with both diversity and quality.

CV · Nov 26, 2019
Decoupling Features and Coordinates for Few-shot RGB Relocalization

Siyan Dong, Songyin Wu, Yixin Zhuang et al.

Cross-scene model adaptation is crucial for camera relocalization in real scenarios. It is often preferable that a pre-learned model can be quickly adapted to a novel scene with as few training samples as possible. Existing state-of-the-art approaches, however, can hardly support such few-shot scene adaptation because they entangle image feature extraction with scene coordinate regression. To address this issue, we approach camera relocalization with a decoupled solution in which feature extraction, coordinate regression, and pose estimation are performed separately. Our key insight is that the feature encoder used for coordinate regression should be learned with the distracting factor of coordinate systems removed, so that it can be learned from multiple scenes for general feature representation and, more importantly, view-insensitive capability. With this feature prior, combined with a coordinate regressor, few-shot observations in a new scene are much easier to connect with the 3D world than with existing integrated solutions. Experiments show the superiority of our approach over state-of-the-art methods, with higher accuracy on several scenes with diverse visual appearance and viewpoint distribution.
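The pose-estimation stage, which consumes regressed scene coordinates, can be illustrated with the textbook Direct Linear Transform (DLT) solver for Perspective-n-Point. The abstract does not name its solver; DLT is swapped in here for illustration, assumes noise-free, non-coplanar correspondences, and omits the RANSAC loop that practical relocalization pipelines typically wrap around a PnP solver. The `dlt_pose` name is hypothetical.

```python
import numpy as np

def dlt_pose(X, x, K):
    """Estimate camera pose (R, t) from N >= 6 non-coplanar 2D-3D correspondences.

    X: (N, 3) scene coordinates, x: (N, 2) pixel coordinates, K: (3, 3) intrinsics.
    Returns R (3, 3) and t (3,) such that a scene point maps to the camera as R @ X + t.
    """
    # Normalized image coordinates: the unknown 3x4 matrix is then s * [R | t].
    xh = np.column_stack([x, np.ones(len(x))])
    xn = (np.linalg.inv(K) @ xh.T).T
    xn = xn[:, :2] / xn[:, 2:3]
    rows = []
    for P3, (u, v) in zip(X, xn):
        P = np.append(P3, 1.0)
        # Two linear constraints from u = (m1.P)/(m3.P) and v = (m2.P)/(m3.P).
        rows.append(np.concatenate([P, np.zeros(4), -u * P]))
        rows.append(np.concatenate([np.zeros(4), P, -v * P]))
    # Least-squares null vector: right singular vector of the smallest singular value.
    M = np.linalg.svd(np.array(rows))[2][-1].reshape(3, 4)
    # Project the left 3x3 block onto the nearest rotation and recover the scale |s|.
    U, S, Vt = np.linalg.svd(M[:, :3])
    R = U @ Vt
    t = M[:, 3] / S.mean()
    # If s was negative, R has det -1; flip both R and t to undo the sign.
    if np.linalg.det(R) < 0:
        R, t = -R, -t
    return R, t
```

In a decoupled pipeline, `X` would come from the coordinate regressor applied to features of the query image, so adapting to a new scene would touch only the regressor, not the feature encoder or the solver.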

CV · Nov 25, 2019
PQ-NET: A Generative Part Seq2Seq Network for 3D Shapes

Rundi Wu, Yixin Zhuang, Kai Xu et al.

We introduce PQ-NET, a deep neural network that represents and generates 3D shapes via sequential part assembly. The input to our network is a 3D shape segmented into parts, where each part is first encoded into a feature representation using a part autoencoder. The core component of PQ-NET is a sequence-to-sequence (Seq2Seq) autoencoder that encodes a sequence of part features into a latent vector of fixed size; the decoder reconstructs the 3D shape one part at a time, resulting in a sequential assembly. The latent space formed by the Seq2Seq encoder encodes both part structure and fine part geometry. The decoder can be adapted to perform several generative tasks, including shape autoencoding, interpolation, novel shape generation, and single-view 3D reconstruction, where the generated shapes are all composed of meaningful parts.
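The encode-then-assemble data flow can be sketched with an untrained vanilla RNN in Python: the encoder folds a variable-length sequence of part feature vectors into one fixed-size latent, and the decoder unrolls from that latent, emitting one part feature per step until a stop score fires. The class name, the vanilla RNN cells, the stop rule, and the random weights are all assumptions; PQ-NET's actual part autoencoder and recurrent architecture are not reproduced, and the outputs are meaningless until trained.

```python
import numpy as np

class ToySeq2Seq:
    """Untrained vanilla-RNN encoder/decoder illustrating the data flow only."""

    def __init__(self, part_dim, latent_dim, seed=0):
        rng = np.random.default_rng(seed)
        s = 0.1  # small random weights keep tanh out of saturation
        self.W_in = rng.normal(scale=s, size=(latent_dim, part_dim))
        self.W_h = rng.normal(scale=s, size=(latent_dim, latent_dim))
        self.W_dec = rng.normal(scale=s, size=(latent_dim, latent_dim))
        self.W_out = rng.normal(scale=s, size=(part_dim, latent_dim))
        self.w_stop = rng.normal(scale=s, size=latent_dim)
        self.latent_dim = latent_dim

    def encode(self, parts):
        """Fold a variable-length (n_parts, part_dim) sequence into one latent vector."""
        h = np.zeros(self.latent_dim)
        for p in parts:
            h = np.tanh(self.W_in @ p + self.W_h @ h)
        return h

    def decode(self, z, max_parts=10):
        """Emit one part feature per step until the stop probability exceeds 0.5."""
        h, parts = z, []
        for _ in range(max_parts):
            h = np.tanh(self.W_dec @ h)
            parts.append(self.W_out @ h)
            stop = 1.0 / (1.0 + np.exp(-(self.w_stop @ h)))
            if stop > 0.5:
                break
        return np.array(parts)
```

The point of the sketch is the shape contract: shapes with different part counts map to latents of the same size, and decoding produces a variable number of parts.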