Shaolin Zhang

CV
h-index11
4papers
9citations
Novelty60%
AI Score45

4 Papers

CVSep 3, 2024Code
EPRecon: An Efficient Framework for Real-Time Panoptic 3D Reconstruction from Monocular Video

Zhen Zhou, Yunkai Ma, Junfeng Fan et al.

Panoptic 3D reconstruction from a monocular video is a fundamental perceptual task in robotic scene understanding. However, existing efforts suffer from inefficiency in terms of inference speed and accuracy, limiting their practical applicability. We present EPRecon, an efficient real-time panoptic 3D reconstruction framework. Current volumetric-based reconstruction methods usually utilize multi-view depth map fusion to obtain scene depth priors, which is time-consuming and poses challenges to real-time scene reconstruction. To address this issue, we propose a lightweight module to directly estimate scene depth priors in a 3D volume for reconstruction quality improvement by generating occupancy probabilities of all voxels. In addition, compared with existing panoptic segmentation methods, EPRecon extracts panoptic features from both voxel features and corresponding image features, obtaining more detailed and comprehensive instance-level semantic information and achieving more accurate segmentation results. Experimental results on the ScanNetV2 dataset demonstrate the superiority of EPRecon over current state-of-the-art methods in terms of both panoptic 3D reconstruction quality and real-time inference. Code is available at https://github.com/zhen6618/EPRecon.

CVApr 27
Monocular Depth Estimation via Neural Network with Learnable Algebraic Group and Ring Structures

Qianlei Wang, Kexun Chen, Shaolin Zhang et al.

Monocular depth estimation (MDE) has witnessed remarkable progress driven by Convolutional Neural Networks and transformer-based architectures. However, these approaches typically treat the problem as a generic image-to-image regression on Euclidean grids, thereby overlooking the intrinsic algebraic and geometric structures induced by perspective projection. To address this limitation, we propose LAGRNet, a novel framework that fundamentally grounds MDE in algebraic geometry by explicitly embedding learnable group, ring, and sheaf structures into the deep learning pipeline. Modeling feature maps as sections of a sheaf over an approximated image manifold, our method first establishes a Group-defined Feature Manifold (GFM) parameterized by a learned algebraic group action to enforce projective equivariance and robustness against view changes. To facilitate algebraically consistent cross-scale interactions, we subsequently introduce a Ring Convolution Layer (RCL) that formulates feature fusion as a graded ring homomorphism. Furthermore, to ensure global topological consistency, a Sheaf-based Module (SM) aggregates local depth cues via Čech nerve on the image topology. Extensive zero-shot evaluations across the KITTI, NYU-Depth V2, and ETH3D benchmarks demonstrate that LAGRNet significantly outperforms state-of-the-art methods in both accuracy and generalization capabilities.

CVSep 2, 2025
A Multimodal Cross-View Model for Predicting Postoperative Neck Pain in Cervical Spondylosis Patients

Jingyang Shan, Qishuai Yu, Jiacen Liu et al.

Neck pain is the primary symptom of cervical spondylosis, yet its underlying mechanisms remain unclear, leading to uncertain treatment outcomes. To address the challenges of multimodal feature fusion caused by imaging differences and spatial mismatches, this paper proposes an Adaptive Bidirectional Pyramid Difference Convolution (ABPDC) module that facilitates multimodal integration by exploiting the advantages of difference convolution in texture extraction and grayscale invariance, and a Feature Pyramid Registration Auxiliary Network (FPRAN) to mitigate structural misalignment. Experiments on the MMCSD dataset demonstrate that the proposed model achieves superior prediction accuracy of postoperative neck pain recovery compared with existing methods, and ablation studies further confirm its effectiveness.

CVMay 15, 2024
Fourier Boundary Features Network with Wider Catchers for Glass Segmentation

Xiaolin Qin, Jiacen Liu, Qianlei Wang et al.

Glass largely blurs the boundary between the real world and the reflection. The special transmittance and reflectance quality have confused the semantic tasks related to machine vision. Therefore, how to clear the boundary built by glass, and avoid over-capturing features as false positive information in deep structure, matters for constraining the segmentation of reflection surface and penetrating glass. We proposed the Fourier Boundary Features Network with Wider Catchers (FBWC), which might be the first attempt to utilize sufficiently wide horizontal shallow branches without vertical deepening for guiding the fine granularity segmentation boundary through primary glass semantic information. Specifically, we designed the Wider Coarse-Catchers (WCC) for anchoring large area segmentation and reducing excessive extraction from a structural perspective. We embed fine-grained features by Cross Transpose Attention (CTA), which is introduced to avoid the incomplete area within the boundary caused by reflection noise. For excavating glass features and balancing high-low layers context, a learnable Fourier Convolution Controller (FCC) is proposed to regulate information integration robustly. The proposed method has been validated on three different public glass segmentation datasets. Experimental results reveal that the proposed method yields better segmentation performance compared with the state-of-the-art (SOTA) methods in glass image segmentation.