CVSep 6, 2022
LRT: An Efficient Low-Light Restoration Transformer for Dark Light Field ImagesShansi Zhang, Nan Meng, Edmund Y. Lam
Light field (LF) images containing information for multiple views have numerous applications, which can be severely affected by low-light imaging. Recent learning-based methods for low-light enhancement have some disadvantages, such as a lack of noise suppression, complex training process and poor performance in extremely low-light conditions. To tackle these deficiencies while fully utilizing the multi-view information, we propose an efficient Low-light Restoration Transformer (LRT) for LF images, with multiple heads to perform intermediate tasks within a single network, including denoising, luminance adjustment, refinement and detail enhancement, achieving progressive restoration from small scale to full scale. Moreover, we design an angular transformer block with an efficient view-token scheme to model the global angular dependencies, and a multi-scale spatial transformer block to encode the multi-scale local and global information within each view. To address the issue of insufficient training data, we formulate a synthesis pipeline by simulating the major noise sources with the estimated noise parameters of LF camera. Experimental results demonstrate that our method achieves the state-of-the-art performance on low-light LF restoration with high efficiency.
CVJan 20, 2023
Unsupervised Light Field Depth Estimation via Multi-view Feature Matching with Occlusion PredictionShansi Zhang, Nan Meng, Edmund Y. Lam
Depth estimation from light field (LF) images is a fundamental step for numerous applications. Recently, learning-based methods have achieved higher accuracy and efficiency than the traditional methods. However, it is costly to obtain sufficient depth labels for supervised training. In this paper, we propose an unsupervised framework to estimate depth from LF images. First, we design a disparity estimation network (DispNet) with a coarse-to-fine structure to predict disparity maps from different view combinations. It explicitly performs multi-view feature matching to learn the correspondences effectively. As occlusions may cause the violation of photo-consistency, we introduce an occlusion prediction network (OccNet) to predict the occlusion maps, which are used as the element-wise weights of photometric loss to solve the occlusion issue and assist the disparity learning. With the disparity maps estimated by multiple input combinations, we then propose a disparity fusion strategy based on the estimated errors with effective occlusion handling to obtain the final disparity map with higher accuracy. Experimental results demonstrate that our method achieves superior performance on both the dense and sparse LF images, and also shows better robustness and generalization on the real-world LF images compared to the other methods.
CVSep 29, 2025
Accurate Cobb Angle Estimation via SVD-Based Curve Detection and Vertebral Wedging QuantificationChang Shi, Nan Meng, Yipeng Zhuang et al.
Adolescent idiopathic scoliosis (AIS) is a common spinal deformity affecting approximately 2.2% of boys and 4.8% of girls worldwide. The Cobb angle serves as the gold standard for AIS severity assessment, yet traditional manual measurements suffer from significant observer variability, compromising diagnostic accuracy. Despite prior automation attempts, existing methods use simplified spinal models and predetermined curve patterns that fail to address clinical complexity. We present a novel deep learning framework for AIS assessment that simultaneously predicts both superior and inferior endplate angles with corresponding midpoint coordinates for each vertebra, preserving the anatomical reality of vertebral wedging in progressive AIS. Our approach combines an HRNet backbone with Swin-Transformer modules and biomechanically informed constraints for enhanced feature extraction. We employ Singular Value Decomposition (SVD) to analyze angle predictions directly from vertebral morphology, enabling flexible detection of diverse scoliosis patterns without predefined curve assumptions. Using 630 full-spine anteroposterior radiographs from patients aged 10-18 years with rigorous dual-rater annotation, our method achieved 83.45% diagnostic accuracy and 2.55° mean absolute error. The framework demonstrates exceptional generalization capability on out-of-distribution cases. Additionally, we introduce the Vertebral Wedging Index (VWI), a novel metric quantifying vertebral deformation. Longitudinal analysis revealed VWI's significant prognostic correlation with curve progression while traditional Cobb angles showed no correlation, providing robust support for early AIS detection, personalized treatment planning, and progression monitoring.
CVSep 29, 2025
LatXGen: Towards Radiation-Free and Accurate Quantitative Analysis of Sagittal Spinal Alignment Via Cross-Modal Radiographic View SynthesisMoxin Zhao, Nan Meng, Jason Pui Yin Cheung et al.
Adolescent Idiopathic Scoliosis (AIS) is a complex three-dimensional spinal deformity, and accurate morphological assessment requires evaluating both coronal and sagittal alignment. While previous research has made significant progress in developing radiation-free methods for coronal plane assessment, reliable and accurate evaluation of sagittal alignment without ionizing radiation remains largely underexplored. To address this gap, we propose LatXGen, a novel generative framework that synthesizes realistic lateral spinal radiographs from posterior Red-Green-Blue and Depth (RGBD) images of unclothed backs. This enables accurate, radiation-free estimation of sagittal spinal alignment. LatXGen tackles two core challenges: (1) inferring sagittal spinal morphology changes from a lateral perspective based on posteroanterior surface geometry, and (2) performing cross-modality translation from RGBD input to the radiographic domain. The framework adopts a dual-stage architecture that progressively estimates lateral spinal structure and synthesizes corresponding radiographs. To enhance anatomical consistency, we introduce an attention-based Fast Fourier Convolution (FFC) module for integrating anatomical features from RGBD images and 3D landmarks, and a Spatial Deformation Network (SDN) to model morphological variations in the lateral view. Additionally, we construct the first large-scale paired dataset for this task, comprising 3,264 RGBD and lateral radiograph pairs. Experimental results demonstrate that LatXGen produces anatomically accurate radiographs and outperforms existing GAN-based methods in both visual fidelity and quantitative metrics. This study offers a promising, radiation-free solution for sagittal spine assessment and advances comprehensive AIS evaluation.
AISep 15, 2025
Adapting and Evaluating Multimodal Large Language Models for Adolescent Idiopathic Scoliosis Self-Management: A Divide and Conquer FrameworkZhaolong Wu, Pu Luo, Nan Meng et al.
This study presents the first comprehensive evaluation of Multimodal Large Language Models (MLLMs) for Adolescent Idiopathic Scoliosis (AIS) self-management. We constructed a database of approximately 3,000 anteroposterior X-rays with diagnostic texts and evaluated five MLLMs through a `Divide and Conquer' framework consisting of a visual question-answering task, a domain knowledge assessment task, and a patient education counseling assessment task. Our investigation revealed limitations of MLLMs' ability in interpreting complex spinal radiographs and comprehending AIS care knowledge. To address these, we pioneered enhancing MLLMs with spinal keypoint prompting and compiled an AIS knowledge base for retrieval augmented generation (RAG), respectively. Results showed varying effectiveness of visual prompting across different architectures, while RAG substantially improved models' performances on the knowledge assessment task. Our findings indicate current MLLMs are far from capable in realizing personalized assistant in AIS care. The greatest challenge lies in their abilities to obtain accurate detections of spinal deformity locations (best accuracy: 0.55) and directions (best accuracy: 0.13).
IVJul 30, 2025
A Dual-Feature Extractor Framework for Accurate Back Depth and Spine Morphology Estimation from Monocular RGB ImagesYuxin Wei, Yue Zhang, Moxin Zhao et al.
Scoliosis is a prevalent condition that impacts both physical health and appearance, with adolescent idiopathic scoliosis (AIS) being the most common form. Currently, the main AIS assessment tool, X-rays, poses significant limitations, including radiation exposure and limited accessibility in poor and remote areas. To address this problem, the current solutions are using RGB images to analyze spine morphology. However, RGB images are highly susceptible to environmental factors, such as lighting conditions, compromising model stability and generalizability. Therefore, in this study, we propose a novel pipeline to accurately estimate the depth information of the unclothed back, compensating for the limitations of 2D information, and then estimate spine morphology by integrating both depth and surface information. To capture the subtle depth variations of the back surface with precision, we design an adaptive multiscale feature learning network named Grid-Aware Multiscale Adaptive Network (GAMA-Net). This model uses dual encoders to extract both patch-level and global features, which are then interacted by the Patch-Based Hybrid Attention (PBHA) module. The Adaptive Multiscale Feature Fusion (AMFF) module is used to dynamically fuse information in the decoder. As a result, our depth estimation model achieves remarkable accuracy across three different evaluation metrics, with scores of nearly 78.2%, 93.6%, and 97.5%, respectively. To further validate the effectiveness of the predicted depth, we integrate both surface and depth information for spine morphology estimation. This integrated approach enhances the accuracy of spine curve generation, achieving an impressive performance of up to 97%.
CVSep 7, 2020
Light Field View Synthesis via Aperture Disparity and Warping Confidence MapNan Meng, Kai Li, Jianzhuang Liu et al.
This paper presents a learning-based approach to synthesize the view from an arbitrary camera position given a sparse set of images. A key challenge for this novel view synthesis arises from the reconstruction process, when the views from different input images may not be consistent due to obstruction in the light path. We overcome this by jointly modeling the epipolar property and occlusion in designing a convolutional neural network. We start by defining and computing the aperture disparity map, which approximates the parallax and measures the pixel-wise shift between two views. While this relates to free-space rendering and can fail near the object boundaries, we further develop a warping confidence map to address pixel occlusion in these challenging regions. The proposed method is evaluated on diverse real-world and synthetic light field scenes, and it shows better performance over several state-of-the-art techniques.
IVMar 29, 2020
High-Order Residual Network for Light Field Super-ResolutionNan Meng, Xiaofei Wu, Jianzhuang Liu et al.
Plenoptic cameras usually sacrifice the spatial resolution of their SAIs to acquire geometry information from different viewpoints. Several methods have been proposed to mitigate such spatio-angular trade-off, but seldom make use of the structural properties of the light field (LF) data efficiently. In this paper, we propose a novel high-order residual network to learn the geometric features hierarchically from the LF for reconstruction. An important component in the proposed network is the high-order residual block (HRB), which learns the local geometric features by considering the information from all input views. After fully obtaining the local features learned from each HRB, our model extracts the representative geometric features for spatio-angular upsampling through the global residual learning. Additionally, a refinement network is followed to further enhance the spatial details by minimizing a perceptual loss. Compared with previous work, our model is tailored to the rich structure inherent in the LF, and therefore can reduce the artifacts near non-Lambertian and occlusion regions. Experimental results show that our approach enables high-quality reconstruction even in challenging regions and outperforms state-of-the-art single image or LF reconstruction methods with both quantitative measurements and visual evaluation.
IVOct 3, 2019
High-dimensional Dense Residual Convolutional Neural Network for Light Field ReconstructionNan Meng, Hayden K. -H. So, Xing Sun et al.
We consider the problem of high-dimensional light field reconstruction and develop a learning-based framework for spatial and angular super-resolution. Many current approaches either require disparity clues or restore the spatial and angular details separately. Such methods have difficulties with non-Lambertian surfaces or occlusions. In contrast, we formulate light field super-resolution (LFSR) as tensor restoration and develop a learning framework based on a two-stage restoration with 4-dimensional (4D) convolution. This allows our model to learn the features capturing the geometry information encoded in multiple adjacent views. Such geometric features vary near the occlusion regions and indicate the foreground object border. To train a feasible network, we propose a novel normalization operation based on a group of views in the feature maps, design a stage-wise loss function, and develop the multi-range training strategy to further improve the performance. Evaluations are conducted on a number of light field datasets including real-world scenes, synthetic data, and microscope light fields. The proposed method achieves superior performance and less execution time comparing with other state-of-the-art schemes.