Li-Yun Wang

CV
h-index6
4papers
Novelty40%
AI Score31

4 Papers

CVNov 25, 2025
MODEST: Multi-Optics Depth-of-Field Stereo Dataset

Nisarg K. Trivedi, Vinayak A. Belludi, Li-Yun Wang et al.

Reliable depth estimation under real optical conditions remains a core challenge for camera vision in systems such as autonomous robotics and augmented reality. Despite recent progress in depth estimation and depth-of-field rendering, research remains constrained by the lack of large-scale, high-fidelity, real stereo DSLR datasets, limiting real-world generalization and evaluation of models trained on synthetic data as shown extensively in literature. We present the first high-resolution (5472$\times$3648px) stereo DSLR dataset with 18000 images, systematically varying focal length and aperture across complex real scenes and capturing the optical realism and complexity of professional camera systems. For 9 scenes with varying scene complexity, lighting and background, images are captured with two identical camera assemblies at 10 focal lengths (28-70mm) and 5 apertures (f/2.8-f/22), spanning 50 optical configurations in 2000 images per scene. This full-range optics coverage enables controlled analysis of geometric and optical effects for monocular and stereo depth estimation, shallow depth-of-field rendering, deblurring, 3D scene reconstruction and novel view synthesis. Each focal configuration has a dedicated calibration image set, supporting evaluation of classical and learning based methods for intrinsic and extrinsic calibration. The dataset features challenging visual elements such as multi-scale optical illusions, reflective surfaces, mirrors, transparent glass walls, fine-grained details, and natural / artificial ambient light variations. This work attempts to bridge the realism gap between synthetic training data and real camera optics, and demonstrates challenges with the current state-of-the-art monocular, stereo depth and depth-of-field methods. We release the dataset, calibration files, and evaluation code to support reproducible research on real-world optical generalization.

CVAug 18, 2025
WP-CLIP: Leveraging CLIP to Predict Wölfflin's Principles in Visual Art

Abhijay Ghildyal, Li-Yun Wang, Feng Liu

Wölfflin's five principles offer a structured approach to analyzing stylistic variations for formal analysis. However, no existing metric effectively predicts all five principles in visual art. Computationally evaluating the visual aspects of a painting requires a metric that can interpret key elements such as color, composition, and thematic choices. Recent advancements in vision-language models (VLMs) have demonstrated their ability to evaluate abstract image attributes, making them promising candidates for this task. In this work, we investigate whether CLIP, pre-trained on large-scale data, can understand and predict Wölfflin's principles. Our findings indicate that it does not inherently capture such nuanced stylistic elements. To address this, we fine-tune CLIP on annotated datasets of real art images to predict a score for each principle. We evaluate our model, WP-CLIP, on GAN-generated paintings and the Pandora-18K art dataset, demonstrating its ability to generalize across diverse artistic styles. Our results highlight the potential of VLMs for automated art analysis.

CVNov 7, 2021
SL-CycleGAN: Blind Motion Deblurring in Cycles using Sparse Learning

Ali Syed Saqlain, Li-Yun Wang, Fang Fang

In this paper, we introduce an end-to-end generative adversarial network (GAN) based on sparse learning for single image blind motion deblurring, which we called SL-CycleGAN. For the first time in blind motion deblurring, we propose a sparse ResNet-block as a combination of sparse convolution layers and a trainable spatial pooler k-winner based on HTM (Hierarchical Temporal Memory) to replace non-linearity such as ReLU in the ResNet-block of SL-CycleGAN generators. Furthermore, unlike many state-of-the-art GAN-based motion deblurring methods that treat motion deblurring as a linear end-to-end process, we take our inspiration from the domain-to-domain translation ability of CycleGAN, and we show that image deblurring can be cycle-consistent while achieving the best qualitative results. Finally, we perform extensive experiments on popular image benchmarks both qualitatively and quantitatively and achieve the record-breaking PSNR of 38.087 dB on GoPro dataset, which is 5.377 dB better than the most recent deblurring method.

CVJan 14, 2021
Context-Aware Image Denoising with Auto-Threshold Canny Edge Detection to Suppress Adversarial Perturbation

Li-Yun Wang, Yeganeh Jalalpour, Wu-chi Feng

This paper presents a novel context-aware image denoising algorithm that combines an adaptive image smoothing technique and color reduction techniques to remove perturbation from adversarial images. Adaptive image smoothing is achieved using auto-threshold canny edge detection to produce an accurate edge map used to produce a blurred image that preserves more edge features. The proposed algorithm then uses color reduction techniques to reconstruct the image using only a few representative colors. Through this technique, the algorithm can reduce the effects of adversarial perturbations on images. We also discuss experimental data on classification accuracy. Our results showed that the proposed approach reduces adversarial perturbation in adversarial attacks and increases the robustness of the deep convolutional neural network models.