Zhengyang Lu

h-index11

13papers

256citations

Novelty50%

AI Score38

Ranked #84,582 of 194,257 authors (top 44%)#28,520 in CV (top 48%)

13 Papers

9.8CVSep 7, 2023

Joint Self-supervised Depth and Optical Flow Estimation towards Dynamic Objects

Zhengyang Lu, Ying Chen

Significant attention has been attracted to deep learning-based depth estimates. Dynamic objects become the most hard problems in inter-frame-supervised depth estimates due to the uncertainty in adjacent frames. Thus, integrating optical flow information with depth estimation is a feasible solution, as the optical flow is an essential motion representation. In this work, we construct a joint inter-frame-supervised depth and optical flow estimation framework, which predicts depths in various motions by minimizing pixel wrap errors in bilateral photometric re-projections and optical vectors. For motion segmentation, we adaptively segment the preliminary estimated optical flow map with large areas of connectivity. In self-supervised depth estimation, different motion regions are predicted independently and then composite into a complete depth. Further, the pose and depth estimations re-synthesize the optical flow maps, serving to compute reconstruction errors with the preliminary predictions. Our proposed joint depth and optical flow estimation outperforms existing depth estimators on the KITTI Depth dataset, both with and without Cityscapes pretraining. Additionally, our optical flow results demonstrate competitive performance on the KITTI Flow 2015 dataset.

8.1CVApr 5, 2022

Pyramid Frequency Network with Spatial Attention Residual Refinement Module for Monocular Depth Estimation

Zhengyang Lu, Ying Chen

Deep-learning-based approaches to depth estimation are rapidly advancing, offering superior performance over existing methods. To estimate the depth in real-world scenarios, depth estimation models require the robustness of various noise environments. In this work, a Pyramid Frequency Network(PFN) with Spatial Attention Residual Refinement Module(SARRM) is proposed to deal with the weak robustness of existing deep-learning methods. To reconstruct depth maps with accurate details, the SARRM constructs a residual fusion method with an attention mechanism to refine the blur depth. The frequency division strategy is designed, and the frequency pyramid network is developed to extract features from multiple frequency bands. With the frequency strategy, PFN achieves better visual accuracy than state-of-the-art methods in both indoor and outdoor scenes on Make3D, KITTI depth, and NYUv2 datasets. Additional experiments on the noisy NYUv2 dataset demonstrate that PFN is more reliable than existing deep-learning methods in high-noise scenes.

3.7CVOct 25, 2024Code

Semi-supervised Chinese Poem-to-Painting Generation via Cycle-consistent Adversarial Networks

Zhengyang Lu, Tianhao Guo, Feng Wang

Classical Chinese poetry and painting represent the epitome of artistic expression, but the abstract and symbolic nature of their relationship poses a significant challenge for computational translation. Most existing methods rely on large-scale paired datasets, which are scarce in this domain. In this work, we propose a semi-supervised approach using cycle-consistent adversarial networks to leverage the limited paired data and large unpaired corpus of poems and paintings. The key insight is to learn bidirectional mappings that enforce semantic alignment between the visual and textual modalities. We introduce novel evaluation metrics to assess the quality, diversity, and consistency of the generated poems and paintings. Extensive experiments are conducted on a new Chinese Painting Description Dataset (CPDD). The proposed model outperforms previous methods, showing promise in capturing the symbolic essence of artistic expression. Codes are available online \url{https://github.com/Mnster00/poemtopainting}.

2.0IVNov 11, 2020Code

Dense U-net for super-resolution with shuffle pooling layer

Zhengyang Lu, Ying Chen

Recent researches have achieved great progress on single image super-resolution(SISR) due to the development of deep learning in the field of computer vision. In these method, the high resolution input image is down-scaled to low resolution space using a single filter, commonly max-pooling, before feature extraction. This means that the feature extraction is performed in biased filtered feature space. We demonstrate that this is sub-optimal and causes information loss. In this work, we proposed a state-of-the-art convolutional neural network method called Dense U-net with shuffle pooling. To achieve this, a modified U-net with dense blocks, called dense U-net, is proposed for SISR. Then, a new pooling strategy called shuffle pooling is designed, which is aimed to replace the dense U-Net for down-scale operation. By doing so, we effectively replace the handcrafted filter in the SISR pipeline with more lossy down-sampling filters specifically trained for each feature map, whilst also reducing the information loss of the overall SISR operation. In addition, a mix loss function, which combined with Mean Square Error(MSE), Structural Similarity Index(SSIM) and Mean Gradient Error (MGE), comes up to reduce the perception loss and high-level information loss. Our proposed method achieves superior accuracy over previous state-of-the-art on the three benchmark datasets: SET14, BSD300, ICDAR2003. Code is available online.

15.6IVNov 21, 2019Code

Single Image Super Resolution based on a Modified U-net with Mixed Gradient Loss

Zhengyang Lu, Ying Chen

Single image super-resolution (SISR) is the task of inferring a high-resolution image from a single low-resolution image. Recent research on super-resolution has achieved great progress due to the development of deep convolutional neural networks in the field of computer vision. Existing super-resolution reconstruction methods have high performances in the criterion of Mean Square Error (MSE) but most methods fail to reconstruct an image with shape edges. To solve this problem, the mixed gradient error, which is composed by MSE and a weighted mean gradient error, is proposed in this work and applied to a modified U-net network as the loss function. The modified U-net removes all batch normalization layers and one of the convolution layers in each block. The operation reduces the number of parameters, and therefore accelerates the reconstruction. Compared with the existing image super-resolution algorithms, the proposed reconstruction method has better performance and time consumption. The experiments demonstrate that modified U-net network architecture with mixed gradient loss yields high-level results on three image datasets: SET14, BSD300, ICDAR2003. Code is available online.

11.3CVApr 10, 2024

Self-supervised Monocular Depth Estimation on Water Scenes via Specular Reflection Prior

Zhengyang Lu, Ying Chen

Monocular depth estimation from a single image is an ill-posed problem for computer vision due to insufficient reliable cues as the prior knowledge. Besides the inter-frame supervision, namely stereo and adjacent frames, extensive prior information is available in the same frame. Reflections from specular surfaces, informative intra-frame priors, enable us to reformulate the ill-posed depth estimation task as a multi-view synthesis. This paper proposes the first self-supervision for deep-learning depth estimation on water scenes via intra-frame priors, known as reflection supervision and geometrical constraints. In the first stage, a water segmentation network is performed to separate the reflection components from the entire image. Next, we construct a self-supervised framework to predict the target appearance from reflections, perceived as other perspectives. The photometric re-projection error, incorporating SmoothL1 and a novel photometric adaptive SSIM, is formulated to optimize pose and depth estimation by aligning the transformed virtual depths and source ones. As a supplement, the water surface is determined from real and virtual camera positions, which complement the depth of the water area. Furthermore, to alleviate these laborious ground truth annotations, we introduce a large-scale water reflection scene (WRS) dataset rendered from Unreal Engine 4. Extensive experiments on the WRS dataset prove the feasibility of the proposed method compared to state-of-the-art depth estimation techniques.

7.3AIJan 30, 2024Code

Layered and Staged Monte Carlo Tree Search for SMT Strategy Synthesis

Zhengyang Lu, Stefan Siemer, Piyush Jha et al.

Modern SMT solvers, such as Z3, offer user-controllable strategies, enabling users to tailor solving strategies for their unique set of instances, thus dramatically enhancing solver performance for their use case. However, this approach of strategy customization presents a significant challenge: handcrafting an optimized strategy for a class of SMT instances remains a complex and demanding task for both solver developers and users alike. In this paper, we address this problem of automatic SMT strategy synthesis via a novel Monte Carlo Tree Search (MCTS) based method. Our method treats strategy synthesis as a sequential decision-making process, whose search tree corresponds to the strategy space, and employs MCTS to navigate this vast search space. The key innovations that enable our method to identify effective strategies, while keeping costs low, are the ideas of layered and staged MCTS search. These novel heuristics allow for a deeper and more efficient exploration of the strategy space, enabling us to synthesize more effective strategies than the default ones in state-of-the-art (SOTA) SMT solvers. We implement our method, dubbed Z3alpha, as part of the Z3 SMT solver. Through extensive evaluations across six important SMT logics, Z3alpha demonstrates superior performance compared to the SOTA synthesis tool FastSMT, the default Z3 solver, and the CVC5 solver on most benchmarks. Remarkably, on a challenging QF_BV benchmark set, Z3alpha solves 42.7% more instances than the default strategy in the Z3 SMT solver.

19.0CVJan 27, 2025Code

CausalSR: Structural Causal Model-Driven Super-Resolution with Counterfactual Inference

Zhengyang Lu, Bingjie Lu, Feng Wang

Physical and optical factors interacting with sensor characteristics create complex image degradation patterns. Despite advances in deep learning-based super-resolution, existing methods overlook the causal nature of degradation by adopting simplistic black-box mappings. This paper formulates super-resolution using structural causal models to reason about image degradation processes. We establish a mathematical foundation that unifies principles from causal inference, deriving necessary conditions for identifying latent degradation mechanisms and corresponding propagation. We propose a novel counterfactual learning strategy that leverages semantic guidance to reason about hypothetical degradation scenarios, leading to theoretically-grounded representations that capture invariant features across different degradation conditions. The framework incorporates an adaptive intervention mechanism with provable bounds on treatment effects, allowing precise manipulation of degradation factors while maintaining semantic consistency. Through extensive empirical validation, we demonstrate that our approach achieves significant improvements over state-of-the-art methods, particularly in challenging scenarios with compound degradations. On standard benchmarks, our method consistently outperforms existing approaches by significant margins (0.86-1.21dB PSNR), while providing interpretable insights into the restoration process. The theoretical framework and empirical results demonstrate the fundamental importance of causal reasoning in understanding image restoration systems.

6.5CVDec 29, 2024

Single-image reflection removal via self-supervised diffusion models

Zhengyang Lu, Weifan Wang, Tianhao Guo et al.

Reflections often degrade the visual quality of images captured through transparent surfaces, and reflection removal methods suffers from the shortage of paired real-world samples.This paper proposes a hybrid approach that combines cycle-consistency with denoising diffusion probabilistic models (DDPM) to effectively remove reflections from single images without requiring paired training data. The method introduces a Reflective Removal Network (RRN) that leverages DDPMs to model the decomposition process and recover the transmission image, and a Reflective Synthesis Network (RSN) that re-synthesizes the input image using the separated components through a nonlinear attention-based mechanism. Experimental results demonstrate the effectiveness of the proposed method on the SIR$^2$, Flash-Based Reflection Removal (FRR) Dataset, and a newly introduced Museum Reflection Removal (MRR) dataset, showing superior performance compared to state-of-the-art methods.

3.7CVJan 14, 2024Code

City Scene Super-Resolution via Geometric Error Minimization

Zhengyang Lu, Feng Wang

Super-resolution techniques are crucial in improving image granularity, particularly in complex urban scenes, where preserving geometric structures is vital for data-informed cultural heritage applications. In this paper, we propose a city scene super-resolution method via geometric error minimization. The geometric-consistent mechanism leverages the Hough Transform to extract regular geometric features in city scenes, enabling the computation of geometric errors between low-resolution and high-resolution images. By minimizing mixed mean square error and geometric align error during the super-resolution process, the proposed method efficiently restores details and geometric regularities. Extensive validations on the SET14, BSD300, Cityscapes and GSV-Cities datasets demonstrate that the proposed method outperforms existing state-of-the-art methods, especially in urban scenes.

6.2CVJul 4, 2025

Predicting Asphalt Pavement Friction Using Texture-Based Image Indicator

Bingjie Lu, Zhengyang Lu, Yijiashun Qi et al.

Pavement skid resistance is of vital importance for road safety. The objective of this study is to propose and validate a texture-based image indicator to predict pavement friction. This index enables pavement friction to be measured easily and inexpensively using digital images. Three different types of asphalt surfaces (dense-graded asphalt mix, open-grade friction course, and chip seal) were evaluated subject to various tire polishing cycles. Images were taken with corresponding friction measured using Dynamic Friction Tester (DFT) in the laboratory. The aggregate protrusion area is proposed as the indicator. Statistical models are established for each asphalt surface type to correlate the proposed indicator with friction coefficients. The results show that the adjusted R-square values of all relationships are above 0.90. Compared to other image-based indicators in the literature, the proposed image indicator more accurately reflects the changes in pavement friction with the number of polishing cycles, proving its cost-effective use for considering pavement friction in mix design stage.

6.2CVMay 18, 2025

CLIP-aware Domain-Adaptive Super-Resolution

Zhengyang Lu, Qian Xia, Weifan Wang et al.

This work introduces CLIP-aware Domain-Adaptive Super-Resolution (CDASR), a novel framework that addresses the critical challenge of domain generalization in single image super-resolution. By leveraging the semantic capabilities of CLIP (Contrastive Language-Image Pre-training), CDASR achieves unprecedented performance across diverse domains and extreme scaling factors. The proposed method integrates CLIP-guided feature alignment mechanism with a meta-learning inspired few-shot adaptation strategy, enabling efficient knowledge transfer and rapid adaptation to target domains. A custom domain-adaptive module processes CLIP features alongside super-resolution features through a multi-stage transformation process, including CLIP feature processing, spatial feature generation, and feature fusion. This intricate process ensures effective incorporation of semantic information into the super-resolution pipeline. Additionally, CDASR employs a multi-component loss function that combines pixel-wise reconstruction, perceptual similarity, and semantic consistency. Extensive experiments on benchmark datasets demonstrate CDASR's superiority, particularly in challenging scenarios. On the Urban100 dataset at $\times$8 scaling, CDASR achieves a significant PSNR gain of 0.15dB over existing methods, with even larger improvements of up to 0.30dB observed at $\times$16 scaling.

5.8AIJan 24, 2024

AlphaMapleSAT: An MCTS-based Cube-and-Conquer SAT Solver for Hard Combinatorial Problems

Piyush Jha, Zhengyu Li, Zhengyang Lu et al.

This paper introduces AlphaMapleSAT, a novel Monte Carlo Tree Search (MCTS) based Cube-and-Conquer (CnC) SAT solving method aimed at efficiently solving challenging combinatorial problems. Despite the tremendous success of CnC solvers in solving a variety of hard combinatorial problems, the lookahead cubing techniques at the heart of CnC have not evolved much for many years. Part of the reason is the sheer difficulty of coming up with new cubing techniques that are both low-cost and effective in partitioning input formulas into sub-formulas, such that the overall runtime is minimized. Lookahead cubing techniques used by current state-of-the-art CnC solvers, such as March, keep their cubing costs low by constraining the search for the optimal splitting variables. By contrast, our key innovation is a deductively-driven MCTS-based lookahead cubing technique, that performs a deeper heuristic search to find effective cubes, while keeping the cubing cost low. We perform an extensive comparison of AlphaMapleSAT against the March CnC solver on challenging combinatorial problems such as the minimum Kochen-Specker and Ramsey problems. We also perform ablation studies to verify the efficacy of the MCTS heuristic search for the cubing problem. Results show up to 2.3x speedup in parallel (and up to 27x in sequential) elapsed real time.