CVJun 28, 2023
Let Segment Anything Help Image DehazeZheyan Jin, Shiqi Chen, Yueting Chen et al.
The large language model and high-level vision model have achieved impressive performance improvements with large datasets and model sizes. However, low-level computer vision tasks, such as image dehaze and blur removal, still rely on a small number of datasets and small-sized models, which generally leads to overfitting and local optima. Therefore, we propose a framework to integrate large-model prior into low-level computer vision tasks. Just as with the task of image segmentation, the degradation of haze is also texture-related. So we propose to detect gray-scale coding, network channel expansion, and pre-dehaze structures to integrate large-model prior knowledge into any low-level dehazing network. We demonstrate the effectiveness and applicability of large models in guiding low-level visual tasks through different datasets and algorithms comparison experiments. Finally, we demonstrate the effect of grayscale coding, network channel expansion, and recurrent network structures through ablation experiments. Under the conditions where additional data and training resources are not required, we successfully prove that the integration of large-model prior knowledge will improve the dehaze performance and save training time for low-level visual tasks.
OPTICSSep 8, 2023
Revealing the preference for correcting separated aberrations in joint optic-image designJingwen Zhou, Shiqi Chen, Zheng Ren et al.
The joint design of the optical system and the downstream algorithm is a challenging and promising task. Due to the demand for balancing the global optimal of imaging systems and the computational cost of physical simulation, existing methods cannot achieve efficient joint design of complex systems such as smartphones and drones. In this work, starting from the perspective of the optical design, we characterize the optics with separated aberrations. Additionally, to bridge the hardware and software without gradients, an image simulation system is presented to reproduce the genuine imaging procedure of lenses with large field-of-views. As for aberration correction, we propose a network to perceive and correct the spatially varying aberrations and validate its superiority over state-of-the-art methods. Comprehensive experiments reveal that the preference for correcting separated aberrations in joint design is as follows: longitudinal chromatic aberration, lateral chromatic aberration, spherical aberration, field curvature, and coma, with astigmatism coming last. Drawing from the preference, a 10% reduction in the total track length of the consumer-level mobile phone lens module is accomplished. Moreover, this procedure spares more space for manufacturing deviations, realizing extreme-quality enhancement of computational photography. The optimization paradigm provides innovative insight into the practical joint design of sophisticated optical systems and post-processing algorithms.
CVMar 16, 2023
Reliable Image Dehazing by NeRFZheyan Jin, Shiqi Chen, Huajun Feng et al.
We present an image dehazing algorithm with high quality, wide application, and no data training or prior needed. We analyze the defects of the original dehazing model, and propose a new and reliable dehazing reconstruction and dehazing model based on the combination of optical scattering model and computer graphics lighting rendering model. Based on the new haze model and the images obtained by the cameras, we can reconstruct the three-dimensional space, accurately calculate the objects and haze in the space, and use the transparency relationship of haze to perform accurate haze removal. To obtain a 3D simulation dataset we used the Unreal 5 computer graphics rendering engine. In order to obtain real shot data in different scenes, we used fog generators, array cameras, mobile phones, underwater cameras and drones to obtain haze data. We use formula derivation, simulation data set and real shot data set result experimental results to prove the feasibility of the new method. Compared with various other methods, we are far ahead in terms of calculation indicators (4 dB higher quality average scene), color remains more natural, and the algorithm is more robust in different scenarios and best in the subjective perception.
CVJun 28, 2023
Toward Real Flare Removal: A Comprehensive Pipeline and A New BenchmarkZheyan Jin, Shiqi Chen, Huajun Feng et al.
Photographing in the under-illuminated scenes, the presence of complex light sources often leave strong flare artifacts in images, where the intensity, the spectrum, the reflection, and the aberration altogether contribute the deterioration. Besides the image quality, it also influence the performance of down-stream visual applications. Thus, removing the lens flare and ghosts is a challenge issue especially in low-light environment. However, existing methods for flare removal mainly restricted to the problems of inadequate simulation and real-world capture, where the categories of scattered flares are singular and the reflected ghosts are unavailable. Therefore, a comprehensive deterioration procedure is crucial for constructing the dataset of flare removal. Based on the theoretical analysis and real-world evaluation, we propose a well-developed methodology for generating the data-pairs with flare deterioration. The procedure is comprehensive, where the similarity of scattered flares and the symmetric effect of reflected ghosts are realized. Moreover, we also construct a real-shot pipeline that respectively processes the effects of scattering and reflective flares, aiming to directly generate the data for end-to-end methods. Experimental results show that the proposed methodology add diversity to the existing flare datasets and construct a comprehensive mapping procedure for flare data pairs. And our method facilities the data-driven model to realize better restoration in flare images and proposes a better evaluation system based on real shots, resulting promote progress in the area of real flare removal.
CVNov 30, 2025
CircleFlow: Flow-Guided Camera Blur Estimation using a Circle Grid TargetJiajian He, Enjie Hu, Shiqi Chen et al.
The point spread function (PSF) serves as a fundamental descriptor linking the real-world scene to the captured signal, manifesting as camera blur. Accurate PSF estimation is crucial for both optical characterization and computational vision, yet remains challenging due to the inherent ambiguity and the ill-posed nature of intensity-based deconvolution. We introduce CircleFlow, a high-fidelity PSF estimation framework that employs flow-guided edge localization for precise blur characterization. CircleFlow begins with a structured capture that encodes locally anisotropic and spatially varying PSFs by imaging a circle grid target, while leveraging the target's binary luminance prior to decouple image and kernel estimation. The latent sharp image is then reconstructed through subpixel alignment of an initialized binary structure guided by optical flow, whereas the PSF is modeled as an energy-constrained implicit neural representation. Both components are jointly optimized within a demosaicing-aware differentiable framework, ensuring physically consistent and robust PSF estimation enabled by accurate edge localization. Extensive experiments on simulated and real-world data demonstrate that CircleFlow achieves state-of-the-art accuracy and reliability, validating its effectiveness for practical PSF calibration.
IVAug 8, 2024
Physical prior guided cooperative learning framework for joint turbulence degradation estimation and infrared video restorationZiran Zhang, Yuhang Tang, Zhigang Wang et al.
Infrared imaging and turbulence strength measurements are in widespread demand in many fields. This paper introduces a Physical Prior Guided Cooperative Learning (P2GCL) framework to jointly enhance atmospheric turbulence strength estimation and infrared image restoration. P2GCL involves a cyclic collaboration between two models, i.e., a TMNet measures turbulence strength and outputs the refractive index structure constant (Cn2) as a physical prior, a TRNet conducts infrared image sequence restoration based on Cn2 and feeds the restored images back to the TMNet to boost the measurement accuracy. A novel Cn2-guided frequency loss function and a physical constraint loss are introduced to align the training process with physical theories. Experiments demonstrate P2GCL achieves the best performance for both turbulence strength estimation (improving Cn2 MAE by 0.0156, enhancing R2 by 0.1065) and image restoration (enhancing PSNR by 0.2775 dB), validating the significant impact of physical prior guided cooperative learning.
CVMar 26, 2025Code
EGVD: Event-Guided Video Diffusion Model for Physically Realistic Large-Motion Frame InterpolationZiran Zhang, Xiaohui Li, Yihao Liu et al.
Video frame interpolation (VFI) in scenarios with large motion remains challenging due to motion ambiguity between frames. While event cameras can capture high temporal resolution motion information, existing event-based VFI methods struggle with limited training data and complex motion patterns. In this paper, we introduce Event-Guided Video Diffusion Model (EGVD), a novel framework that leverages the powerful priors of pre-trained stable video diffusion models alongside the precise temporal information from event cameras. Our approach features a Multi-modal Motion Condition Generator (MMCG) that effectively integrates RGB frames and event signals to guide the diffusion process, producing physically realistic intermediate frames. We employ a selective fine-tuning strategy that preserves spatial modeling capabilities while efficiently incorporating event-guided temporal information. We incorporate input-output normalization techniques inspired by recent advances in diffusion modeling to enhance training stability across varying noise levels. To improve generalization, we construct a comprehensive dataset combining both real and simulated event data across diverse scenarios. Extensive experiments on both real and simulated datasets demonstrate that EGVD significantly outperforms existing methods in handling large motion and challenging lighting conditions, achieving substantial improvements in perceptual quality metrics (27.4% better LPIPS on Prophesee and 24.1% on BSRGB) while maintaining competitive fidelity measures. Code and datasets available at: https://github.com/OpenImagingLab/EGVD.
CVJan 16, 2024Code
Deep Linear Array Pushbroom Image Restoration: A Degradation Pipeline and Jitter-Aware Restoration NetworkZida Chen, Ziran Zhang, Haoying Li et al.
Linear Array Pushbroom (LAP) imaging technology is widely used in the realm of remote sensing. However, images acquired through LAP always suffer from distortion and blur because of camera jitter. Traditional methods for restoring LAP images, such as algorithms estimating the point spread function (PSF), exhibit limited performance. To tackle this issue, we propose a Jitter-Aware Restoration Network (JARNet), to remove the distortion and blur in two stages. In the first stage, we formulate an Optical Flow Correction (OFC) block to refine the optical flow of the degraded LAP images, resulting in pre-corrected images where most of the distortions are alleviated. In the second stage, for further enhancement of the pre-corrected images, we integrate two jitter-aware techniques within the Spatial and Frequency Residual (SFRes) block: 1) introducing Coordinate Attention (CoA) to the SFRes block in order to capture the jitter state in orthogonal direction; 2) manipulating image features in both spatial and frequency domains to leverage local and global priors. Additionally, we develop a data synthesis pipeline, which applies Continue Dynamic Shooting Model (CDSM) to simulate realistic degradation in LAP images. Both the proposed JARNet and LAP image synthesis pipeline establish a foundation for addressing this intricate challenge. Extensive experiments demonstrate that the proposed two-stage method outperforms state-of-the-art image restoration models. Code is available at https://github.com/JHW2000/JARNet.
CVMay 10, 2023Code
Optical Aberration Correction in Postprocessing using Imaging SimulationShiqi Chen, Huajun Feng, Dexin Pan et al.
As the popularity of mobile photography continues to grow, considerable effort is being invested in the reconstruction of degraded images. Due to the spatial variation in optical aberrations, which cannot be avoided during the lens design process, recent commercial cameras have shifted some of these correction tasks from optical design to postprocessing systems. However, without engaging with the optical parameters, these systems only achieve limited correction for aberrations.In this work, we propose a practical method for recovering the degradation caused by optical aberrations. Specifically, we establish an imaging simulation system based on our proposed optical point spread function model. Given the optical parameters of the camera, it generates the imaging results of these specific devices. To perform the restoration, we design a spatial-adaptive network model on synthetic data pairs generated by the imaging simulation system, eliminating the overhead of capturing training data by a large amount of shooting and registration. Moreover, we comprehensively evaluate the proposed method in simulations and experimentally with a customized digital-single-lens-reflex (DSLR) camera lens and HUAWEI HONOR 20, respectively. The experiments demonstrate that our solution successfully removes spatially variant blur and color dispersion. When compared with the state-of-the-art deblur methods, the proposed approach achieves better results with a lower computational overhead. Moreover, the reconstruction technique does not introduce artificial texture and is convenient to transfer to current commercial cameras. Project Page: \url{https://github.com/TanGeeGo/ImagingSimulation}.
CVDec 9, 2025
Trajectory Densification and Depth from Perspective-based BlurTianchen Qiu, Qirun Zhang, Jiajian He et al.
In the absence of a mechanical stabilizer, the camera undergoes inevitable rotational dynamics during capturing, which induces perspective-based blur especially under long-exposure scenarios. From an optical standpoint, perspective-based blur is depth-position-dependent: objects residing at distinct spatial locations incur different blur levels even under the same imaging settings. Inspired by this, we propose a novel method that estimate metric depth by examining the blur pattern of a video stream and dense trajectory via joint optical design algorithm. Specifically, we employ off-the-shelf vision encoder and point tracker to extract video information. Then, we estimate depth map via windowed embedding and multi-window aggregation, and densify the sparse trajectory from the optical algorithm using a vision-language model. Evaluations on multiple depth datasets demonstrate that our method attains strong performance over large depth range, while maintaining favorable generalization. Relative to the real trajectory in handheld shooting settings, our optical algorithm achieves superior precision and the dense reconstruction maintains strong accuracy.
DBJan 18, 2025
Reliable Text-to-SQL with Adaptive AbstentionKaiwen Chen, Yueting Chen, Xiaohui Yu et al.
Large language models (LLMs) have revolutionized natural language interfaces for databases, particularly in text-to-SQL conversion. However, current approaches often generate unreliable outputs when faced with ambiguity or insufficient context. We present Reliable Text-to-SQL (RTS), a novel framework that enhances query generation reliability by incorporating abstention and human-in-the-loop mechanisms. RTS focuses on the critical schema linking phase, which aims to identify the key database elements needed for generating SQL queries. It autonomously detects potential errors during the answer generation process and responds by either abstaining or engaging in user interaction. A vital component of RTS is the Branching Point Prediction (BPP) which utilizes statistical conformal techniques on the hidden layers of the LLM model for schema linking, providing probabilistic guarantees on schema linking accuracy. We validate our approach through comprehensive experiments on the BIRD benchmark, demonstrating significant improvements in robustness and reliability. Our findings highlight the potential of combining transparent-box LLMs with human-in-the-loop processes to create more robust natural language interfaces for databases. For the BIRD benchmark, our approach achieves near-perfect schema linking accuracy, autonomously involving a human when needed. Combined with query generation, we demonstrate that near-perfect schema linking and a small query generation model can almost match SOTA accuracy achieved with a model orders of magnitude larger than the one we use.
IVJul 7, 2025
A Deep Unfolding Framework for Diffractive Snapshot Spectral ImagingZhengyue Zhuge, Jiahui Xu, Shiqi Chen et al.
Snapshot hyperspectral imaging systems acquire spectral data cubes through compressed sensing. Recently, diffractive snapshot spectral imaging (DSSI) methods have attracted significant attention. While various optical designs and improvements continue to emerge, research on reconstruction algorithms remains limited. Although numerous networks and deep unfolding methods have been applied on similar tasks, they are not fully compatible with DSSI systems because of their distinct optical encoding mechanism. In this paper, we propose an efficient deep unfolding framework for diffractive systems, termed diffractive deep unfolding (DDU). Specifically, we derive an analytical solution for the data fidelity term in DSSI, ensuring both the efficiency and the effectiveness during the iterative reconstruction process. Given the severely ill-posed nature of the problem, we employ a network-based initialization strategy rather than non-learning-based methods or linear layers, leading to enhanced stability and performance. Our framework demonstrates strong compatibility with existing state-of-the-art (SOTA) models, which effectively address the initialization and prior subproblem. Extensive experiments validate the superiority of the proposed DDU framework, showcasing improved performance while maintaining comparable parameter counts and computational complexity. These results suggest that DDU provides a solid foundation for future unfolding-based methods in DSSI.
CVJun 12, 2024
From Sim-to-Real: Toward General Event-based Low-light Frame Interpolation with Per-scene OptimizationZiran Zhang, Yongrui Ma, Yueting Chen et al.
Video Frame Interpolation (VFI) is important for video enhancement, frame rate up-conversion, and slow-motion generation. The introduction of event cameras, which capture per-pixel brightness changes asynchronously, has significantly enhanced VFI capabilities, particularly for high-speed, nonlinear motions. However, these event-based methods encounter challenges in low-light conditions, notably trailing artifacts and signal latency, which hinder their direct applicability and generalization. Addressing these issues, we propose a novel per-scene optimization strategy tailored for low-light conditions. This approach utilizes the internal statistics of a sequence to handle degraded event data under low-light conditions, improving the generalizability to different lighting and camera settings. To evaluate its robustness in low-light condition, we further introduce EVFI-LL, a unique RGB+Event dataset captured under low-light conditions. Our results demonstrate state-of-the-art performance in low-light environments. Project page: https://naturezhanghn.github.io/sim2real.
CVMay 10, 2023
Mobile Image Restoration via Prior QuantizationShiqi Chen, Jinwen Zhou, Menghao Li et al.
In digital images, the performance of optical aberration is a multivariate degradation, where the spectral of the scene, the lens imperfections, and the field of view together contribute to the results. Besides eliminating it at the hardware level, the post-processing system, which utilizes various prior information, is significant for correction. However, due to the content differences among priors, the pipeline that aligns these factors shows limited efficiency and unoptimized restoration. Here, we propose a prior quantization model to correct the optical aberrations in image processing systems. To integrate these messages, we encode various priors into a latent space and quantify them by the learnable codebooks. After quantization, the prior codes are fused with the image restoration branch to realize targeted optical aberration correction. Comprehensive experiments demonstrate the flexibility of the proposed method and validate its potential to accomplish targeted restoration for a specific camera. Furthermore, our model promises to analyze the correlation between the various priors and the optical aberration of devices, which is helpful for joint soft-hardware design.
CVMay 10, 2023
Computational Optics for Mobile Terminals in Mass ProductionShiqi Chen, Ting Lin, Huajun Feng et al.
Correcting the optical aberrations and the manufacturing deviations of cameras is a challenging task. Due to the limitation on volume and the demand for mass production, existing mobile terminals cannot rectify optical degradation. In this work, we systematically construct the perturbed lens system model to illustrate the relationship between the deviated system parameters and the spatial frequency response measured from photographs. To further address this issue, an optimization framework is proposed based on this model to build proxy cameras from the machining samples' SFRs. Engaging with the proxy cameras, we synthetic data pairs, which encode the optical aberrations and the random manufacturing biases, for training the learning-based algorithms. In correcting aberration, although promising results have been shown recently with convolutional neural networks, they are hard to generalize to stochastic machining biases. Therefore, we propose a dilated Omni-dimensional dynamic convolution and implement it in post-processing to account for the manufacturing degradation. Extensive experiments which evaluate multiple samples of two representative devices demonstrate that the proposed optimization framework accurately constructs the proxy camera. And the dynamic processing model is well-adapted to manufacturing deviations of different cameras, realizing perfect computational photography. The evaluation shows that the proposed method bridges the gap between optical design, system machining, and post-processing pipeline, shedding light on the joint of image signal reception (lens and sensor) and image signal processing.
CVApr 30, 2021
SRDiff: Single Image Super-Resolution with Diffusion Probabilistic ModelsHaoying Li, Yifan Yang, Meng Chang et al.
Single image super-resolution (SISR) aims to reconstruct high-resolution (HR) images from the given low-resolution (LR) ones, which is an ill-posed problem because one LR image corresponds to multiple HR images. Recently, learning-based SISR methods have greatly outperformed traditional ones, while suffering from over-smoothing, mode collapse or large model footprint issues for PSNR-oriented, GAN-driven and flow-based methods respectively. To solve these problems, we propose a novel single image super-resolution diffusion probabilistic model (SRDiff), which is the first diffusion-based model for SISR. SRDiff is optimized with a variant of the variational bound on the data likelihood and can provide diverse and realistic SR predictions by gradually transforming the Gaussian noise into a super-resolution (SR) image conditioned on an LR input through a Markov chain. In addition, we introduce residual prediction to the whole framework to speed up convergence. Our extensive experiments on facial and general benchmarks (CelebA and DIV2K datasets) show that 1) SRDiff can generate diverse SR results in rich details with state-of-the-art performance, given only one LR input; 2) SRDiff is easy to train with a small footprint; and 3) SRDiff can perform flexible image manipulation including latent space interpolation and content fusion.
DBMar 2, 2020
Evaluating Temporal Queries Over Video FeedsYueting Chen, Xiaohui Yu, Nick Koudas
Recent advances in Computer Vision and Deep Learning made possible the efficient extraction of a schema from frames of streaming video. As such, a stream of objects and their associated classes along with unique object identifiers derived via object tracking can be generated, providing unique objects as they are captured across frames. In this paper we initiate a study of temporal queries involving objects and their co-occurrences in video feeds. For example, queries that identify video segments during which the same two red cars and the same two humans appear jointly for five minutes are of interest to many applications ranging from law enforcement to security and safety. We take the first step and define such queries in a way that they incorporate certain physical aspects of video capture such as object occlusion. We present an architecture consisting of three layers, namely object detection/tracking, intermediate data generation and query evaluation. We propose two techniques,MFS and SSG, to organize all detected objects in the intermediate data generation layer, which effectively, given the queries, minimizes the number of objects and frames that have to be considered during query evaluation. We also introduce an algorithm called State Traversal (ST) that processes incoming frames against the SSG and efficiently prunes objects and frames unrelated to query evaluation, while maintaining all states required for succinct query evaluation. We present the results of a thorough experimental evaluation utilizing both real and synthetic data establishing the trade-offs between MFS and SSG. We stress various parameters of interest in our evaluation and demonstrate that the proposed query evaluation methodology coupled with the proposed algorithms is capable to evaluate temporal queries over video feeds efficiently, achieving orders of magnitude performance benefits.