Stanley H. Chan

h-index29

53papers

2,816citations

Novelty52%

AI Score61

Ranked #2,243 of 194,257 authors (top 1%)#877 in CV (top 1%)

53 Papers

26.2IVJul 20, 2023Code

Physics-Driven Turbulence Image Restoration with Stochastic Refinement

Ajay Jaiswal, Xingguang Zhang, Stanley H. Chan et al.

Image distortion by atmospheric turbulence is a stochastic degradation, which is a critical problem in long-range optical imaging systems. A number of research has been conducted during the past decades, including model-based and emerging deep-learning solutions with the help of synthetic data. Although fast and physics-grounded simulation tools have been introduced to help the deep-learning models adapt to real-world turbulence conditions recently, the training of such models only relies on the synthetic data and ground truth pairs. This paper proposes the Physics-integrated Restoration Network (PiRN) to bring the physics-based simulator directly into the training process to help the network to disentangle the stochasticity from the degradation and the underlying image. Furthermore, to overcome the ``average effect" introduced by deterministic models and the domain gap between the synthetic and real-world degradation, we further introduce PiRN with Stochastic Refinement (PiRN-SR) to boost its perceptual quality. Overall, our PiRN and PiRN-SR improve the generalization to real-world unknown turbulence conditions and provide a state-of-the-art restoration in both pixel-wise accuracy and perceptual quality. Our codes are available at \url{https://github.com/VITA-Group/PiRN}.

16.4CVJun 29, 2023

FarSight: A Physics-Driven Whole-Body Biometric System at Large Distance and Altitude

Feng Liu, Ryan Ashbaugh, Nicholas Chimitt et al. · gatech

Whole-body biometric recognition is an important area of research due to its vast applications in law enforcement, border security, and surveillance. This paper presents the end-to-end design, development and evaluation of FarSight, an innovative software system designed for whole-body (fusion of face, gait and body shape) biometric recognition. FarSight accepts videos from elevated platforms and drones as input and outputs a candidate list of identities from a gallery. The system is designed to address several challenges, including (i) low-quality imagery, (ii) large yaw and pitch angles, (iii) robust feature extraction to accommodate large intra-person variabilities and large inter-person similarities, and (iv) the large domain gap between training and test sets. FarSight combines the physics of imaging and deep learning models to enhance image restoration and biometric feature encoding. We test FarSight's effectiveness using the newly acquired IARPA Biometric Recognition and Identification at Altitude and Range (BRIAR) dataset. Notably, FarSight demonstrated a substantial performance increase on the BRIAR dataset, with gains of +11.82% Rank-20 identification and +11.3% TAR@1% FAR.

26.7IVJul 20, 2022Code

Single Frame Atmospheric Turbulence Mitigation: A Benchmark Study and A New Physics-Inspired Transformer Model

Zhiyuan Mao, Ajay Jaiswal, Zhangyang Wang et al.

Image restoration algorithms for atmospheric turbulence are known to be much more challenging to design than traditional ones such as blur or noise because the distortion caused by the turbulence is an entanglement of spatially varying blur, geometric distortion, and sensor noise. Existing CNN-based restoration methods built upon convolutional kernels with static weights are insufficient to handle the spatially dynamical atmospheric turbulence effect. To address this problem, in this paper, we propose a physics-inspired transformer model for imaging through atmospheric turbulence. The proposed network utilizes the power of transformer blocks to jointly extract a dynamical turbulence distortion map and restore a turbulence-free image. In addition, recognizing the lack of a comprehensive dataset, we collect and present two new real-world turbulence datasets that allow for evaluation with both classical objective metrics (e.g., PSNR and SSIM) and a new task-driven metric using text recognition accuracy. Both real testing sets and all related code will be made publicly available.

24.2IVJul 13, 2022

Imaging through the Atmosphere using Turbulence Mitigation Transformer

Xingguang Zhang, Zhiyuan Mao, Nicholas Chimitt et al.

Restoring images distorted by atmospheric turbulence is a ubiquitous problem in long-range imaging applications. While existing deep-learning-based methods have demonstrated promising results in specific testing conditions, they suffer from three limitations: (1) lack of generalization capability from synthetic training data to real turbulence data; (2) failure to scale, hence causing memory and speed challenges when extending the idea to a large number of frames; (3) lack of a fast and accurate simulator to generate data for training neural networks. In this paper, we introduce the turbulence mitigation transformer (TMT) that explicitly addresses these issues. TMT brings three contributions: Firstly, TMT explicitly uses turbulence physics by decoupling the turbulence degradation and introducing a multi-scale loss for removing distortion, thus improving effectiveness. Secondly, TMT presents a new attention module along the temporal axis to extract extra features efficiently, thus improving memory and speed. Thirdly, TMT introduces a new simulator based on the Fourier sampler, temporal correlation, and flexible kernel size, thus improving our capability to synthesize better training data. TMT outperforms state-of-the-art video restoration models, especially in generalizing from synthetic to real turbulence data. Code, videos, and datasets are available at \href{https://xg416.github.io/TMT}{https://xg416.github.io/TMT}.

13.8IVMar 30, 2023

HDR Imaging with Spatially Varying Signal-to-Noise Ratios

Yiheng Chi, Xingguang Zhang, Stanley H. Chan

While today's high dynamic range (HDR) image fusion algorithms are capable of blending multiple exposures, the acquisition is often controlled so that the dynamic range within one exposure is narrow. For HDR imaging in photon-limited situations, the dynamic range can be enormous and the noise within one exposure is spatially varying. Existing image denoising algorithms and HDR fusion algorithms both fail to handle this situation, leading to severe limitations in low-light HDR imaging. This paper presents two contributions. Firstly, we identify the source of the problem. We find that the issue is associated with the co-existence of (1) spatially varying signal-to-noise ratio, especially the excessive noise due to very dark regions, and (2) a wide luminance range within each exposure. We show that while the issue can be handled by a bank of denoisers, the complexity is high. Secondly, we propose a new method called the spatially varying high dynamic range (SV-HDR) fusion network to simultaneously denoise and fuse images. We introduce a new exposure-shared block within our custom-designed multi-scale transformer framework. In a variety of testing conditions, the performance of the proposed SV-HDR is better than the existing methods.

12.8IVJun 30, 2023

Spatially Varying Exposure with 2-by-2 Multiplexing: Optimality and Universality

Xiangyu Qu, Yiheng Chi, Stanley H. Chan

The advancement of new digital image sensors has enabled the design of exposure multiplexing schemes where a single image capture can have multiple exposures and conversion gains in an interlaced format, similar to that of a Bayer color filter array. In this paper, we ask the question of how to design such multiplexing schemes for adaptive high-dynamic range (HDR) imaging where the multiplexing scheme can be updated according to the scenes. We present two new findings. (i) We address the problem of design optimality. We show that given a multiplex pattern, the conventional optimality criteria based on the input/output-referred signal-to-noise ratio (SNR) of the independently measured pixels can lead to flawed decisions because it cannot encapsulate the location of the saturated pixels. We overcome the issue by proposing a new concept known as the spatially varying exposure risk (SVE-Risk) which is a pseudo-idealistic quantification of the amount of recoverable pixels. We present an efficient enumeration algorithm to select the optimal multiplex patterns. (ii) We report a design universality observation that the design of the multiplex pattern can be decoupled from the image reconstruction algorithm. This is a significant departure from the recent literature that the multiplex pattern should be jointly optimized with the reconstruction algorithm. Our finding suggests that in the context of exposure multiplexing, an end-to-end training may not be necessary.

6.8CVMar 29, 2023

Towards Understanding the Effect of Pretraining Label Granularity

Guan Zhe Hong, Yin Cui, Ariel Fuxman et al.

In this paper, we study how the granularity of pretraining labels affects the generalization of deep neural networks in image classification tasks. We focus on the "fine-to-coarse" transfer learning setting, where the pretraining label space is more fine-grained than that of the target problem. Empirically, we show that pretraining on the leaf labels of ImageNet21k produces better transfer results on ImageNet1k than pretraining on other coarser granularity levels, which supports the common practice used in the community. Theoretically, we explain the benefit of fine-grained pretraining by proving that, for a data distribution satisfying certain hierarchy conditions, 1) coarse-grained pretraining only allows a neural network to learn the "common" or "easy-to-learn" features well, while 2) fine-grained pretraining helps the network learn the "rarer" or "fine-grained" features in addition to the common ones, thus improving its accuracy on hard downstream test samples in which common features are missing or weak in strength. Furthermore, we perform comprehensive experiments using the label hierarchies of iNaturalist 2021 and observe that the following conditions, in addition to proper choice of label granularity, enable the transfer to work well in practice: 1) the pretraining dataset needs to have a meaningful label hierarchy, and 2) the pretraining and target label functions need to align well.

9.1CVMay 30

OptiWorld: Optimal Control for Video World Generation under Physical Constraints

Yu Yuan, Jianhao Yuan, Xijun Wang et al.

Video generation models are becoming a scalable form of world models, but they mainly generate plausible motion rather than proactively control or optimize the underlying dynamics. As a result, an object in the generated video may follow trajectories that are unsafe, not smooth, inefficient, or physically inconsistent. In this work, we propose \textbf{OptiWorld}, a framework that brings classical optimal control into video generation at inference time. OptiWorld first extracts a compact, task-relevant world state, then plans an optimal trajectory under physical constraints, and finally renders the video conditioned on this trajectory. We formulate planning as a geometric problem on a continuous manifold, which converts 3D geometry and task-dependent physical constraints into a unified planning geometry. By adding this optimal-control layer, OptiWorld generates videos with preferable dynamics, demonstrating strong potential in multiple tasks including goal-conditioned image-to-video generation, video dynamics editing, and counterfactual generation.

6.8IVMar 31

Pupil Design for Computational Wavefront Estimation

Ali Almuallem, Nicholas Chimitt, Bole Ma et al.

Establishing a precise connection between imaged intensity and the incident wavefront is essential for emerging applications in adaptive optics, holography, computational microscopy, and non-line-of-sight imaging. While prior work has shown that breaking symmetries in pupil design enables wavefront recovery from a single intensity measurement, there is little guidance on how to design a pupil that improves wavefront estimation. In this work we introduce a quantitative asymmetry metric to bridge this gap and, through an extensive empirical study and supporting analysis, demonstrate that increasing asymmetry enhances wavefront recoverability. We analyze the trade-offs in pupil design, and the impact on light throughput along with performance in noise. Both large-scale simulations and optical bench experiments are carried out to support our findings.

24.9IVJan 8, 2024Code

Spatio-Temporal Turbulence Mitigation: A Translational Perspective

Xingguang Zhang, Nicholas Chimitt, Yiheng Chi et al.

Recovering images distorted by atmospheric turbulence is a challenging inverse problem due to the stochastic nature of turbulence. Although numerous turbulence mitigation (TM) algorithms have been proposed, their efficiency and generalization to real-world dynamic scenarios remain severely limited. Building upon the intuitions of classical TM algorithms, we present the Deep Atmospheric TUrbulence Mitigation network (DATUM). DATUM aims to overcome major challenges when transitioning from classical to deep learning approaches. By carefully integrating the merits of classical multi-frame TM methods into a deep network structure, we demonstrate that DATUM can efficiently perform long-range temporal aggregation using a recurrent fashion, while deformable attention and temporal-channel attention seamlessly facilitate pixel registration and lucky imaging. With additional supervision, tilt and blur degradation can be jointly mitigated. These inductive biases empower DATUM to significantly outperform existing methods while delivering a tenfold increase in processing speed. A large-scale training dataset, ATSyn, is presented as a co-invention to enable generalization in real turbulence. Our code and datasets are available at https://xg416.github.io/DATUM.

17.8CVDec 3, 2024Code

Generative Photography: Scene-Consistent Camera Control for Realistic Text-to-Image Synthesis

Yu Yuan, Xijun Wang, Yichen Sheng et al.

Image generation today can produce somewhat realistic images from text prompts. However, if one asks the generator to synthesize a specific camera setting such as creating different fields of view using a 24mm lens versus a 70mm lens, the generator will not be able to interpret and generate scene-consistent images. This limitation not only hinders the adoption of generative tools in professional photography but also highlights the broader challenge of aligning data-driven models with real-world physical settings. In this paper, we introduce Generative Photography, a framework that allows controlling camera intrinsic settings during content generation. The core innovation of this work are the concepts of Dimensionality Lifting and Differential Camera Intrinsics Learning, enabling smooth and consistent transitions across different camera settings. Experimental results show that our method produces significantly more scene-consistent photorealistic images than state-of-the-art models such as Stable Diffusion 3 and FLUX. Our code and additional results are available at https://generative-photography.github.io/project.

11.9IVOct 19, 2024Code

Quanta Video Restoration

Prateek Chennuri, Yiheng Chi, Enze Jiang et al.

The proliferation of single-photon image sensors has opened the door to a plethora of high-speed and low-light imaging applications. However, data collected by these sensors are often 1-bit or few-bit, and corrupted by noise and strong motion. Conventional video restoration methods are not designed to handle this situation, while specialized quanta burst algorithms have limited performance when the number of input frames is low. In this paper, we introduce Quanta Video Restoration (QUIVER), an end-to-end trainable network built on the core ideas of classical quanta restoration methods, i.e., pre-filtering, flow estimation, fusion, and refinement. We also collect and publish I2-2000FPS, a high-speed video dataset with the highest temporal resolution of 2000 frames-per-second, for training and testing. On simulated and real data, QUIVER outperforms existing quanta restoration methods by a significant margin. Code and dataset available at https://github.com/chennuriprateek/Quanta_Video_Restoration-QUIVER-

6.2CVDec 3, 2025

SeeU: Seeing the Unseen World via 4D Dynamics-aware Generation

Yu Yuan, Tharindu Wickremasinghe, Zeeshan Nadir et al.

Images and videos are discrete 2D projections of the 4D world (3D space + time). Most visual understanding, prediction, and generation operate directly on 2D observations, leading to suboptimal performance. We propose SeeU, a novel approach that learns the continuous 4D dynamics and generate the unseen visual contents. The principle behind SeeU is a new 2D$\to$4D$\to$2D learning framework. SeeU first reconstructs the 4D world from sparse and monocular 2D frames (2D$\to$4D). It then learns the continuous 4D dynamics on a low-rank representation and physical constraints (discrete 4D$\to$continuous 4D). Finally, SeeU rolls the world forward in time, re-projects it back to 2D at sampled times and viewpoints, and generates unseen regions based on spatial-temporal context awareness (4D$\to$2D). By modeling dynamics in 4D, SeeU achieves continuous and physically-consistent novel visual generation, demonstrating strong potentials in multiple tasks including unseen temporal generation, unseen spatial generation, and video editing.

6.2CVJun 3, 2025Code

Astrophotography turbulence mitigation via generative models

Joonyeoup Kim, Yu Yuan, Xingguang Zhang et al.

Photography is the cornerstone of modern astronomical and space research. However, most astronomical images captured by ground-based telescopes suffer from atmospheric turbulence, resulting in degraded imaging quality. While multi-frame strategies like lucky imaging can mitigate some effects, they involve intensive data acquisition and complex manual processing. In this paper, we propose AstroDiff, a generative restoration method that leverages both the high-quality generative priors and restoration capabilities of diffusion models to mitigate atmospheric turbulence. Extensive experiments demonstrate that AstroDiff outperforms existing state-of-the-art learning-based methods in astronomical image turbulence mitigation, providing higher perceptual quality and better structural fidelity under severe turbulence conditions. Our code and additional results are available at https://web-six-kappa-66.vercel.app/

1.4LGJan 1

Practical Geometric and Quantum Kernel Methods for Predicting Skeletal Muscle Outcomes in chronic obstructive pulmonary disease

Azadeh Alavi, Hamidreza Khalili, Stanley H. Chan et al.

Skeletal muscle dysfunction is a clinically relevant extra-pulmonary manifestation of chronic obstructive pulmonary disease (COPD) and is closely linked to systemic and airway inflammation. This motivates predictive modelling of muscle outcomes from minimally invasive biomarkers that can be acquired longitudinally. We study a small-sample preclinical dataset comprising 213 animals across two conditions (Sham versus cigarette-smoke exposure), with blood and bronchoalveolar lavage fluid measurements and three continuous targets: tibialis anterior muscle weight (milligram: mg), specific force (millinewton: mN), and a derived muscle quality index (mN per mg). We benchmark tuned classical baselines, geometry-aware symmetric positive definite (SPD) descriptors with Stein divergence, and quantum kernel models designed for low-dimensional tabular data. In the muscle-weight setting, quantum kernel ridge regression using four interpretable inputs (blood C-reactive protein, neutrophil count, bronchoalveolar lavage cellularity, and condition) attains a test root mean squared error of 4.41 mg and coefficient of determination of 0.605, improving over a matched ridge baseline on the same feature set (4.70 mg and 0.553). Geometry-informed Stein-divergence prototype distances yield a smaller but consistent gain in the biomarker-only setting (4.55 mg versus 4.79 mg). Screening-style evaluation, obtained by thresholding the continuous outcome at 0.8 times the training Sham mean, achieves an area under the receiver operating characteristic curve (ROC-AUC) of up to 0.90 for detecting low muscle weight. These results indicate that geometric and quantum kernel lifts can provide measurable benefits in low-data, low-feature biomedical prediction problems, while preserving interpretability and transparent model selection.

2.3SPJul 29, 2024

Analysis and Improvement of Rank-Ordered Mean Algorithm in Single-Photon LiDAR

William C. Yau, Weijian Zhang, Hashan Kavinga Weerasooriya et al.

Depth estimation using a single-photon LiDAR is often solved by a matched filter. It is, however, error-prone in the presence of background noise. A commonly used technique to reject background noise is the rank-ordered mean (ROM) filter previously reported by Shin \textit{et al.} (2015). ROM rejects noisy photon arrival timestamps by selecting only a small range of them around the median statistics within its local neighborhood. Despite the promising performance of ROM, its theoretical performance limit is unknown. In this paper, we theoretically characterize the ROM performance by showing that ROM fails when the reflectivity drops below a threshold predetermined by the depth and signal-to-background ratio, and its accuracy undergoes a phase transition at the cutoff. Based on our theory, we propose an improved signal extraction technique by selecting tight timestamp clusters. Experimental results show that the proposed algorithm improves depth estimation performance over ROM by 3 orders of magnitude at the same signal intensities, and achieves high image fidelity at noise levels as high as 17 times that of signal.

29.9LGMar 26, 2024

Tutorial on Diffusion Models for Imaging and Vision

Stanley H. Chan

The astonishing growth of generative tools in recent years has empowered many exciting applications in text-to-image generation and text-to-video generation. The underlying principle behind these generative tools is the concept of diffusion, a particular sampling mechanism that has overcome some shortcomings that were deemed difficult in the previous approaches. The goal of this tutorial is to discuss the essential ideas underlying the diffusion models. The target audience of this tutorial includes undergraduate and graduate students who are interested in doing research on diffusion models or applying these models to solve other problems.

1.5CVSep 28, 2023

Space-Time Attention with Shifted Non-Local Search

Kent Gauen, Stanley Chan

Efficiently computing attention maps for videos is challenging due to the motion of objects between frames. While a standard non-local search is high-quality for a window surrounding each query point, the window's small size cannot accommodate motion. Methods for long-range motion use an auxiliary network to predict the most similar key coordinates as offsets from each query location. However, accurately predicting this flow field of offsets remains challenging, even for large-scale networks. Small spatial inaccuracies significantly impact the attention module's quality. This paper proposes a search strategy that combines the quality of a non-local search with the range of predicted offsets. The method, named Shifted Non-Local Search, executes a small grid search surrounding the predicted offsets to correct small spatial errors. Our method's in-place computation consumes 10 times less memory and is over 3 times faster than previous work. Experimentally, correcting the small spatial errors improves the video frame alignment quality by over 3 dB PSNR. Our search upgrades existing space-time attention modules, which improves video denoising results by 0.30 dB PSNR for a 7.5% increase in overall runtime. We integrate our space-time attention module into a UNet-like architecture to achieve state-of-the-art results on video denoising.

4.3SPMar 25, 2024

Resolution Limit of Single-Photon LiDAR

Stanley H. Chan, Hashan K. Weerasooriya, Weijian Zhang et al.

Single-photon Light Detection and Ranging (LiDAR) systems are often equipped with an array of detectors for improved spatial resolution and sensing speed. However, given a fixed amount of flux produced by the laser transmitter across the scene, the per-pixel Signal-to-Noise Ratio (SNR) will decrease when more pixels are packed in a unit space. This presents a fundamental trade-off between the spatial resolution of the sensor array and the SNR received at each pixel. Theoretical characterization of this fundamental limit is explored. By deriving the photon arrival statistics and introducing a series of new approximation techniques, the Mean Squared Error (MSE) of the maximum-likelihood estimator of the time delay is derived. The theoretical predictions align well with simulations and real data.

3.6CVMay 19, 2025

Joint Depth and Reflectivity Estimation using Single-Photon LiDAR

Hashan K. Weerasooriya, Prateek Chennuri, Weijian Zhang et al.

Single-Photon Light Detection and Ranging (SP-LiDAR is emerging as a leading technology for long-range, high-precision 3D vision tasks. In SP-LiDAR, timestamps encode two complementary pieces of information: pulse travel time (depth) and the number of photons reflected by the object (reflectivity). Existing SP-LiDAR reconstruction methods typically recover depth and reflectivity separately or sequentially use one modality to estimate the other. Moreover, the conventional 3D histogram construction is effective mainly for slow-moving or stationary scenes. In dynamic scenes, however, it is more efficient and effective to directly process the timestamps. In this paper, we introduce an estimation method to simultaneously recover both depth and reflectivity in fast-moving scenes. We offer two contributions: (1) A theoretical analysis demonstrating the mutual correlation between depth and reflectivity and the conditions under which joint estimation becomes beneficial. (2) A novel reconstruction method, "SPLiDER", which exploits the shared information to enhance signal recovery. On both synthetic and real SP-LiDAR data, our method outperforms existing approaches, achieving superior joint reconstruction quality.

3.7CVDec 18, 2024

Personalized Generative Low-light Image Denoising and Enhancement

Xijun Wang, Prateek Chennuri, Yu Yuan et al.

While smartphone cameras today can produce astonishingly good photos, their performance in low light is still not completely satisfactory because of the fundamental limits in photon shot noise and sensor read noise. Generative image restoration methods have demonstrated promising results compared to traditional methods, but they suffer from hallucinatory content generation when the signal-to-noise ratio (SNR) is low. Recognizing the availability of personalized photo galleries on users' smartphones, we propose Personalized Generative Denoising (PGD) by building a diffusion model customized for different users. Our core innovation is an identity-consistent physical buffer that extracts the physical attributes of the person from the gallery. This ID-consistent physical buffer provides a strong prior that can be integrated with the diffusion model to restore the degraded images, without the need of fine-tuning. Over a wide range of low-light testing scenarios, we show that PGD achieves superior image denoising and enhancement performance compared to existing diffusion-based denoising approaches.

19.0CVMay 7, 2025

Person Recognition at Altitude and Range: Fusion of Face, Body Shape and Gait

Feng Liu, Nicholas Chimitt, Lanqing Guo et al. · gatech

We address the problem of whole-body person recognition in unconstrained environments. This problem arises in surveillance scenarios such as those in the IARPA Biometric Recognition and Identification at Altitude and Range (BRIAR) program, where biometric data is captured at long standoff distances, elevated viewing angles, and under adverse atmospheric conditions (e.g., turbulence and high wind velocity). To this end, we propose FarSight, a unified end-to-end system for person recognition that integrates complementary biometric cues across face, gait, and body shape modalities. FarSight incorporates novel algorithms across four core modules: multi-subject detection and tracking, recognition-aware video restoration, modality-specific biometric feature encoding, and quality-guided multi-modal fusion. These components are designed to work cohesively under degraded image conditions, large pose and scale variations, and cross-domain gaps. Extensive experiments on the BRIAR dataset, one of the most comprehensive benchmarks for long-range, multi-modal biometric recognition, demonstrate the effectiveness of FarSight. Compared to our preliminary system, this system achieves a 34.1% absolute gain in 1:1 verification accuracy (TAR@0.1% FAR), a 17.8% increase in closed-set identification (Rank-20), and a 34.3% reduction in open-set identification errors (FNIR@1% FPIR). Furthermore, FarSight was evaluated in the 2025 NIST RTE Face in Video Evaluation (FIVE), which conducts standardized face recognition testing on the BRIAR dataset. These results establish FarSight as a state-of-the-art solution for operational biometric recognition in challenging real-world conditions.

3.7CVMar 28, 2024

Generative Quanta Color Imaging

Vishal Purohit, Junjie Luo, Yiheng Chi et al.

The astonishing development of single-photon cameras has created an unprecedented opportunity for scientific and industrial imaging. However, the high data throughput generated by these 1-bit sensors creates a significant bottleneck for low-power applications. In this paper, we explore the possibility of generating a color image from a single binary frame of a single-photon camera. We evidently find this problem being particularly difficult to standard colorization approaches due to the substantial degree of exposure variation. The core innovation of our paper is an exposure synthesis model framed under a neural ordinary differential equation (Neural ODE) that allows us to generate a continuum of exposures from a single observation. This innovation ensures consistent exposure in binary images that colorizers take on, resulting in notably enhanced colorization. We demonstrate applications of the method in single-image and burst colorization and show superior generative performance over baselines. Project website can be found at https://vishal-s-p.github.io/projects/2023/generative_quanta_color.html.

16.4CVApr 3, 2025Code

Learning Phase Distortion with Selective State Space Models for Video Turbulence Mitigation

Xingguang Zhang, Nicholas Chimitt, Xijun Wang et al.

Atmospheric turbulence is a major source of image degradation in long-range imaging systems. Although numerous deep learning-based turbulence mitigation (TM) methods have been proposed, many are slow, memory-hungry, and do not generalize well. In the spatial domain, methods based on convolutional operators have a limited receptive field, so they cannot handle a large spatial dependency required by turbulence. In the temporal domain, methods relying on self-attention can, in theory, leverage the lucky effects of turbulence, but their quadratic complexity makes it difficult to scale to many frames. Traditional recurrent aggregation methods face parallelization challenges. In this paper, we present a new TM method based on two concepts: (1) A turbulence mitigation network based on the Selective State Space Model (MambaTM). MambaTM provides a global receptive field in each layer across spatial and temporal dimensions while maintaining linear computational complexity. (2) Learned Latent Phase Distortion (LPD). LPD guides the state space model. Unlike classical Zernike-based representations of phase distortion, the new LPD map uniquely captures the actual effects of turbulence, significantly improving the model's capability to estimate degradation by reducing the ill-posedness. Our proposed method exceeds current state-of-the-art networks on various synthetic and real-world TM benchmarks with significantly faster inference speed.

27.2CVSep 25, 2025

NewtonGen: Physics-Consistent and Controllable Text-to-Video Generation via Neural Newtonian Dynamics

Yu Yuan, Xijun Wang, Tharindu Wickremasinghe et al.

A primary bottleneck in large-scale text-to-video generation today is physical consistency and controllability. Despite recent advances, state-of-the-art models often produce unrealistic motions, such as objects falling upward, or abrupt changes in velocity and direction. Moreover, these models lack precise parameter control, struggling to generate physically consistent dynamics under different initial conditions. We argue that this fundamental limitation stems from current models learning motion distributions solely from appearance, while lacking an understanding of the underlying dynamics. In this work, we propose NewtonGen, a framework that integrates data-driven synthesis with learnable physical principles. At its core lies trainable Neural Newtonian Dynamics (NND), which can model and predict a variety of Newtonian motions, thereby injecting latent dynamical constraints into the video generation process. By jointly leveraging data priors and dynamical guidance, NewtonGen enables physically consistent video synthesis with precise parameter control.

2.6CVNov 9, 2021

Graph-Based Depth Denoising & Dequantization for Point Cloud Enhancement

Xue Zhang, Gene Cheung, Jiahao Pang et al.

A 3D point cloud is typically constructed from depth measurements acquired by sensors at one or more viewpoints. The measurements suffer from both quantization and noise corruption. To improve quality, previous works denoise a point cloud \textit{a posteriori} after projecting the imperfect depth data onto 3D space. Instead, we enhance depth measurements directly on the sensed images \textit{a priori}, before synthesizing a 3D point cloud. By enhancing near the physical sensing process, we tailor our optimization to our depth formation model before subsequent processing steps that obscure measurement errors. Specifically, we model depth formation as a combined process of signal-dependent noise addition and non-uniform log-based quantization. The designed model is validated (with parameters fitted) using collected empirical data from a representative depth sensor. To enhance each pixel row in a depth image, we first encode intra-view similarities between available row pixels as edge weights via feature graph learning. We next establish inter-view similarities with another rectified depth image via viewpoint mapping and sparse linear interpolation. This leads to a maximum a posteriori (MAP) graph filtering objective that is convex and differentiable. We minimize the objective efficiently using accelerated gradient descent (AGD), where the optimal step size is approximated via Gershgorin circle theorem (GCT). Experiments show that our method significantly outperformed recent point cloud denoising schemes and state-of-the-art image denoising schemes in two established point cloud quality metrics.

1.4CVAug 20, 2021

Detecting and Segmenting Adversarial Graphics Patterns from Images

Xiangyu Qu, Stanley H. Chan

Adversarial attacks pose a substantial threat to computer vision system security, but the social media industry constantly faces another form of "adversarial attack" in which the hackers attempt to upload inappropriate images and fool the automated screening systems by adding artificial graphics patterns. In this paper, we formulate the defense against such attacks as an artificial graphics pattern segmentation problem. We evaluate the efficacy of several segmentation algorithms and, based on observation of their performance, propose a new method tailored to this specific problem. Extensive experiments show that the proposed method outperforms the baselines and has a promising generalization capability, which is the most crucial aspect in segmenting artificial graphics patterns.

24.0AIAug 13, 2021

Optical Adversarial Attack

Abhiram Gnanasambandam, Alex M. Sherman, Stanley H. Chan

We introduce OPtical ADversarial attack (OPAD). OPAD is an adversarial attack in the physical space aiming to fool image classifiers without physically touching the objects (e.g., moving or painting the objects). The principle of OPAD is to use structured illumination to alter the appearance of the target objects. The system consists of a low-cost projector, a camera, and a computer. The challenge of the problem is the non-linearity of the radiometric response of the projector and the spatially varying spectral response of the scene. Attacks generated in a conventional approach do not work in this setting unless they are calibrated to compensate for such a projector-camera model. The proposed solution incorporates the projector-camera model into the adversarial attack optimization, where a new attack formulation is derived. Experimental results prove the validity of the solution. It is demonstrated that OPAD can optically attack a real 3D object in the presence of background lighting for white-box, black-box, targeted, and untargeted attacks. Theoretical analysis is presented to quantify the fundamental performance limit of the system.

24.1IVJul 24, 2021

Accelerating Atmospheric Turbulence Simulation via Learned Phase-to-Space Transform

Zhiyuan Mao, Nicholas Chimitt, Stanley H. Chan

Fast and accurate simulation of imaging through atmospheric turbulence is essential for developing turbulence mitigation algorithms. Recognizing the limitations of previous approaches, we introduce a new concept known as the phase-to-space (P2S) transform to significantly speed up the simulation. P2S is build upon three ideas: (1) reformulating the spatially varying convolution as a set of invariant convolutions with basis functions, (2) learning the basis function via the known turbulence statistics models, (3) implementing the P2S transform via a light-weight network that directly convert the phase representation to spatial representation. The new simulator offers 300x -- 1000x speed up compared to the mainstream split-step simulators while preserving the essential turbulence statistics.

8.0SPJun 30, 2021

Graph Signal Restoration Using Nested Deep Algorithm Unrolling

Masatoshi Nagahama, Koki Yamada, Yuichi Tanaka et al.

Graph signal processing is a ubiquitous task in many applications such as sensor, social, transportation and brain networks, point cloud processing, and graph neural networks. Often, graph signals are corrupted in the sensing process, thus requiring restoration. In this paper, we propose two graph signal restoration methods based on deep algorithm unrolling (DAU). First, we present a graph signal denoiser by unrolling iterations of the alternating direction method of multiplier (ADMM). We then suggest a general restoration method for linear degradation by unrolling iterations of Plug-and-Play ADMM (PnP-ADMM). In the second approach, the unrolled ADMM-based denoiser is incorporated as a submodule, leading to a nested DAU structure. The parameters in the proposed denoising/restoration methods are trainable in an end-to-end manner. Our approach is interpretable and keeps the number of parameters small since we only tune graph-independent regularization parameters. We overcome two main challenges in existing graph signal restoration methods: 1) limited performance of convex optimization algorithms due to fixed parameters which are often determined manually. 2) large number of parameters of graph neural networks that result in difficulty of training. Several experiments for graph signal denoising and interpolation are performed on synthetic and real-world data. The proposed methods show performance improvements over several existing techniques in terms of root mean squared error in both tasks.

6.5CVJun 24, 2021

DROID: Driver-centric Risk Object Identification

Chengxi Li, Stanley H. Chan, Yi-Ting Chen

Identification of high-risk driving situations is generally approached through collision risk estimation or accident pattern recognition. In this work, we approach the problem from the perspective of subjective risk. We operationalize subjective risk assessment by predicting driver behavior changes and identifying the cause of changes. To this end, we introduce a new task called driver-centric risk object identification (DROID), which uses egocentric video to identify object(s) influencing a driver's behavior, given only the driver's response as the supervision signal. We formulate the task as a cause-effect problem and present a novel two-stage DROID framework, taking inspiration from models of situation awareness and causal inference. A subset of data constructed from the Honda Research Institute Driving Dataset (HDD) is used to evaluate DROID. We demonstrate state-of-the-art DROID performance, even compared with strong baseline models using this dataset. Additionally, we conduct extensive ablative studies to justify our design choices. Moreover, we demonstrate the applicability of DROID for risk assessment.

5.5LGMar 13, 2021

Student-Teacher Learning from Clean Inputs to Noisy Inputs

Guanzhe Hong, Zhiyuan Mao, Xiaojun Lin et al.

Feature-based student-teacher learning, a training method that encourages the student's hidden features to mimic those of the teacher network, is empirically successful in transferring the knowledge from a pre-trained teacher network to the student network. Furthermore, recent empirical results demonstrate that, the teacher's features can boost the student network's generalization even when the student's input sample is corrupted by noise. However, there is a lack of theoretical insights into why and when this method of transferring knowledge can be successful between such heterogeneous tasks. We analyze this method theoretically using deep linear networks, and experimentally using nonlinear networks. We identify three vital factors to the success of the method: (1) whether the student is trained to zero training loss; (2) how knowledgeable the teacher is on the clean-input problem; (3) how the teacher decomposes its knowledge in its hidden features. Lack of proper control in any of the three factors leads to failure of the student-teacher learning method.

18.7IVAug 31, 2020

Image Reconstruction of Static and Dynamic Scenes through Anisoplanatic Turbulence

Zhiyuan Mao, Nicholas Chimitt, Stanley Chan

Ground based long-range passive imaging systems often suffer from degraded image quality due to a turbulent atmosphere. While methods exist for removing such turbulent distortions, many are limited to static sequences which cannot be extended to dynamic scenes. In addition, the physics of the turbulence is often not integrated into the image reconstruction algorithms, making the physics foundations of the methods weak. In this paper, we present a unified method for atmospheric turbulence mitigation in both static and dynamic sequences. We are able to achieve better results compared to existing methods by utilizing (i) a novel space-time non-local averaging method to construct a reliable reference frame, (ii) a geometric consistency and a sharpness metric to generate the lucky frame, (iii) a physics-constrained prior model of the point spread function for blind deconvolution. Experimental results based on synthetic and real long-range turbulence sequences validate the performance of the proposed method.

15.3IVJul 16, 2020

Dynamic Low-light Imaging with Quanta Image Sensors

Yiheng Chi, Abhiram Gnanasambandam, Vladlen Koltun et al.

Imaging in low light is difficult because the number of photons arriving at the sensor is low. Imaging dynamic scenes in low-light environments is even more difficult because as the scene moves, pixels in adjacent frames need to be aligned before they can be denoised. Conventional CMOS image sensors (CIS) are at a particular disadvantage in dynamic low-light settings because the exposure cannot be too short lest the read noise overwhelms the signal. We propose a solution using Quanta Image Sensors (QIS) and present a new image reconstruction algorithm. QIS are single-photon image sensors with photon counting capabilities. Studies over the past decade have confirmed the effectiveness of QIS for low-light imaging but reconstruction algorithms for dynamic scenes in low light remain an open problem. We fill the gap by proposing a student-teacher training protocol that transfers knowledge from a motion teacher and a denoising teacher to a student network. We show that dynamic scenes can be reconstructed from a burst of frames at a photon level of 1 photon per pixel per frame. Experimental results confirm the advantages of the proposed method compared to existing methods.

18.7IVJun 3, 2020

Image Classification in the Dark using Quanta Image Sensors

Abhiram Gnanasambandam, Stanley H. Chan

State-of-the-art image classifiers are trained and tested using well-illuminated images. These images are typically captured by CMOS image sensors with at least tens of photons per pixel. However, in dark environments when the photon flux is low, image classification becomes difficult because the measured signal is suppressed by noise. In this paper, we present a new low-light image classification solution using Quanta Image Sensors (QIS). QIS are a new type of image sensors that possess photon counting ability without compromising on pixel size and spatial resolution. Numerous studies over the past decade have demonstrated the feasibility of QIS for low-light imaging, but their usage for image classification has not been studied. This paper fills the gap by presenting a student-teacher learning scheme which allows us to classify the noisy QIS raw data. We show that with student-teacher learning, we are able to achieve image classification at a photon level of one photon per pixel or lower. Experimental results verify the effectiveness of the proposed method compared to existing solutions.

8.5LGMay 19, 2020

One Size Fits All: Can We Train One Denoiser for All Noise Levels?

Abhiram Gnansambandam, Stanley H. Chan

When training an estimator such as a neural network for tasks like image denoising, it is often preferred to train one estimator and apply it to all noise levels. The de facto training protocol to achieve this goal is to train the estimator with noisy samples whose noise levels are uniformly distributed across the range of interest. However, why should we allocate the samples uniformly? Can we have more training samples that are less noisy, and fewer samples that are more noisy? What is the optimal distribution? How do we obtain such a distribution? The goal of this paper is to address this training sample distribution problem from a minimax risk optimization perspective. We derive a dual ascent algorithm to determine the optimal sampling distribution of which the convergence is guaranteed as long as the set of admissible estimators is closed and convex. For estimators with non-convex admissible sets such as deep neural networks, our dual formulation converges to a solution of the convex relaxation. We discuss how the algorithm can be implemented in practice. We evaluate the algorithm on linear estimators and deep networks.

12.2OPTICSApr 23, 2020

Simulating Anisoplanatic Turbulence by Sampling Inter-modal and Spatially Correlated Zernike Coefficients

Nicholas Chimitt, Stanley H. Chan

Simulating atmospheric turbulence is an essential task for evaluating turbulence mitigation algorithms and training learning-based methods. Advanced numerical simulators for atmospheric turbulence are available, but they require evaluating wave propagation which is computationally expensive. In this paper, we present a propagation-free method for simulating imaging through turbulence. The key idea behind our work is a new method to draw inter-modal and spatially correlated Zernike coefficients. By establishing the equivalence between the angle-of-arrival correlation by Basu, McCrae and Fiorino (2015) and the multi-aperture correlation by Chanan (1992), we show that the Zernike coefficients can be drawn according to a covariance matrix defining the correlations. We propose fast and scalable sampling strategies to draw these samples. The new method allows us to compress the wave propagation problem into a sampling problem, hence making the new simulator significantly faster than existing ones. Experimental results show that the simulator has an excellent match with the theory and real turbulence data.

18.2CVMar 5, 2020

Who Make Drivers Stop? Towards Driver-centric Risk Assessment: Risk Object Identification via Causal Inference

Chengxi Li, Stanley H. Chan, Yi-Ting Chen

A significant amount of people die in road accidents due to driver errors. To reduce fatalities, developing intelligent driving systems assisting drivers to identify potential risks is in an urgent need. Risky situations are generally defined based on collision prediction in the existing works. However, collision is only a source of potential risks, and a more generic definition is required. In this work, we propose a novel driver-centric definition of risk, i.e., objects influencing drivers' behavior are risky. A new task called risk object identification is introduced. We formulate the task as the cause-effect problem and present a novel two-stage risk object identification framework based on causal inference with the proposed object-level manipulable driving model. We demonstrate favorable performance on risk object identification compared with strong baselines on the Honda Research Institute Driving Dataset (HDD). Our framework achieves a substantial average performance boost over a strong baseline by 7.5%.

17.1CVSep 20, 2019

Learning 3D-aware Egocentric Spatial-Temporal Interaction via Graph Convolutional Networks

Chengxi Li, Yue Meng, Stanley H. Chan et al.

To enable intelligent automated driving systems, a promising strategy is to understand how human drives and interacts with road users in complicated driving situations. In this paper, we propose a 3D-aware egocentric spatial-temporal interaction framework for automated driving applications. Graph convolution networks (GCN) is devised for interaction modeling. We introduce three novel concepts into GCN. First, we decompose egocentric interactions into ego-thing and ego-stuff interaction, modeled by two GCNs. In both GCNs, ego nodes are introduced to encode the interaction between thing objects (e.g., car and pedestrian), and interaction between stuff objects (e.g., lane marking and traffic light). Second, objects' 3D locations are explicitly incorporated into GCN to better model egocentric interactions. Third, to implement ego-stuff interaction in GCN, we propose a MaskAlign operation to extract features for irregular objects. We validate the proposed framework on tactical driver behavior recognition. Extensive experiments are conducted using Honda Research Institute Driving Dataset, the largest dataset with diverse tactical driver behavior annotations. Our framework demonstrates substantial performance boost over baselines on the two experimental settings by 3.9% and 6.0%, respectively. Furthermore, we visualize the learned affinity matrices, which encode ego-thing and ego-stuff interactions, to showcase the proposed framework can capture interactions effectively.

5.1IVMay 17, 2019

Rethinking Atmospheric Turbulence Mitigation

Nicholas Chimitt, Zhiyuan Mao, Guanzhe Hong et al.

State-of-the-art atmospheric turbulence image restoration methods utilize standard image processing tools such as optical flow, lucky region and blind deconvolution to restore the images. While promising results have been reported over the past decade, many of the methods are agnostic to the physical model that generates the distortion. In this paper, we revisit the turbulence restoration problem by analyzing the reference frame generation and the blind deconvolution steps in a typical restoration pipeline. By leveraging tools in large deviation theory, we rigorously prove the minimum number of frames required to generate a reliable reference for both static and dynamic scenes. We discuss how a turbulence agnostic model can lead to potential flaws, and how to configure a simple spatial-temporal non-local weighted averaging method to generate references. For blind deconvolution, we present a new data-driven prior by analyzing the distributions of the point spread functions. We demonstrate how a simple prior can outperform state-of-the-art blind deconvolution methods.

3.4CVMar 23, 2019

Color Filter Arrays for Quanta Image Sensors

Omar A. Elgendy, Stanley H. Chan

Quanta image sensor (QIS) is envisioned to be the next generation image sensor after CCD and CMOS. In this paper, we discuss how to design color filter arrays for QIS and other small pixels. Designing color filter arrays for small pixels is challenging because maximizing the light efficiency while suppressing aliasing and crosstalk are conflicting tasks. We present an optimization-based framework which unifies several mainstream color filter array design methodologies. Our method offers greater generality and flexibility. Compared to existing methods, the new framework can simultaneously handle luminance sensitivity, chrominance sensitivity, cross-talk, anti-aliasing, manufacturability and orthogonality. Extensive experimental comparisons demonstrate the effectiveness of the framework.

8.1CVMar 21, 2019

Megapixel Photon-Counting Color Imaging using Quanta Image Sensor

Abhiram Gnanasambandam, Omar Elgendy, Jiaju Ma et al.

Quanta Image Sensor (QIS) is a single-photon detector designed for extremely low light imaging conditions. Majority of the existing QIS prototypes are monochrome based on single-photon avalanche diodes (SPAD). Passive color imaging has not been demonstrated with single-photon detectors due to the intrinsic difficulty of shrinking the pixel size and increasing the spatial resolution while maintaining acceptable intra-pixel cross-talk. In this paper, we present image reconstruction of the first color QIS with a resolution of $1024 \times 1024$ pixels, supporting both single-bit and multi-bit photon counting capability. Our color image reconstruction is enabled by a customized joint demosaicing-denoising algorithm, leveraging truncated Poisson statistics and variance stabilizing transforms. Experimental results of the new sensor and algorithm demonstrate superior color imaging performance for very low-light conditions with a mean exposure of as low as a few photons per pixel in both real and simulated images.

16.9CVMar 20, 2019

Plug and play methods for magnetic resonance imaging (long version)

Rizwan Ahmad, Charles A. Bouman, Gregery T. Buzzard et al.

Magnetic Resonance Imaging (MRI) is a non-invasive diagnostic tool that provides excellent soft-tissue contrast without the use of ionizing radiation. Compared to other clinical imaging modalities (e.g., CT or ultrasound), however, the data acquisition process for MRI is inherently slow, which motivates undersampling and thus drives the need for accurate, efficient reconstruction methods from undersampled datasets. In this article, we describe the use of "plug-and-play" (PnP) algorithms for MRI image recovery. We first describe the linearly approximated inverse problem encountered in MRI. Then we review several PnP methods, where the unifying commonality is to iteratively call a denoising subroutine as one step of a larger optimization-inspired algorithm. Next, we describe how the result of the PnP method can be interpreted as a solution to an equilibrium equation, allowing convergence analysis from the equilibrium perspective. Finally, we present illustrative examples of PnP methods applied to MRI image recovery.

18.6IVAug 31, 2018

Performance Analysis of Plug-and-Play ADMM: A Graph Signal Processing Perspective

Stanley H. Chan

The Plug-and-Play (PnP) ADMM algorithm is a powerful image restoration framework that allows advanced image denoising priors to be integrated into physical forward models to generate high quality image restoration results. However, despite the enormous number of applications and several theoretical studies trying to prove the convergence by leveraging tools in convex analysis, very little is known about why the algorithm is doing so well. The goal of this paper is to fill the gap by discussing the performance of PnP ADMM. By restricting the denoisers to the class of graph filters under a linearity assumption, or more specifically the symmetric smoothing filters, we offer three contributions: (1) We show conditions under which an equivalent maximum-a-posteriori (MAP) optimization exists, (2) we present a geometric interpretation and show that the performance gain is due to an intrinsic pre-denoising characteristic of the PnP prior, (3) we introduce a new analysis technique via the concept of consensus equilibrium, and provide interpretations to problems involving multiple priors.

2.5CVAug 24, 2018

Automatic Foreground Extraction from Imperfect Backgrounds using Multi-Agent Consensus Equilibrium

Xiran Wang, Jason Juang, Stanley H. Chan

Extracting accurate foreground objects from a scene is an essential step for many video applications. Traditional background subtraction algorithms can generate coarse estimates, but generating high quality masks requires professional softwares with significant human interventions, e.g., providing trimaps or labeling key frames. We propose an automatic foreground extraction method in applications where a static but imperfect background is available. Examples include filming and surveillance where the background can be captured before the objects enter the scene or after they leave the scene. Our proposed method is very robust and produces significantly better estimates than state-of-the-art background subtraction, video segmentation and alpha matting methods. The key innovation of our method is a novel information fusion technique. The fusion framework allows us to integrate the individual strengths of alpha matting, background subtraction and image denoising to produce an overall better estimate. Such integration is particularly important when handling complex scenes with imperfect background. We show how the framework is developed, and how the individual components are built. Extensive experiments and ablation studies are conducted to evaluate the proposed method.

4.4CVNov 17, 2017

Optimal Combination of Image Denoisers

Joon Hee Choi, Omar Elgendy, Stanley H. Chan

Given a set of image denoisers, each having a different denoising capability, is there a provably optimal way of combining these denoisers to produce an overall better result? An answer to this question is fundamental to designing an ensemble of weak estimators for complex scenes. In this paper, we present an optimal combination scheme by leveraging deep neural networks and convex optimization. The proposed framework, called the Consensus Neural Network (CsNet), introduces three new concepts in image denoising: (1) A provably optimal procedure to combine the denoised outputs via convex optimization; (2) A deep neural network to estimate the mean squared error (MSE) of denoised images without needing the ground truths; (3) An image boosting procedure using a deep neural network to improve contrast and to recover lost details of the combined images. Experimental results show that CsNet can consistently improve denoising performance for both deterministic and neural network denoisers.

14.0CVMay 24, 2017

Plug-and-Play Unplugged: Optimization Free Reconstruction using Consensus Equilibrium

Gregery T. Buzzard, Stanley H. Chan, Suhas Sreehari et al.

Regularized inversion methods for image reconstruction are used widely due to their tractability and ability to combine complex physical sensor models with useful regularity criteria. Such methods motivated the recently developed Plug-and-Play prior method, which provides a framework to use advanced denoising algorithms as regularizers in inversion. However, the need to formulate regularized inversion as the solution to an optimization problem limits the possible regularity conditions and physical sensor models. In this paper, we introduce Consensus Equilibrium (CE), which generalizes regularized inversion to include a much wider variety of both forward components and prior components without the need for either to be expressed with a cost function. CE is based on the solution of a set of equilibrium equations that balance data fit and regularity. In this framework, the problem of MAP estimation in regularized inversion is replaced by the problem of solving these equilibrium equations, which can be approached in multiple ways. The key contribution of CE is to provide a novel framework for fusing multiple heterogeneous models of physical sensors or models learned from data. We describe the derivation of the CE equations and prove that the solution of the CE equations generalizes the standard MAP estimate under appropriate circumstances. We also discuss algorithms for solving the CE equations, including ADMM with a novel form of preconditioning and Newton's method. We give examples to illustrate consensus equilibrium and the convergence properties of these algorithms and demonstrate this method on some toy problems and on a denoising example in which we use an array of convolutional neural network denoisers, none of which is tuned to match the noise level in a noisy image but which in consensus can achieve a better result than any of them individually.

30.8CVMay 5, 2016

Plug-and-Play ADMM for Image Restoration: Fixed Point Convergence and Applications

Stanley H. Chan, Xiran Wang, Omar A. Elgendy

Alternating direction method of multiplier (ADMM) is a widely used algorithm for solving constrained optimization problems in image restoration. Among many useful features, one critical feature of the ADMM algorithm is its modular structure which allows one to plug in any off-the-shelf image denoising algorithm for a subproblem in the ADMM algorithm. Because of the plug-in nature, this type of ADMM algorithms is coined the name "Plug-and-Play ADMM". Plug-and-Play ADMM has demonstrated promising empirical results in a number of recent papers. However, it is unclear under what conditions and by using what denoising algorithms would it guarantee convergence. Also, since Plug-and-Play ADMM uses a specific way to split the variables, it is unclear if fast implementation can be made for common Gaussian and Poissonian image restoration problems. In this paper, we propose a Plug-and-Play ADMM algorithm with provable fixed point convergence. We show that for any denoising algorithm satisfying an asymptotic criteria, called bounded denoisers, Plug-and-Play ADMM converges to a fixed point under a continuation scheme. We also present fast implementations for two image restoration problems on super-resolution and single-photon imaging. We compare Plug-and-Play ADMM with state-of-the-art algorithms in each problem type, and demonstrate promising experimental results of the algorithm.

7.9CVFeb 1, 2016

Algorithm-Induced Prior for Image Restoration

Stanley H. Chan

This paper studies a type of image priors that are constructed implicitly through the alternating direction method of multiplier (ADMM) algorithm, called the algorithm-induced prior. Different from classical image priors which are defined before running the reconstruction algorithm, algorithm-induced priors are defined by the denoising procedure used to replace one of the two modules in the ADMM algorithm. Since such prior is not explicitly defined, analyzing the performance has been difficult in the past. Focusing on the class of symmetric smoothing filters, this paper presents an explicit expression of the prior induced by the ADMM algorithm. The new prior is reminiscent to the conventional graph Laplacian but with stronger reconstruction performance. It can also be shown that the overall reconstruction has an efficient closed-form implementation if the associated symmetric smoothing filter is low rank. The results are validated with experiments on image inpainting.

4.6CVJan 1, 2016

Understanding Symmetric Smoothing Filters: A Gaussian Mixture Model Perspective

Stanley H. Chan, Todd Zickler, Yue M. Lu

Many patch-based image denoising algorithms can be formulated as applying a smoothing filter to the noisy image. Expressed as matrices, the smoothing filters must be row normalized so that each row sums to unity. Surprisingly, if we apply a column normalization before the row normalization, the performance of the smoothing filter can often be significantly improved. Prior works showed that such performance gain is related to the Sinkhorn-Knopp balancing algorithm, an iterative procedure that symmetrizes a row-stochastic matrix to a doubly-stochastic matrix. However, a complete understanding of the performance gain phenomenon is still lacking. In this paper, we study the performance gain phenomenon from a statistical learning perspective. We show that Sinkhorn-Knopp is equivalent to an Expectation-Maximization (EM) algorithm of learning a Gaussian mixture model of the image patches. By establishing the correspondence between the steps of Sinkhorn-Knopp and the EM algorithm, we provide a geometrical interpretation of the symmetrization process. This observation allows us to develop a new denoising algorithm called Gaussian mixture model symmetric smoothing filter (GSF). GSF is an extension of the Sinkhorn-Knopp and is a generalization of the original smoothing filters. Despite its simple formulation, GSF outperforms many existing smoothing filters and has a similar performance compared to several state-of-the-art denoising algorithms.