IV · Jul 20, 2023 · Code
Physics-Driven Turbulence Image Restoration with Stochastic Refinement
Ajay Jaiswal, Xingguang Zhang, Stanley H. Chan et al.
Image distortion by atmospheric turbulence is a stochastic degradation and a critical problem in long-range optical imaging systems. A considerable body of research has been conducted over the past decades, including model-based and emerging deep-learning solutions with the help of synthetic data. Although fast and physics-grounded simulation tools have recently been introduced to help deep-learning models adapt to real-world turbulence conditions, the training of such models relies only on pairs of synthetic data and ground truth. This paper proposes the Physics-integrated Restoration Network (PiRN) to bring the physics-based simulator directly into the training process to help the network disentangle the stochasticity from the degradation and the underlying image. Furthermore, to overcome the "average effect" introduced by deterministic models and the domain gap between the synthetic and real-world degradation, we introduce PiRN with Stochastic Refinement (PiRN-SR) to boost its perceptual quality. Overall, our PiRN and PiRN-SR improve the generalization to real-world unknown turbulence conditions and provide state-of-the-art restoration in both pixel-wise accuracy and perceptual quality. Our code is available at https://github.com/VITA-Group/PiRN.
IV · Jul 20, 2022
Single Frame Atmospheric Turbulence Mitigation: A Benchmark Study and A New Physics-Inspired Transformer Model
Zhiyuan Mao, Ajay Jaiswal, Zhangyang Wang et al.
Image restoration algorithms for atmospheric turbulence are known to be much more challenging to design than traditional ones such as blur or noise because the distortion caused by the turbulence is an entanglement of spatially varying blur, geometric distortion, and sensor noise. Existing CNN-based restoration methods built upon convolutional kernels with static weights are insufficient to handle the spatially dynamical atmospheric turbulence effect. To address this problem, in this paper, we propose a physics-inspired transformer model for imaging through atmospheric turbulence. The proposed network utilizes the power of transformer blocks to jointly extract a dynamical turbulence distortion map and restore a turbulence-free image. In addition, recognizing the lack of a comprehensive dataset, we collect and present two new real-world turbulence datasets that allow for evaluation with both classical objective metrics (e.g., PSNR and SSIM) and a new task-driven metric using text recognition accuracy. Both real testing sets and all related code will be made publicly available.
CV · May 30
OptiWorld: Optimal Control for Video World Generation under Physical Constraints
Yu Yuan, Jianhao Yuan, Xijun Wang et al.
Video generation models are becoming a scalable form of world models, but they mainly generate plausible motion rather than proactively control or optimize the underlying dynamics. As a result, an object in the generated video may follow trajectories that are unsafe, not smooth, inefficient, or physically inconsistent. In this work, we propose OptiWorld, a framework that brings classical optimal control into video generation at inference time. OptiWorld first extracts a compact, task-relevant world state, then plans an optimal trajectory under physical constraints, and finally renders the video conditioned on this trajectory. We formulate planning as a geometric problem on a continuous manifold, which converts 3D geometry and task-dependent physical constraints into a unified planning geometry. By adding this optimal-control layer, OptiWorld generates videos with preferable dynamics, demonstrating strong potential in multiple tasks including goal-conditioned image-to-video generation, video dynamics editing, and counterfactual generation.
IV · Jul 13, 2022
Imaging through the Atmosphere using Turbulence Mitigation Transformer
Xingguang Zhang, Zhiyuan Mao, Nicholas Chimitt et al.
Restoring images distorted by atmospheric turbulence is a ubiquitous problem in long-range imaging applications. While existing deep-learning-based methods have demonstrated promising results in specific testing conditions, they suffer from three limitations: (1) lack of generalization capability from synthetic training data to real turbulence data; (2) failure to scale, hence causing memory and speed challenges when extending the idea to a large number of frames; (3) lack of a fast and accurate simulator to generate data for training neural networks. In this paper, we introduce the turbulence mitigation transformer (TMT) that explicitly addresses these issues. TMT brings three contributions: Firstly, TMT explicitly uses turbulence physics by decoupling the turbulence degradation and introducing a multi-scale loss for removing distortion, thus improving effectiveness. Secondly, TMT presents a new attention module along the temporal axis to extract extra features efficiently, thus improving memory and speed. Thirdly, TMT introduces a new simulator based on the Fourier sampler, temporal correlation, and flexible kernel size, thus improving our capability to synthesize better training data. TMT outperforms state-of-the-art video restoration models, especially in generalizing from synthetic to real turbulence data. Code, videos, and datasets are available at https://xg416.github.io/TMT.
IV · Mar 30, 2023
HDR Imaging with Spatially Varying Signal-to-Noise Ratios
Yiheng Chi, Xingguang Zhang, Stanley H. Chan
While today's high dynamic range (HDR) image fusion algorithms are capable of blending multiple exposures, the acquisition is often controlled so that the dynamic range within one exposure is narrow. For HDR imaging in photon-limited situations, the dynamic range can be enormous and the noise within one exposure is spatially varying. Existing image denoising algorithms and HDR fusion algorithms both fail to handle this situation, leading to severe limitations in low-light HDR imaging. This paper presents two contributions. Firstly, we identify the source of the problem. We find that the issue is associated with the co-existence of (1) spatially varying signal-to-noise ratio, especially the excessive noise due to very dark regions, and (2) a wide luminance range within each exposure. We show that while the issue can be handled by a bank of denoisers, the complexity is high. Secondly, we propose a new method called the spatially varying high dynamic range (SV-HDR) fusion network to simultaneously denoise and fuse images. We introduce a new exposure-shared block within our custom-designed multi-scale transformer framework. In a variety of testing conditions, the performance of the proposed SV-HDR is better than the existing methods.
IV · Jun 30, 2023
Spatially Varying Exposure with 2-by-2 Multiplexing: Optimality and Universality
Xiangyu Qu, Yiheng Chi, Stanley H. Chan
The advancement of new digital image sensors has enabled the design of exposure multiplexing schemes where a single image capture can have multiple exposures and conversion gains in an interlaced format, similar to that of a Bayer color filter array. In this paper, we ask the question of how to design such multiplexing schemes for adaptive high-dynamic range (HDR) imaging where the multiplexing scheme can be updated according to the scenes. We present two new findings. (i) We address the problem of design optimality. We show that given a multiplex pattern, the conventional optimality criteria based on the input/output-referred signal-to-noise ratio (SNR) of the independently measured pixels can lead to flawed decisions because it cannot encapsulate the location of the saturated pixels. We overcome the issue by proposing a new concept known as the spatially varying exposure risk (SVE-Risk) which is a pseudo-idealistic quantification of the amount of recoverable pixels. We present an efficient enumeration algorithm to select the optimal multiplex patterns. (ii) We report a design universality observation that the design of the multiplex pattern can be decoupled from the image reconstruction algorithm. This is a significant departure from the recent literature that the multiplex pattern should be jointly optimized with the reconstruction algorithm. Our finding suggests that in the context of exposure multiplexing, an end-to-end training may not be necessary.
CV · Mar 29, 2023
Towards Understanding the Effect of Pretraining Label Granularity
Guan Zhe Hong, Yin Cui, Ariel Fuxman et al.
In this paper, we study how the granularity of pretraining labels affects the generalization of deep neural networks in image classification tasks. We focus on the "fine-to-coarse" transfer learning setting, where the pretraining label space is more fine-grained than that of the target problem. Empirically, we show that pretraining on the leaf labels of ImageNet21k produces better transfer results on ImageNet1k than pretraining on other coarser granularity levels, which supports the common practice used in the community. Theoretically, we explain the benefit of fine-grained pretraining by proving that, for a data distribution satisfying certain hierarchy conditions, 1) coarse-grained pretraining only allows a neural network to learn the "common" or "easy-to-learn" features well, while 2) fine-grained pretraining helps the network learn the "rarer" or "fine-grained" features in addition to the common ones, thus improving its accuracy on hard downstream test samples in which common features are missing or weak in strength. Furthermore, we perform comprehensive experiments using the label hierarchies of iNaturalist 2021 and observe that the following conditions, in addition to proper choice of label granularity, enable the transfer to work well in practice: 1) the pretraining dataset needs to have a meaningful label hierarchy, and 2) the pretraining and target label functions need to align well.
IV · Jan 8, 2024 · Code
Spatio-Temporal Turbulence Mitigation: A Translational Perspective
Xingguang Zhang, Nicholas Chimitt, Yiheng Chi et al.
Recovering images distorted by atmospheric turbulence is a challenging inverse problem due to the stochastic nature of turbulence. Although numerous turbulence mitigation (TM) algorithms have been proposed, their efficiency and generalization to real-world dynamic scenarios remain severely limited. Building upon the intuitions of classical TM algorithms, we present the Deep Atmospheric TUrbulence Mitigation network (DATUM). DATUM aims to overcome major challenges when transitioning from classical to deep learning approaches. By carefully integrating the merits of classical multi-frame TM methods into a deep network structure, we demonstrate that DATUM can efficiently perform long-range temporal aggregation in a recurrent fashion, while deformable attention and temporal-channel attention seamlessly facilitate pixel registration and lucky imaging. With additional supervision, tilt and blur degradation can be jointly mitigated. These inductive biases empower DATUM to significantly outperform existing methods while delivering a tenfold increase in processing speed. A large-scale training dataset, ATSyn, is presented as a co-invention to enable generalization in real turbulence. Our code and datasets are available at https://xg416.github.io/DATUM.
IV · Oct 19, 2024 · Code
Quanta Video Restoration
Prateek Chennuri, Yiheng Chi, Enze Jiang et al.
The proliferation of single-photon image sensors has opened the door to a plethora of high-speed and low-light imaging applications. However, data collected by these sensors are often 1-bit or few-bit, and corrupted by noise and strong motion. Conventional video restoration methods are not designed to handle this situation, while specialized quanta burst algorithms have limited performance when the number of input frames is low. In this paper, we introduce Quanta Video Restoration (QUIVER), an end-to-end trainable network built on the core ideas of classical quanta restoration methods, i.e., pre-filtering, flow estimation, fusion, and refinement. We also collect and publish I2-2000FPS, a high-speed video dataset with the highest temporal resolution of 2000 frames-per-second, for training and testing. On simulated and real data, QUIVER outperforms existing quanta restoration methods by a significant margin. Code and dataset available at https://github.com/chennuriprateek/Quanta_Video_Restoration-QUIVER-
IV · Nov 3, 2025
Opto-Electronic Convolutional Neural Network Design Via Direct Kernel Optimization
Ali Almuallem, Harshana Weligampola, Abhiram Gnanasambandam et al.
Opto-electronic neural networks integrate optical front-ends with electronic back-ends to enable fast and energy-efficient vision. However, conventional end-to-end optimization of both the optical and electronic modules is limited by costly simulations and large parameter spaces. We introduce a two-stage strategy for designing opto-electronic convolutional neural networks (CNNs): first, train a standard electronic CNN, then realize the optical front-end implemented as a metasurface array through direct kernel optimization of its first convolutional layer. This approach reduces computational and memory demands by hundreds of times and improves training stability compared to end-to-end optimization. On monocular depth estimation, the proposed two-stage design achieves twice the accuracy of end-to-end training under the same training time and resource constraints.
CV · Dec 3, 2025
SeeU: Seeing the Unseen World via 4D Dynamics-aware Generation
Yu Yuan, Tharindu Wickremasinghe, Zeeshan Nadir et al.
Images and videos are discrete 2D projections of the 4D world (3D space + time). Most visual understanding, prediction, and generation operate directly on 2D observations, leading to suboptimal performance. We propose SeeU, a novel approach that learns the continuous 4D dynamics and generates the unseen visual content. The principle behind SeeU is a new 2D$\to$4D$\to$2D learning framework. SeeU first reconstructs the 4D world from sparse and monocular 2D frames (2D$\to$4D). It then learns the continuous 4D dynamics on a low-rank representation and physical constraints (discrete 4D$\to$continuous 4D). Finally, SeeU rolls the world forward in time, re-projects it back to 2D at sampled times and viewpoints, and generates unseen regions based on spatial-temporal context awareness (4D$\to$2D). By modeling dynamics in 4D, SeeU achieves continuous and physically-consistent novel visual generation, demonstrating strong potential in multiple tasks including unseen temporal generation, unseen spatial generation, and video editing.
IV · Dec 9, 2025
FlowSteer: Conditioning Flow Field for Consistent Image Restoration
Tharindu Wickremasinghe, Chenyang Qi, Harshana Weligampola et al.
Flow-based text-to-image (T2I) models excel at prompt-driven image generation, but falter on Image Restoration (IR), often "drifting away" from being faithful to the measurement. Prior work mitigates this drift with data-specific flows or task-specific adapters that are computationally heavy and not scalable across tasks. This raises the question "Can't we efficiently manipulate the existing generative capabilities of a flow model?" To this end, we introduce FlowSteer (FS), an operator-aware conditioning scheme that injects measurement priors along the sampling path, coupling a frozen flow's implicit guidance with explicit measurement constraints. Across super-resolution, deblurring, denoising, and colorization, FS improves measurement consistency and identity preservation in a strictly zero-shot setting, with no retrained models and no adapters. We show how the nature of flow models and their sensitivities to noise inform the design of such a scheduler. FlowSteer, although simple, achieves a higher fidelity of reconstructed images, while leveraging the rich generative priors of flow models.
LG · Jan 1
Practical Geometric and Quantum Kernel Methods for Predicting Skeletal Muscle Outcomes in Chronic Obstructive Pulmonary Disease
Azadeh Alavi, Hamidreza Khalili, Stanley H. Chan et al.
Skeletal muscle dysfunction is a clinically relevant extra-pulmonary manifestation of chronic obstructive pulmonary disease (COPD) and is closely linked to systemic and airway inflammation. This motivates predictive modelling of muscle outcomes from minimally invasive biomarkers that can be acquired longitudinally. We study a small-sample preclinical dataset comprising 213 animals across two conditions (Sham versus cigarette-smoke exposure), with blood and bronchoalveolar lavage fluid measurements and three continuous targets: tibialis anterior muscle weight (milligram: mg), specific force (millinewton: mN), and a derived muscle quality index (mN per mg). We benchmark tuned classical baselines, geometry-aware symmetric positive definite (SPD) descriptors with Stein divergence, and quantum kernel models designed for low-dimensional tabular data. In the muscle-weight setting, quantum kernel ridge regression using four interpretable inputs (blood C-reactive protein, neutrophil count, bronchoalveolar lavage cellularity, and condition) attains a test root mean squared error of 4.41 mg and coefficient of determination of 0.605, improving over a matched ridge baseline on the same feature set (4.70 mg and 0.553). Geometry-informed Stein-divergence prototype distances yield a smaller but consistent gain in the biomarker-only setting (4.55 mg versus 4.79 mg). Screening-style evaluation, obtained by thresholding the continuous outcome at 0.8 times the training Sham mean, achieves an area under the receiver operating characteristic curve (ROC-AUC) of up to 0.90 for detecting low muscle weight. These results indicate that geometric and quantum kernel lifts can provide measurable benefits in low-data, low-feature biomedical prediction problems, while preserving interpretability and transparent model selection.
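The screening-style evaluation mentioned in this abstract (thresholding the continuous outcome at 0.8 times the training Sham mean, then scoring with ROC-AUC) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function name `roc_auc`, the variable names, and all numbers are made up for the example, and the AUC is computed with the standard pairwise (Mann-Whitney) definition.

```python
from statistics import mean


def roc_auc(labels, scores):
    """Probability that a random positive outscores a random negative (ties count half)."""
    pos = [s for flag, s in zip(labels, scores) if flag]
    neg = [s for flag, s in zip(labels, scores) if not flag]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical muscle weights in mg (illustrative values only).
train_sham_weights = [50.0, 52.0, 48.0]
threshold = 0.8 * mean(train_sham_weights)  # "low muscle weight" cutoff

test_true = [38.0, 45.0, 36.0, 51.0, 41.0]  # measured weights
test_pred = [39.5, 44.0, 42.0, 50.0, 40.5]  # a regressor's predictions

# Positive class: measured weight falls below the cutoff.
labels = [y < threshold for y in test_true]
# Lower predicted weight means higher risk, so negate predictions to get scores.
scores = [-p for p in test_pred]

auc = roc_auc(labels, scores)
print(f"threshold = {threshold:.1f} mg, ROC-AUC = {auc:.3f}")
```

The cutoff is derived from training data only, so the test labels do not leak into the threshold.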
SP · Jul 29, 2024
Analysis and Improvement of Rank-Ordered Mean Algorithm in Single-Photon LiDAR
William C. Yau, Weijian Zhang, Hashan Kavinga Weerasooriya et al.
Depth estimation using a single-photon LiDAR is often solved by a matched filter. It is, however, error-prone in the presence of background noise. A commonly used technique to reject background noise is the rank-ordered mean (ROM) filter previously reported by Shin et al. (2015). ROM rejects noisy photon arrival timestamps by selecting only a small range of them around the median statistics within its local neighborhood. Despite the promising performance of ROM, its theoretical performance limit is unknown. In this paper, we theoretically characterize the ROM performance by showing that ROM fails when the reflectivity drops below a threshold predetermined by the depth and signal-to-background ratio, and its accuracy undergoes a phase transition at the cutoff. Based on our theory, we propose an improved signal extraction technique by selecting tight timestamp clusters. Experimental results show that the proposed algorithm improves depth estimation performance over ROM by 3 orders of magnitude at the same signal intensities, and achieves high image fidelity at noise levels as high as 17 times that of signal.
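The censoring step that this abstract attributes to ROM (keeping only timestamps in a small range around the median of the local neighborhood) can be illustrated with a short sketch. It is a simplified illustration, not the implementation of Shin et al. or of this paper: the function name `rom_filter`, the `half_width` parameter, and the toy timestamps are all assumptions.

```python
from statistics import median


def rom_filter(pixel_timestamps, neighbor_timestamps, half_width):
    """Keep the pixel's timestamps that lie within half_width of the neighborhood median."""
    if not neighbor_timestamps:
        # No neighborhood information: nothing can be censored.
        return list(pixel_timestamps)
    center = median(neighbor_timestamps)
    return [t for t in pixel_timestamps if abs(t - center) <= half_width]


# Toy data in arbitrary time units: signal photons cluster near 10,
# background photons arrive at unrelated times.
signal = [10.1, 9.9, 10.0]
background = [2.3, 47.8]
neighbors = [9.8, 10.2, 10.0, 31.5, 10.1]  # pooled from adjacent pixels

kept = rom_filter(signal + background, neighbors, half_width=0.5)
print(kept)
```

The median tolerates the single outlier (31.5) in the neighborhood, which is why the censoring window still lands on the signal cluster. When background photons dominate the neighborhood, the median itself drifts away from the true depth, which is the failure regime the abstract analyzes.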
LG · Mar 26, 2024
Tutorial on Diffusion Models for Imaging and Vision
Stanley H. Chan
The astonishing growth of generative tools in recent years has empowered many exciting applications in text-to-image generation and text-to-video generation. The underlying principle behind these generative tools is the concept of diffusion, a particular sampling mechanism that has overcome some shortcomings that were deemed difficult in the previous approaches. The goal of this tutorial is to discuss the essential ideas underlying the diffusion models. The target audience of this tutorial includes undergraduate and graduate students who are interested in doing research on diffusion models or applying these models to solve other problems.
SP · Mar 25, 2024
Resolution Limit of Single-Photon LiDAR
Stanley H. Chan, Hashan K. Weerasooriya, Weijian Zhang et al.
Single-photon Light Detection and Ranging (LiDAR) systems are often equipped with an array of detectors for improved spatial resolution and sensing speed. However, given a fixed amount of flux produced by the laser transmitter across the scene, the per-pixel Signal-to-Noise Ratio (SNR) will decrease when more pixels are packed in a unit space. This presents a fundamental trade-off between the spatial resolution of the sensor array and the SNR received at each pixel. Theoretical characterization of this fundamental limit is explored. By deriving the photon arrival statistics and introducing a series of new approximation techniques, the Mean Squared Error (MSE) of the maximum-likelihood estimator of the time delay is derived. The theoretical predictions align well with simulations and real data.
CV · May 19, 2025
Joint Depth and Reflectivity Estimation using Single-Photon LiDAR
Hashan K. Weerasooriya, Prateek Chennuri, Weijian Zhang et al.
Single-Photon Light Detection and Ranging (SP-LiDAR) is emerging as a leading technology for long-range, high-precision 3D vision tasks. In SP-LiDAR, timestamps encode two complementary pieces of information: pulse travel time (depth) and the number of photons reflected by the object (reflectivity). Existing SP-LiDAR reconstruction methods typically recover depth and reflectivity separately or sequentially use one modality to estimate the other. Moreover, the conventional 3D histogram construction is effective mainly for slow-moving or stationary scenes. In dynamic scenes, however, it is more efficient and effective to directly process the timestamps. In this paper, we introduce an estimation method to simultaneously recover both depth and reflectivity in fast-moving scenes. We offer two contributions: (1) A theoretical analysis demonstrating the mutual correlation between depth and reflectivity and the conditions under which joint estimation becomes beneficial. (2) A novel reconstruction method, "SPLiDER", which exploits the shared information to enhance signal recovery. On both synthetic and real SP-LiDAR data, our method outperforms existing approaches, achieving superior joint reconstruction quality.
CV · May 7, 2025
Person Recognition at Altitude and Range: Fusion of Face, Body Shape and Gait
Feng Liu, Nicholas Chimitt, Lanqing Guo et al. · gatech
We address the problem of whole-body person recognition in unconstrained environments. This problem arises in surveillance scenarios such as those in the IARPA Biometric Recognition and Identification at Altitude and Range (BRIAR) program, where biometric data is captured at long standoff distances, elevated viewing angles, and under adverse atmospheric conditions (e.g., turbulence and high wind velocity). To this end, we propose FarSight, a unified end-to-end system for person recognition that integrates complementary biometric cues across face, gait, and body shape modalities. FarSight incorporates novel algorithms across four core modules: multi-subject detection and tracking, recognition-aware video restoration, modality-specific biometric feature encoding, and quality-guided multi-modal fusion. These components are designed to work cohesively under degraded image conditions, large pose and scale variations, and cross-domain gaps. Extensive experiments on the BRIAR dataset, one of the most comprehensive benchmarks for long-range, multi-modal biometric recognition, demonstrate the effectiveness of FarSight. Compared to our preliminary system, this system achieves a 34.1% absolute gain in 1:1 verification accuracy (TAR@0.1% FAR), a 17.8% increase in closed-set identification (Rank-20), and a 34.3% reduction in open-set identification errors (FNIR@1% FPIR). Furthermore, FarSight was evaluated in the 2025 NIST RTE Face in Video Evaluation (FIVE), which conducts standardized face recognition testing on the BRIAR dataset. These results establish FarSight as a state-of-the-art solution for operational biometric recognition in challenging real-world conditions.
CV · Mar 28, 2024
Generative Quanta Color Imaging
Vishal Purohit, Junjie Luo, Yiheng Chi et al.
The astonishing development of single-photon cameras has created an unprecedented opportunity for scientific and industrial imaging. However, the high data throughput generated by these 1-bit sensors creates a significant bottleneck for low-power applications. In this paper, we explore the possibility of generating a color image from a single binary frame of a single-photon camera. We find this problem to be particularly difficult for standard colorization approaches due to the substantial degree of exposure variation. The core innovation of our paper is an exposure synthesis model framed under a neural ordinary differential equation (Neural ODE) that allows us to generate a continuum of exposures from a single observation. This innovation ensures consistent exposure in the binary images that colorizers take as input, resulting in notably enhanced colorization. We demonstrate applications of the method in single-image and burst colorization and show superior generative performance over baselines. The project website can be found at https://vishal-s-p.github.io/projects/2023/generative_quanta_color.html.
CV · Apr 3, 2025
Learning Phase Distortion with Selective State Space Models for Video Turbulence Mitigation
Xingguang Zhang, Nicholas Chimitt, Xijun Wang et al.
Atmospheric turbulence is a major source of image degradation in long-range imaging systems. Although numerous deep learning-based turbulence mitigation (TM) methods have been proposed, many are slow, memory-hungry, and do not generalize well. In the spatial domain, methods based on convolutional operators have a limited receptive field, so they cannot handle a large spatial dependency required by turbulence. In the temporal domain, methods relying on self-attention can, in theory, leverage the lucky effects of turbulence, but their quadratic complexity makes it difficult to scale to many frames. Traditional recurrent aggregation methods face parallelization challenges. In this paper, we present a new TM method based on two concepts: (1) A turbulence mitigation network based on the Selective State Space Model (MambaTM). MambaTM provides a global receptive field in each layer across spatial and temporal dimensions while maintaining linear computational complexity. (2) Learned Latent Phase Distortion (LPD). LPD guides the state space model. Unlike classical Zernike-based representations of phase distortion, the new LPD map uniquely captures the actual effects of turbulence, significantly improving the model's capability to estimate degradation by reducing the ill-posedness. Our proposed method exceeds current state-of-the-art networks on various synthetic and real-world TM benchmarks with significantly faster inference speed.
IV · Mar 31
Pupil Design for Computational Wavefront Estimation
Ali Almuallem, Nicholas Chimitt, Bole Ma et al.
Establishing a precise connection between imaged intensity and the incident wavefront is essential for emerging applications in adaptive optics, holography, computational microscopy, and non-line-of-sight imaging. While prior work has shown that breaking symmetries in pupil design enables wavefront recovery from a single intensity measurement, there is little guidance on how to design a pupil that improves wavefront estimation. In this work we introduce a quantitative asymmetry metric to bridge this gap and, through an extensive empirical study and supporting analysis, demonstrate that increasing asymmetry enhances wavefront recoverability. We analyze the trade-offs in pupil design, and the impact on light throughput along with performance in noise. Both large-scale simulations and optical bench experiments are carried out to support our findings.
CV · Sep 25, 2025
NewtonGen: Physics-Consistent and Controllable Text-to-Video Generation via Neural Newtonian Dynamics
Yu Yuan, Xijun Wang, Tharindu Wickremasinghe et al.
A primary bottleneck in large-scale text-to-video generation today is physical consistency and controllability. Despite recent advances, state-of-the-art models often produce unrealistic motions, such as objects falling upward, or abrupt changes in velocity and direction. Moreover, these models lack precise parameter control, struggling to generate physically consistent dynamics under different initial conditions. We argue that this fundamental limitation stems from current models learning motion distributions solely from appearance, while lacking an understanding of the underlying dynamics. In this work, we propose NewtonGen, a framework that integrates data-driven synthesis with learnable physical principles. At its core lies trainable Neural Newtonian Dynamics (NND), which can model and predict a variety of Newtonian motions, thereby injecting latent dynamical constraints into the video generation process. By jointly leveraging data priors and dynamical guidance, NewtonGen enables physically consistent video synthesis with precise parameter control.
CV · Nov 9, 2021
Graph-Based Depth Denoising & Dequantization for Point Cloud Enhancement
Xue Zhang, Gene Cheung, Jiahao Pang et al.
A 3D point cloud is typically constructed from depth measurements acquired by sensors at one or more viewpoints. The measurements suffer from both quantization and noise corruption. To improve quality, previous works denoise a point cloud a posteriori after projecting the imperfect depth data onto 3D space. Instead, we enhance depth measurements directly on the sensed images a priori, before synthesizing a 3D point cloud. By enhancing near the physical sensing process, we tailor our optimization to our depth formation model before subsequent processing steps that obscure measurement errors. Specifically, we model depth formation as a combined process of signal-dependent noise addition and non-uniform log-based quantization. The designed model is validated (with parameters fitted) using collected empirical data from a representative depth sensor. To enhance each pixel row in a depth image, we first encode intra-view similarities between available row pixels as edge weights via feature graph learning. We next establish inter-view similarities with another rectified depth image via viewpoint mapping and sparse linear interpolation. This leads to a maximum a posteriori (MAP) graph filtering objective that is convex and differentiable. We minimize the objective efficiently using accelerated gradient descent (AGD), where the optimal step size is approximated via Gershgorin circle theorem (GCT). Experiments show that our method significantly outperformed recent point cloud denoising schemes and state-of-the-art image denoising schemes in two established point cloud quality metrics.
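The role of the Gershgorin circle theorem in choosing a step size can be shown on a toy problem. The sketch below is an assumption-laden illustration, not the paper's MAP objective or its accelerated scheme: it runs plain gradient descent on a small quadratic 0.5 x^T A x - b^T x, and the function names and the matrix are invented for the example. For a symmetric matrix, every eigenvalue lies in some Gershgorin disc, so the largest row sum of diagonal plus absolute off-diagonal entries upper-bounds the largest eigenvalue, and its reciprocal is a safe step size.

```python
def gershgorin_upper_bound(A):
    """Upper bound on the largest eigenvalue of a symmetric matrix A."""
    n = len(A)
    return max(A[i][i] + sum(abs(A[i][j]) for j in range(n) if j != i)
               for i in range(n))


def gradient_descent(A, b, steps=200):
    """Minimize 0.5 * x^T A x - b^T x with step size 1 / (Gershgorin bound)."""
    n = len(b)
    step = 1.0 / gershgorin_upper_bound(A)
    x = [0.0] * n
    for _ in range(steps):
        # Gradient of the quadratic is A x - b.
        grad = [sum(A[i][j] * x[j] for j in range(n)) - b[i] for i in range(n)]
        x = [x[i] - step * grad[i] for i in range(n)]
    return x


# Small symmetric, diagonally dominant (hence positive definite) matrix.
A = [[2.0, -1.0, 0.0],
     [-1.0, 3.0, -1.0],
     [0.0, -1.0, 2.0]]
b = [1.0, 1.0, 1.0]

bound = gershgorin_upper_bound(A)  # true largest eigenvalue is 4
x = gradient_descent(A, b)         # exact minimizer is [1, 1, 1]
print(bound, x)
```

The bound costs one pass over the matrix entries and avoids an eigendecomposition, which is the practical appeal of this kind of step-size estimate.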
CV · Aug 20, 2021
Detecting and Segmenting Adversarial Graphics Patterns from Images
Xiangyu Qu, Stanley H. Chan
Adversarial attacks pose a substantial threat to computer vision system security, but the social media industry constantly faces another form of "adversarial attack" in which the hackers attempt to upload inappropriate images and fool the automated screening systems by adding artificial graphics patterns. In this paper, we formulate the defense against such attacks as an artificial graphics pattern segmentation problem. We evaluate the efficacy of several segmentation algorithms and, based on observation of their performance, propose a new method tailored to this specific problem. Extensive experiments show that the proposed method outperforms the baselines and has a promising generalization capability, which is the most crucial aspect in segmenting artificial graphics patterns.
AI · Aug 13, 2021
Optical Adversarial Attack
Abhiram Gnanasambandam, Alex M. Sherman, Stanley H. Chan
We introduce OPtical ADversarial attack (OPAD). OPAD is an adversarial attack in the physical space aiming to fool image classifiers without physically touching the objects (e.g., moving or painting the objects). The principle of OPAD is to use structured illumination to alter the appearance of the target objects. The system consists of a low-cost projector, a camera, and a computer. The challenge of the problem is the non-linearity of the radiometric response of the projector and the spatially varying spectral response of the scene. Attacks generated in a conventional approach do not work in this setting unless they are calibrated to compensate for such a projector-camera model. The proposed solution incorporates the projector-camera model into the adversarial attack optimization, where a new attack formulation is derived. Experimental results prove the validity of the solution. It is demonstrated that OPAD can optically attack a real 3D object in the presence of background lighting for white-box, black-box, targeted, and untargeted attacks. Theoretical analysis is presented to quantify the fundamental performance limit of the system.
IVJul 24, 2021
Accelerating Atmospheric Turbulence Simulation via Learned Phase-to-Space TransformZhiyuan Mao, Nicholas Chimitt, Stanley H. Chan
Fast and accurate simulation of imaging through atmospheric turbulence is essential for developing turbulence mitigation algorithms. Recognizing the limitations of previous approaches, we introduce a new concept known as the phase-to-space (P2S) transform to significantly speed up the simulation. P2S is built upon three ideas: (1) reformulating the spatially varying convolution as a set of invariant convolutions with basis functions, (2) learning the basis functions via the known turbulence statistics models, and (3) implementing the P2S transform via a lightweight network that directly converts the phase representation to a spatial representation. The new simulator offers a 300x to 1000x speedup compared to the mainstream split-step simulators while preserving the essential turbulence statistics.
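Idea (1) above can be illustrated with a minimal 1D sketch. It assumes that the kernel at each pixel is a weighted sum of a few fixed basis kernels; the basis, the per-pixel coefficients, and the function names below are hypothetical toy choices, not the learned quantities from the paper. Under that assumption, the spatially varying blur equals a per-pixel weighted sum of a few spatially invariant filtering results.

```python
def filter_same(x, kernel):
    """Zero-padded sliding-window filtering with an odd-length kernel.
    (Correlation form; flipping the kernel would give a convolution.)"""
    r = len(kernel) // 2
    out = []
    for n in range(len(x)):
        acc = 0.0
        for j, w in enumerate(kernel):
            m = n + j - r
            if 0 <= m < len(x):
                acc += w * x[m]
        out.append(acc)
    return out


def blur_direct(x, basis, coeffs):
    """Spatially varying blur: assemble a separate kernel for every pixel."""
    size = len(basis[0])
    r = size // 2
    out = []
    for n in range(len(x)):
        kernel = [sum(coeffs[n][k] * basis[k][j] for k in range(len(basis)))
                  for j in range(size)]
        acc = 0.0
        for j, w in enumerate(kernel):
            m = n + j - r
            if 0 <= m < len(x):
                acc += w * x[m]
        out.append(acc)
    return out


def blur_via_basis(x, basis, coeffs):
    """Same blur as K invariant filterings followed by per-pixel weighting."""
    filtered = [filter_same(x, phi) for phi in basis]
    return [sum(coeffs[n][k] * filtered[k][n] for k in range(len(basis)))
            for n in range(len(x))]


# Hypothetical toy data: three basis kernels and smoothly varying weights.
x = [0.0, 1.0, 0.5, 2.0, 1.5, 0.0, 3.0, 1.0]
basis = [[0.0, 1.0, 0.0], [1 / 3, 1 / 3, 1 / 3], [0.25, 0.5, 0.25]]
coeffs = []
for n in range(len(x)):
    t = n / (len(x) - 1)
    coeffs.append([1.0 - t, t / 2, t / 2])

direct = blur_direct(x, basis, coeffs)
fast = blur_via_basis(x, basis, coeffs)
```

The two outputs agree up to floating-point rounding. The basis route needs only as many invariant filterings as there are basis kernels, which is the property that makes this reformulation attractive for fast simulation.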
SPJun 30, 2021
Graph Signal Restoration Using Nested Deep Algorithm UnrollingMasatoshi Nagahama, Koki Yamada, Yuichi Tanaka et al.
Graph signal processing is a ubiquitous task in many applications such as sensor, social, transportation and brain networks, point cloud processing, and graph neural networks. Often, graph signals are corrupted in the sensing process, thus requiring restoration. In this paper, we propose two graph signal restoration methods based on deep algorithm unrolling (DAU). First, we present a graph signal denoiser by unrolling iterations of the alternating direction method of multipliers (ADMM). We then suggest a general restoration method for linear degradation by unrolling iterations of Plug-and-Play ADMM (PnP-ADMM). In the second approach, the unrolled ADMM-based denoiser is incorporated as a submodule, leading to a nested DAU structure. The parameters in the proposed denoising/restoration methods are trainable in an end-to-end manner. Our approach is interpretable and keeps the number of parameters small since we only tune graph-independent regularization parameters. We overcome two main challenges in existing graph signal restoration methods: (1) the limited performance of convex optimization algorithms due to fixed parameters, which are often determined manually, and (2) the large number of parameters of graph neural networks, which makes training difficult. Several experiments for graph signal denoising and interpolation are performed on synthetic and real-world data. The proposed methods show performance improvements over several existing techniques in terms of root mean squared error in both tasks.
CVJun 24, 2021
DROID: Driver-centric Risk Object IdentificationChengxi Li, Stanley H. Chan, Yi-Ting Chen
Identification of high-risk driving situations is generally approached through collision risk estimation or accident pattern recognition. In this work, we approach the problem from the perspective of subjective risk. We operationalize subjective risk assessment by predicting driver behavior changes and identifying the cause of changes. To this end, we introduce a new task called driver-centric risk object identification (DROID), which uses egocentric video to identify object(s) influencing a driver's behavior, given only the driver's response as the supervision signal. We formulate the task as a cause-effect problem and present a novel two-stage DROID framework, taking inspiration from models of situation awareness and causal inference. A subset of data constructed from the Honda Research Institute Driving Dataset (HDD) is used to evaluate DROID. We demonstrate state-of-the-art DROID performance, even compared with strong baseline models using this dataset. Additionally, we conduct extensive ablative studies to justify our design choices. Moreover, we demonstrate the applicability of DROID for risk assessment.
LGMar 13, 2021
Student-Teacher Learning from Clean Inputs to Noisy InputsGuanzhe Hong, Zhiyuan Mao, Xiaojun Lin et al.
Feature-based student-teacher learning, a training method that encourages the student's hidden features to mimic those of the teacher network, is empirically successful in transferring the knowledge from a pre-trained teacher network to the student network. Furthermore, recent empirical results demonstrate that the teacher's features can boost the student network's generalization even when the student's input sample is corrupted by noise. However, there is a lack of theoretical insight into why and when this method of transferring knowledge can be successful between such heterogeneous tasks. We analyze this method theoretically using deep linear networks, and experimentally using nonlinear networks. We identify three factors vital to the success of the method: (1) whether the student is trained to zero training loss; (2) how knowledgeable the teacher is on the clean-input problem; (3) how the teacher decomposes its knowledge in its hidden features. Lack of proper control in any of the three factors leads to failure of the student-teacher learning method.
IVNov 6, 2020
HDR Imaging with Quanta Image Sensors: Theoretical Limits and Optimal ReconstructionAbhiram Gnanasambandam, Stanley H. Chan
High dynamic range (HDR) imaging is one of the biggest achievements in modern photography. Traditional solutions to HDR imaging are designed for and applied to CMOS image sensors (CIS). However, the mainstream one-micron CIS cameras today generally have high read noise and a low frame rate. These, in turn, limit the acquisition speed and quality, making the cameras slow in the HDR mode. In this paper, we propose a new computational photography technique for HDR imaging. Recognizing the limitations of CIS, we use the Quanta Image Sensor (QIS) to trade spatial-temporal resolution for bit depth. QIS is a single-photon image sensor that has a pixel pitch comparable to CIS but substantially lower dark current and read noise. We provide a complete theoretical characterization of the sensor in the context of HDR imaging, by proving the fundamental limits in the dynamic range that QIS can offer and the trade-offs with noise and speed. In addition, we derive an optimal reconstruction algorithm for single-bit and multi-bit QIS. Our algorithm is theoretically optimal for all linear reconstruction schemes based on exposure bracketing. Experimental results confirm the validity of the theory and algorithm, based on synthetic and real QIS data.
IVJul 16, 2020
Dynamic Low-light Imaging with Quanta Image SensorsYiheng Chi, Abhiram Gnanasambandam, Vladlen Koltun et al.
Imaging in low light is difficult because the number of photons arriving at the sensor is low. Imaging dynamic scenes in low-light environments is even more difficult because as the scene moves, pixels in adjacent frames need to be aligned before they can be denoised. Conventional CMOS image sensors (CIS) are at a particular disadvantage in dynamic low-light settings because the exposure cannot be too short lest the read noise overwhelms the signal. We propose a solution using Quanta Image Sensors (QIS) and present a new image reconstruction algorithm. QIS are single-photon image sensors with photon counting capabilities. Studies over the past decade have confirmed the effectiveness of QIS for low-light imaging but reconstruction algorithms for dynamic scenes in low light remain an open problem. We fill the gap by proposing a student-teacher training protocol that transfers knowledge from a motion teacher and a denoising teacher to a student network. We show that dynamic scenes can be reconstructed from a burst of frames at a photon level of 1 photon per pixel per frame. Experimental results confirm the advantages of the proposed method compared to existing methods.
IVJun 3, 2020
Image Classification in the Dark using Quanta Image SensorsAbhiram Gnanasambandam, Stanley H. Chan
State-of-the-art image classifiers are trained and tested using well-illuminated images. These images are typically captured by CMOS image sensors with at least tens of photons per pixel. However, in dark environments when the photon flux is low, image classification becomes difficult because the measured signal is suppressed by noise. In this paper, we present a new low-light image classification solution using Quanta Image Sensors (QIS). QIS are a new type of image sensor that possesses photon counting ability without compromising on pixel size and spatial resolution. Numerous studies over the past decade have demonstrated the feasibility of QIS for low-light imaging, but their usage for image classification has not been studied. This paper fills the gap by presenting a student-teacher learning scheme which allows us to classify the noisy QIS raw data. We show that with student-teacher learning, we are able to achieve image classification at a photon level of one photon per pixel or lower. Experimental results verify the effectiveness of the proposed method compared to existing solutions.
LGMay 19, 2020
One Size Fits All: Can We Train One Denoiser for All Noise Levels?Abhiram Gnanasambandam, Stanley H. Chan
When training an estimator such as a neural network for tasks like image denoising, it is often preferred to train one estimator and apply it to all noise levels. The de facto training protocol to achieve this goal is to train the estimator with noisy samples whose noise levels are uniformly distributed across the range of interest. However, why should we allocate the samples uniformly? Can we have more training samples that are less noisy, and fewer samples that are more noisy? What is the optimal distribution? How do we obtain such a distribution? The goal of this paper is to address this training sample distribution problem from a minimax risk optimization perspective. We derive a dual ascent algorithm to determine the optimal sampling distribution of which the convergence is guaranteed as long as the set of admissible estimators is closed and convex. For estimators with non-convex admissible sets such as deep neural networks, our dual formulation converges to a solution of the convex relaxation. We discuss how the algorithm can be implemented in practice. We evaluate the algorithm on linear estimators and deep networks.
OPTICSApr 23, 2020
Simulating Anisoplanatic Turbulence by Sampling Inter-modal and Spatially Correlated Zernike CoefficientsNicholas Chimitt, Stanley H. Chan
Simulating atmospheric turbulence is an essential task for evaluating turbulence mitigation algorithms and training learning-based methods. Advanced numerical simulators for atmospheric turbulence are available, but they require evaluating wave propagation which is computationally expensive. In this paper, we present a propagation-free method for simulating imaging through turbulence. The key idea behind our work is a new method to draw inter-modal and spatially correlated Zernike coefficients. By establishing the equivalence between the angle-of-arrival correlation by Basu, McCrae and Fiorino (2015) and the multi-aperture correlation by Chanan (1992), we show that the Zernike coefficients can be drawn according to a covariance matrix defining the correlations. We propose fast and scalable sampling strategies to draw these samples. The new method allows us to compress the wave propagation problem into a sampling problem, hence making the new simulator significantly faster than existing ones. Experimental results show that the simulator has an excellent match with the theory and real turbulence data.
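The step of drawing coefficients "according to a covariance matrix" can be sketched generically as follows. This is a textbook Cholesky-based sampler, used here as a stand-in; it is not the fast and scalable sampling strategy proposed in the paper. The 3x3 covariance matrix is a hypothetical toy example, not real Zernike statistics.

```python
import math
import random


def cholesky(C):
    """Lower-triangular L with L L^T = C, for a symmetric positive definite C."""
    n = len(C)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(C[i][i] - s)
            else:
                L[i][j] = (C[i][j] - s) / L[j][j]
    return L


def draw_correlated(L, rng):
    """Draw one zero-mean Gaussian vector whose covariance is L L^T."""
    z = [rng.gauss(0.0, 1.0) for _ in L]
    return [sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(len(L))]


# Hypothetical toy covariance between three coefficients.
C = [[1.0, 0.6, 0.3],
     [0.6, 1.0, 0.6],
     [0.3, 0.6, 1.0]]
L = cholesky(C)
rng = random.Random(0)
sample = draw_correlated(L, rng)
```

A dense Cholesky factorization costs cubic time in the number of coefficients, which is one reason a simulator covering a whole image grid would need a more scalable sampling scheme than this sketch.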
CVMar 5, 2020
Who Make Drivers Stop? Towards Driver-centric Risk Assessment: Risk Object Identification via Causal InferenceChengxi Li, Stanley H. Chan, Yi-Ting Chen
A significant number of people die in road accidents due to driver errors. To reduce fatalities, intelligent driving systems that help drivers identify potential risks are urgently needed. Risky situations are generally defined based on collision prediction in existing works. However, collision is only one source of potential risk, and a more generic definition is required. In this work, we propose a novel driver-centric definition of risk, i.e., objects influencing drivers' behavior are risky. A new task called risk object identification is introduced. We formulate the task as a cause-effect problem and present a novel two-stage risk object identification framework based on causal inference with the proposed object-level manipulable driving model. We demonstrate favorable performance on risk object identification compared with strong baselines on the Honda Research Institute Driving Dataset (HDD). Our framework achieves a substantial average performance boost of 7.5% over a strong baseline.
CVSep 20, 2019
Learning 3D-aware Egocentric Spatial-Temporal Interaction via Graph Convolutional NetworksChengxi Li, Yue Meng, Stanley H. Chan et al.
To enable intelligent automated driving systems, a promising strategy is to understand how humans drive and interact with road users in complicated driving situations. In this paper, we propose a 3D-aware egocentric spatial-temporal interaction framework for automated driving applications. Graph convolutional networks (GCNs) are devised for interaction modeling. We introduce three novel concepts into GCN. First, we decompose egocentric interactions into ego-thing and ego-stuff interaction, modeled by two GCNs. In both GCNs, ego nodes are introduced to encode the interaction between thing objects (e.g., car and pedestrian), and interaction between stuff objects (e.g., lane marking and traffic light). Second, objects' 3D locations are explicitly incorporated into GCN to better model egocentric interactions. Third, to implement ego-stuff interaction in GCN, we propose a MaskAlign operation to extract features for irregular objects. We validate the proposed framework on tactical driver behavior recognition. Extensive experiments are conducted using the Honda Research Institute Driving Dataset, the largest dataset with diverse tactical driver behavior annotations. Our framework demonstrates a substantial performance boost over baselines on the two experimental settings, by 3.9% and 6.0%, respectively. Furthermore, we visualize the learned affinity matrices, which encode ego-thing and ego-stuff interactions, to show that the proposed framework can capture interactions effectively.
IVMay 17, 2019
Rethinking Atmospheric Turbulence MitigationNicholas Chimitt, Zhiyuan Mao, Guanzhe Hong et al.
State-of-the-art atmospheric turbulence image restoration methods utilize standard image processing tools such as optical flow, lucky region and blind deconvolution to restore the images. While promising results have been reported over the past decade, many of the methods are agnostic to the physical model that generates the distortion. In this paper, we revisit the turbulence restoration problem by analyzing the reference frame generation and the blind deconvolution steps in a typical restoration pipeline. By leveraging tools in large deviation theory, we rigorously prove the minimum number of frames required to generate a reliable reference for both static and dynamic scenes. We discuss how a turbulence agnostic model can lead to potential flaws, and how to configure a simple spatial-temporal non-local weighted averaging method to generate references. For blind deconvolution, we present a new data-driven prior by analyzing the distributions of the point spread functions. We demonstrate how a simple prior can outperform state-of-the-art blind deconvolution methods.
CVMar 23, 2019
Color Filter Arrays for Quanta Image SensorsOmar A. Elgendy, Stanley H. Chan
Quanta image sensor (QIS) is envisioned to be the next generation image sensor after CCD and CMOS. In this paper, we discuss how to design color filter arrays for QIS and other small pixels. Designing color filter arrays for small pixels is challenging because maximizing the light efficiency while suppressing aliasing and crosstalk are conflicting tasks. We present an optimization-based framework which unifies several mainstream color filter array design methodologies. Our method offers greater generality and flexibility. Compared to existing methods, the new framework can simultaneously handle luminance sensitivity, chrominance sensitivity, cross-talk, anti-aliasing, manufacturability and orthogonality. Extensive experimental comparisons demonstrate the effectiveness of the framework.
IVAug 31, 2018
Performance Analysis of Plug-and-Play ADMM: A Graph Signal Processing PerspectiveStanley H. Chan
The Plug-and-Play (PnP) ADMM algorithm is a powerful image restoration framework that allows advanced image denoising priors to be integrated into physical forward models to generate high quality image restoration results. However, despite the enormous number of applications and several theoretical studies trying to prove the convergence by leveraging tools in convex analysis, very little is known about why the algorithm is doing so well. The goal of this paper is to fill the gap by discussing the performance of PnP ADMM. By restricting the denoisers to the class of graph filters under a linearity assumption, or more specifically the symmetric smoothing filters, we offer three contributions: (1) We show conditions under which an equivalent maximum-a-posteriori (MAP) optimization exists, (2) we present a geometric interpretation and show that the performance gain is due to an intrinsic pre-denoising characteristic of the PnP prior, (3) we introduce a new analysis technique via the concept of consensus equilibrium, and provide interpretations to problems involving multiple priors.
CVAug 24, 2018
Automatic Foreground Extraction from Imperfect Backgrounds using Multi-Agent Consensus EquilibriumXiran Wang, Jason Juang, Stanley H. Chan
Extracting accurate foreground objects from a scene is an essential step for many video applications. Traditional background subtraction algorithms can generate coarse estimates, but generating high quality masks requires professional software with significant human intervention, e.g., providing trimaps or labeling key frames. We propose an automatic foreground extraction method in applications where a static but imperfect background is available. Examples include filming and surveillance where the background can be captured before the objects enter the scene or after they leave the scene. Our proposed method is very robust and produces significantly better estimates than state-of-the-art background subtraction, video segmentation and alpha matting methods. The key innovation of our method is a novel information fusion technique. The fusion framework allows us to integrate the individual strengths of alpha matting, background subtraction and image denoising to produce an overall better estimate. Such integration is particularly important when handling complex scenes with imperfect background. We show how the framework is developed, and how the individual components are built. Extensive experiments and ablation studies are conducted to evaluate the proposed method.
CVNov 17, 2017
Optimal Combination of Image DenoisersJoon Hee Choi, Omar Elgendy, Stanley H. Chan
Given a set of image denoisers, each having a different denoising capability, is there a provably optimal way of combining these denoisers to produce an overall better result? An answer to this question is fundamental to designing an ensemble of weak estimators for complex scenes. In this paper, we present an optimal combination scheme by leveraging deep neural networks and convex optimization. The proposed framework, called the Consensus Neural Network (CsNet), introduces three new concepts in image denoising: (1) A provably optimal procedure to combine the denoised outputs via convex optimization; (2) A deep neural network to estimate the mean squared error (MSE) of denoised images without needing the ground truths; (3) An image boosting procedure using a deep neural network to improve contrast and to recover lost details of the combined images. Experimental results show that CsNet can consistently improve denoising performance for both deterministic and neural network denoisers.
CVMay 24, 2017
Plug-and-Play Unplugged: Optimization Free Reconstruction using Consensus EquilibriumGregery T. Buzzard, Stanley H. Chan, Suhas Sreehari et al.
Regularized inversion methods for image reconstruction are used widely due to their tractability and ability to combine complex physical sensor models with useful regularity criteria. Such methods motivated the recently developed Plug-and-Play prior method, which provides a framework to use advanced denoising algorithms as regularizers in inversion. However, the need to formulate regularized inversion as the solution to an optimization problem limits the possible regularity conditions and physical sensor models. In this paper, we introduce Consensus Equilibrium (CE), which generalizes regularized inversion to include a much wider variety of both forward components and prior components without the need for either to be expressed with a cost function. CE is based on the solution of a set of equilibrium equations that balance data fit and regularity. In this framework, the problem of MAP estimation in regularized inversion is replaced by the problem of solving these equilibrium equations, which can be approached in multiple ways. The key contribution of CE is to provide a novel framework for fusing multiple heterogeneous models of physical sensors or models learned from data. We describe the derivation of the CE equations and prove that the solution of the CE equations generalizes the standard MAP estimate under appropriate circumstances. We also discuss algorithms for solving the CE equations, including ADMM with a novel form of preconditioning and Newton's method. We give examples to illustrate consensus equilibrium and the convergence properties of these algorithms and demonstrate this method on some toy problems and on a denoising example in which we use an array of convolutional neural network denoisers, none of which is tuned to match the noise level in a noisy image but which in consensus can achieve a better result than any of them individually.
CVApr 12, 2017
Optimal Threshold Design for Quanta Image SensorOmar A. Elgendy, Stanley H. Chan
Quanta Image Sensor (QIS) is a binary imaging device envisioned to be the next generation image sensor after CCD and CMOS. Equipped with a massive number of single photon detectors, the sensor has a threshold $q$ above which the number of arriving photons will trigger a binary response "1", or "0" otherwise. Existing methods in the device literature typically assume that $q=1$ uniformly. We argue that a spatially varying threshold can significantly improve the signal-to-noise ratio of the reconstructed image. In this paper, we present an optimal threshold design framework. We make two contributions. First, we derive a set of oracle results to theoretically inform the maximally achievable performance. We show that the oracle threshold should match exactly with the underlying pixel intensity. Second, we show that around the oracle threshold there exists a set of thresholds that give asymptotically unbiased reconstructions. The asymptotic unbiasedness has a phase transition behavior which allows us to develop a practical threshold update scheme using a bisection method. Experimentally, the new threshold design method achieves a better rate of convergence than existing methods.
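The binary measurement model described above can be sketched as follows. The sketch assumes Poisson photon arrivals and a response of "1" when the count is at least $q$; both are modeling assumptions made here for illustration, and all function names are hypothetical. For $q=1$ the probability of a "1" is $1 - e^{-\theta}$, which gives a closed-form maximum-likelihood estimate of the intensity $\theta$. The sketch does not implement the threshold update scheme of the paper.

```python
import math
import random


def sample_poisson(theta, rng):
    """Knuth's multiplication method for a Poisson(theta) draw."""
    limit = math.exp(-theta)
    k = 0
    p = 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def prob_one(theta, q):
    """P(bit = 1) = P(Poisson(theta) >= q), assuming a '>= q' trigger."""
    below = sum(math.exp(-theta) * theta ** k / math.factorial(k)
                for k in range(q))
    return 1.0 - below


def simulate_bits(theta, q, n, rng):
    """n independent one-bit measurements of a pixel with intensity theta."""
    return [1 if sample_poisson(theta, rng) >= q else 0 for _ in range(n)]


def estimate_theta_q1(bits):
    """Maximum-likelihood intensity estimate, valid only for q = 1."""
    frac = sum(bits) / len(bits)
    frac = min(frac, 1.0 - 1.0 / (2 * len(bits)))  # guard against log(0)
    return -math.log(1.0 - frac)


rng = random.Random(0)
bits = simulate_bits(0.5, 1, 20000, rng)
theta_hat = estimate_theta_q1(bits)

# For a bright pixel, q = 1 makes almost every bit a "1" (saturation),
# while a threshold near the intensity keeps the bits informative.
p_low_threshold = prob_one(5.0, 1)
p_matched_threshold = prob_one(5.0, 5)
```

Comparing `p_low_threshold` with `p_matched_threshold` illustrates why a fixed $q=1$ is a poor fit for bright pixels and why a spatially varying threshold can help.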
CVMay 5, 2016
Plug-and-Play ADMM for Image Restoration: Fixed Point Convergence and ApplicationsStanley H. Chan, Xiran Wang, Omar A. Elgendy
The alternating direction method of multipliers (ADMM) is a widely used algorithm for solving constrained optimization problems in image restoration. Among many useful features, one critical feature of the ADMM algorithm is its modular structure, which allows one to plug in any off-the-shelf image denoising algorithm for a subproblem in the ADMM algorithm. Because of the plug-in nature, this type of ADMM algorithm has been named "Plug-and-Play ADMM". Plug-and-Play ADMM has demonstrated promising empirical results in a number of recent papers. However, it is unclear under what conditions and with what denoising algorithms convergence is guaranteed. Also, since Plug-and-Play ADMM uses a specific way to split the variables, it is unclear whether fast implementations are possible for common Gaussian and Poissonian image restoration problems. In this paper, we propose a Plug-and-Play ADMM algorithm with provable fixed point convergence. We show that for any denoising algorithm satisfying an asymptotic criterion, called a bounded denoiser, Plug-and-Play ADMM converges to a fixed point under a continuation scheme. We also present fast implementations for two image restoration problems on super-resolution and single-photon imaging. We compare Plug-and-Play ADMM with state-of-the-art algorithms in each problem type, and demonstrate promising experimental results of the algorithm.
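The modular structure described above can be sketched on a 1D toy problem. The sketch makes several assumptions that are not taken from the abstract: the data term is a plain least-squares fit to the observation, the plugged-in denoiser is a hypothetical box-filter blend that approaches the identity as its noise level shrinks, and the continuation scheme is a geometric increase of the penalty parameter `rho` by a factor `gamma`.

```python
import math


def box3(z):
    """3-tap moving average with edge replication."""
    n = len(z)
    return [(z[max(i - 1, 0)] + z[i] + z[min(i + 1, n - 1)]) / 3.0
            for i in range(n)]


def toy_denoiser(z, sigma):
    """Hypothetical denoiser: blends z with its smoothed version.
    The blend weight vanishes as sigma -> 0, so the denoiser tends to identity."""
    w = sigma ** 2 / (1.0 + sigma ** 2)
    smooth = box3(z)
    return [(1.0 - w) * a + w * b for a, b in zip(z, smooth)]


def pnp_admm_denoise(y, lam=0.1, rho=1.0, gamma=1.5, iters=30):
    """Plug-and-Play ADMM sketch for the data term 0.5 * ||x - y||^2."""
    v = list(y)
    u = [0.0] * len(y)
    x = list(y)
    for _ in range(iters):
        # x-update: closed-form minimizer of the data term plus quadratic penalty.
        x = [(yi + rho * (vi - ui)) / (1.0 + rho)
             for yi, vi, ui in zip(y, v, u)]
        # v-update: the prior subproblem is replaced by a denoiser call.
        sigma = math.sqrt(lam / rho)
        v = toy_denoiser([xi + ui for xi, ui in zip(x, u)], sigma)
        # Scaled dual update.
        u = [ui + xi - vi for ui, xi, vi in zip(u, x, v)]
        # Continuation: increase the penalty so that x and v are forced to agree.
        rho *= gamma
    residual = max(abs(xi - vi) for xi, vi in zip(x, v))
    return v, residual


y = [0.0, 0.2, -0.1, 0.1, 1.1, 0.9, 1.2, 1.0]
restored, residual = pnp_admm_denoise(y)
```

Swapping `toy_denoiser` for another denoiser changes the implicit prior without touching the rest of the loop, which is the plug-in property the abstract refers to. The returned `residual` measures how far the two split variables are from agreeing after the last iteration.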
CVFeb 1, 2016
Algorithm-Induced Prior for Image RestorationStanley H. Chan
This paper studies a type of image prior that is constructed implicitly through the alternating direction method of multipliers (ADMM) algorithm, called the algorithm-induced prior. Different from classical image priors which are defined before running the reconstruction algorithm, algorithm-induced priors are defined by the denoising procedure used to replace one of the two modules in the ADMM algorithm. Since such a prior is not explicitly defined, analyzing the performance has been difficult in the past. Focusing on the class of symmetric smoothing filters, this paper presents an explicit expression of the prior induced by the ADMM algorithm. The new prior is reminiscent of the conventional graph Laplacian but with stronger reconstruction performance. It can also be shown that the overall reconstruction has an efficient closed-form implementation if the associated symmetric smoothing filter is low rank. The results are validated with experiments on image inpainting.
CVJan 19, 2016
Adaptive Image Denoising by Mixture AdaptationEnming Luo, Stanley H. Chan, Truong Q. Nguyen
We propose an adaptive learning procedure to learn patch-based image priors for image denoising. The new algorithm, called the Expectation-Maximization (EM) adaptation, takes a generic prior learned from a generic external database and adapts it to the noisy image to generate a specific prior. Different from existing methods that combine internal and external statistics in ad-hoc ways, the proposed algorithm is rigorously derived from a Bayesian hyper-prior perspective. There are two contributions of this paper: First, we provide full derivation of the EM adaptation algorithm and demonstrate methods to improve the computational complexity. Second, in the absence of the latent clean image, we show how EM adaptation can be modified based on pre-filtering. Experimental results show that the proposed adaptation algorithm yields consistently better denoising results than the one without adaptation and is superior to several state-of-the-art algorithms.
CVJan 1, 2016
Understanding Symmetric Smoothing Filters: A Gaussian Mixture Model PerspectiveStanley H. Chan, Todd Zickler, Yue M. Lu
Many patch-based image denoising algorithms can be formulated as applying a smoothing filter to the noisy image. Expressed as matrices, the smoothing filters must be row normalized so that each row sums to unity. Surprisingly, if we apply a column normalization before the row normalization, the performance of the smoothing filter can often be significantly improved. Prior works showed that such performance gain is related to the Sinkhorn-Knopp balancing algorithm, an iterative procedure that symmetrizes a row-stochastic matrix to a doubly-stochastic matrix. However, a complete understanding of the performance gain phenomenon is still lacking. In this paper, we study the performance gain phenomenon from a statistical learning perspective. We show that Sinkhorn-Knopp is equivalent to an Expectation-Maximization (EM) algorithm of learning a Gaussian mixture model of the image patches. By establishing the correspondence between the steps of Sinkhorn-Knopp and the EM algorithm, we provide a geometrical interpretation of the symmetrization process. This observation allows us to develop a new denoising algorithm called Gaussian mixture model symmetric smoothing filter (GSF). GSF is an extension of the Sinkhorn-Knopp and is a generalization of the original smoothing filters. Despite its simple formulation, GSF outperforms many existing smoothing filters and has a similar performance compared to several state-of-the-art denoising algorithms.
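The normalizations discussed above can be sketched on a small affinity matrix. The Gaussian affinity, the bandwidth `h`, and the toy signal are hypothetical choices for illustration; the sketch shows only the row normalization and the Sinkhorn-Knopp balancing, not the GSF algorithm or its EM interpretation.

```python
import math


def affinity(y, h):
    """Symmetric, strictly positive Gaussian affinity between samples of y."""
    return [[math.exp(-((a - b) ** 2) / (h * h)) for b in y] for a in y]


def normalize_rows(W):
    """Scale every row to sum to one (row-stochastic smoothing filter)."""
    out = []
    for row in W:
        s = sum(row)
        out.append([w / s for w in row])
    return out


def normalize_cols(W):
    """Scale every column to sum to one."""
    n_rows, n_cols = len(W), len(W[0])
    sums = [sum(W[i][j] for i in range(n_rows)) for j in range(n_cols)]
    return [[W[i][j] / sums[j] for j in range(n_cols)] for i in range(n_rows)]


def sinkhorn_knopp(W, iters=200):
    """Alternate column and row normalization toward a doubly stochastic matrix."""
    for _ in range(iters):
        W = normalize_rows(normalize_cols(W))
    return W


def apply_filter(W, y):
    """Smooth y by the matrix-vector product W y."""
    return [sum(w * v for w, v in zip(row, y)) for row in W]


y = [0.1, 0.2, 0.15, 0.8, 0.9, 0.85]
W = affinity(y, h=0.5)
row_filter = normalize_rows(W)                        # standard smoothing filter
one_pass = normalize_rows(normalize_cols(W))          # column step before row step
balanced = sinkhorn_knopp(W)                          # iterated to convergence
smoothed = apply_filter(balanced, y)
```

`one_pass` corresponds to the single column-then-row normalization mentioned in the abstract, and `balanced` is what the iteration converges to for a strictly positive matrix.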
CVJul 14, 2014
Depth Reconstruction from Sparse Samples: Representation, Algorithm, and SamplingLee-Kang Liu, Stanley H. Chan, Truong Q. Nguyen
The rapid development of 3D technology and computer vision applications has motivated a thrust of methodologies for depth acquisition and estimation. However, most existing hardware and software methods have limited performance due to poor depth precision, low resolution and high computational cost. In this paper, we present a computationally efficient method to recover dense depth maps from sparse measurements. We make three contributions. First, we provide empirical evidence that depth maps can be encoded much more sparsely than natural images by using common dictionaries such as wavelets and contourlets. We also show that a combined wavelet-contourlet dictionary achieves better performance than using either dictionary alone. Second, we propose an alternating direction method of multipliers (ADMM) to achieve fast reconstruction. A multi-scale warm start procedure is proposed to speed up the convergence. Third, we propose a two-stage randomized sampling scheme to optimally choose the sampling locations, thus maximizing the reconstruction performance for any given sampling budget. Experimental results show that the proposed method produces high quality dense depth estimates, and is robust to noisy measurements. Applications to real data in stereo matching are demonstrated.
CVJun 30, 2014
Adaptive Image Denoising by Targeted DatabasesEnming Luo, Stanley H. Chan, Truong Q. Nguyen
We propose a data-dependent denoising procedure to restore noisy images. Different from existing denoising algorithms which search for patches from either the noisy image or a generic database, the new algorithm finds patches from a database that contains only relevant patches. We formulate the denoising problem as an optimal filter design problem and make two contributions. First, we determine the basis function of the denoising filter by solving a group sparsity minimization problem. The optimization formulation generalizes existing denoising algorithms and offers systematic analysis of the performance. Improvement methods are proposed to enhance the patch search process. Second, we determine the spectral coefficients of the denoising filter by considering a localized Bayesian prior. The localized prior leverages the similarity of the targeted database, alleviates the intensive Bayesian computation, and links the new method to the classical linear minimum mean squared error estimation. We demonstrate applications of the proposed method in a variety of scenarios, including text images, multiview images and face images. Experimental results show the superiority of the new algorithm over existing methods.
CVDec 27, 2013
Monte Carlo non local means: Random sampling for large-scale image filteringStanley H. Chan, Todd Zickler, Yue M. Lu
We propose a randomized version of the non-local means (NLM) algorithm for large-scale image filtering. The new algorithm, called Monte Carlo non-local means (MCNLM), speeds up the classical NLM by computing a small subset of image patch distances, which are randomly selected according to a designed sampling pattern. We make two contributions. First, we analyze the performance of the MCNLM algorithm and show that, for large images or large external image databases, the random outcomes of MCNLM are tightly concentrated around the deterministic full NLM result. In particular, our error probability bounds show that, at any given sampling ratio, the probability for MCNLM to have a large deviation from the original NLM solution decays exponentially as the size of the image or database grows. Second, we derive explicit formulas for optimal sampling patterns that minimize the error probability bound by exploiting partial knowledge of the pairwise similarity weights. Numerical experiments show that MCNLM is competitive with other state-of-the-art fast NLM algorithms for single-image denoising. When applied to denoising images using an external database containing ten billion patches, MCNLM returns a randomized solution that is within 0.2 dB of the full NLM solution while reducing the runtime by three orders of magnitude.
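The random-subset idea can be sketched on a 1D signal. The sketch assumes the simplest sampling pattern, in which every candidate patch is kept independently with the same probability `ratio`; the optimized sampling patterns derived in the paper are not implemented. The patch radius, the bandwidth `h`, and the fallback for an empty sample are hypothetical choices.

```python
import math
import random


def patch(y, i, r):
    """Patch of radius r around index i, with edge replication."""
    n = len(y)
    return [y[min(max(i + d, 0), n - 1)] for d in range(-r, r + 1)]


def nlm_weight(y, i, j, r, h):
    """Gaussian similarity weight between the patches centered at i and j."""
    d2 = sum((a - b) ** 2 for a, b in zip(patch(y, i, r), patch(y, j, r)))
    return math.exp(-d2 / (h * h))


def full_nlm(y, r=1, h=0.5):
    """Classical non-local means: every patch distance is computed."""
    out = []
    for i in range(len(y)):
        num = den = 0.0
        for j in range(len(y)):
            w = nlm_weight(y, i, j, r, h)
            num += w * y[j]
            den += w
        out.append(num / den)
    return out


def mc_nlm(y, ratio, r=1, h=0.5, seed=0):
    """Monte Carlo variant: only a random subset of patch distances is computed."""
    rng = random.Random(seed)
    out = []
    for i in range(len(y)):
        num = den = 0.0
        for j in range(len(y)):
            if rng.random() < ratio:      # uniform Bernoulli sampling pattern
                w = nlm_weight(y, i, j, r, h)
                num += w * y[j]
                den += w
        # If no patch was sampled, fall back to the noisy value (a choice made here).
        out.append(num / den if den > 0.0 else y[i])
    return out


y = [0.0, 0.1, -0.1, 0.05, 1.0, 0.9, 1.1, 0.95, 0.0, 0.1]
reference = full_nlm(y)
all_sampled = mc_nlm(y, ratio=1.0)
half_sampled = mc_nlm(y, ratio=0.5)
```

With `ratio=1.0` every distance is computed and the result coincides with classical NLM. Lower ratios reduce the number of distance computations roughly in proportion and return a randomized approximation of the full result.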