IV · Apr 22 · Code
Maximum Likelihood Reconstruction for Multi-Look Digital Holography with Markov-Modeled Speckle Correlation
Xi Chen, Arian Maleki, Shirin Jalali
Multi-look acquisition is a widely used strategy for reducing speckle noise in coherent imaging systems such as digital holography. By acquiring multiple measurements, speckle can be suppressed through averaging or joint reconstruction, typically under the assumption that speckle realizations across looks are statistically independent. In practice, however, hardware constraints limit measurement diversity, leading to inter-look correlation that degrades the performance of conventional methods. In this work, we study the reconstruction of speckle-free reflectivity from complex-valued multi-look measurements in the presence of correlated speckle. We model the inter-look dependence using a first-order Markov process and derive the corresponding likelihood under a first-order Markov approximation, resulting in a constrained maximum likelihood estimation problem. To solve this problem, we develop an efficient projected gradient descent framework that combines gradient-based updates with implicit regularization via deep image priors, and leverages Monte Carlo approximation and matrix-free operators for scalable computation. Simulation results demonstrate that the proposed approach remains robust under strong inter-look correlation, achieving performance close to the ideal independent-look scenario and consistently outperforming methods that ignore such dependencies. These results highlight the importance of explicitly modeling inter-look correlation and provide a practical framework for multi-look holographic reconstruction under realistic acquisition conditions. Our code is available at: https://github.com/Computational-Imaging-RU/MLE-Holography-Markov.
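To make the inter-look dependence concrete, here is a minimal NumPy sketch of how Markov-correlated looks might be simulated. It assumes a noise-free forward model y_l = A diag(x) w_l and a complex AR(1) recursion with correlation rho as the first-order Markov speckle model. Both are illustrative assumptions, and the function names are hypothetical rather than taken from the linked repository.

```python
import numpy as np


def complex_normal(rng, shape):
    """I.i.d. circular complex Gaussian samples with unit variance, CN(0, 1)."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)


def markov_speckle(n, num_looks, rho, rng):
    """Speckle vectors w_1, ..., w_L from a complex AR(1) (first-order Markov) recursion.

    w_1 ~ CN(0, I) and w_l = rho * w_{l-1} + sqrt(1 - rho^2) * e_l, so every
    look is marginally CN(0, I) and E[w_l w_{l-1}^H] = rho * I.
    """
    w = np.empty((num_looks, n), dtype=complex)
    w[0] = complex_normal(rng, n)
    for l in range(1, num_looks):
        w[l] = rho * w[l - 1] + np.sqrt(1 - rho**2) * complex_normal(rng, n)
    return w


def simulate_looks(A, x, num_looks, rho, rng):
    """Noise-free looks y_l = A diag(x) w_l, returned as rows of an (L, m) array."""
    w = markov_speckle(x.size, num_looks, rho, rng)
    return (A @ (x[None, :] * w).T).T
```

With rho = 0 the looks reduce to the independent-speckle case that conventional multi-look methods assume; as rho approaches 1 the looks become nearly identical, and averaging them suppresses little speckle.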
CE · Apr 18
Deep Learning-Based Snow Depth Retrieval Using Sentinel-1 Repeat-Pass InSAR
Nayan Yadav, Shadi Oveisgharan, Shirin Jalali
Snow depth plays a central role in seasonal snowpack characterization and the terrestrial water cycle, yet remains challenging to estimate at high spatial resolution. Recent studies have shown that repeat-pass interferometric synthetic aperture radar (InSAR) measurements combined with physics-based models can enable effective snow water equivalent (SWE) retrieval. However, the performance of these methods depends strongly on measurement accuracy and modeling assumptions. Building on the success of InSAR-based approaches, we develop a robust learning-based model that directly learns the relationship between measured InSAR observables and snow depth. The model is trained on a single SnowEx Idaho site and evaluated across independent years and geographically distinct regions. Results demonstrate strong temporal and spatial transferability. In temporal transfer experiments, the proposed approach achieves a Pearson correlation of 0.81 with lidar snow depth, compared to a correlation of approximately 0.47 reported for physics-based Sentinel-1 SWE retrievals over the same site.
CV · Mar 18
Shot-Aware Frame Sampling for Video Understanding
Mengyu Zhao, Di Fu, Yongyu Xie et al.
Video frame sampling is essential for efficient long-video understanding with Vision-Language Models (VLMs), since dense inputs are costly and often exceed context limits. Yet when only a small number of frames can be retained, existing samplers often fail to balance broad video coverage with brief but critical events, which can lead to unreliable downstream predictions. To address this issue, we present InfoShot, a task-agnostic, shot-aware frame sampler for long-video understanding. InfoShot first partitions a video into semantically consistent shots, and then selects two complementary keyframes from each shot: one to represent the main content and one to capture unusual within-shot changes. This design is guided by an information-theoretic objective that encourages the sampled set to retain high information about both shot structure and sparse within-shot deviations. In this way, it improves the chance of preserving both overall video context and short decision-critical moments without requiring any retraining. To better evaluate such short-lived events, we further introduce SynFlash, a synthetic benchmark with controllable sub-second anomaly patterns and frame-level ground truth, and we also evaluate InfoShot on existing anomaly datasets and general video understanding tasks. Experiments show that InfoShot improves anomaly hit rate and downstream Video-QA accuracy under constraints on the number of frames, while matching or outperforming strong baselines on standard video understanding benchmarks.
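The two-keyframes-per-shot idea can be illustrated with a simple heuristic sketch over per-frame feature vectors (for example, image embeddings). This is not InfoShot's information-theoretic objective: shot boundaries here come from a hypothetical cosine-distance threshold, the representative frame is the one closest to the shot mean, and the deviation frame is the one farthest from it. All names are hypothetical.

```python
import numpy as np


def split_into_shots(features, threshold):
    """Start a new shot wherever consecutive frame features differ by more than
    `threshold` in cosine distance. Returns a list of (start, end) index ranges."""
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    cos_dist = 1.0 - np.sum(f[1:] * f[:-1], axis=1)
    cuts = np.flatnonzero(cos_dist > threshold) + 1
    bounds = np.concatenate(([0], cuts, [len(features)]))
    return [(int(s), int(e)) for s, e in zip(bounds[:-1], bounds[1:])]


def sample_two_per_shot(features, threshold=0.3):
    """Per shot, keep a representative frame (closest to the shot mean) and a
    deviation frame (farthest from the shot mean); duplicates are merged."""
    selected = []
    for start, end in split_into_shots(features, threshold):
        shot = features[start:end]
        dist = np.linalg.norm(shot - shot.mean(axis=0), axis=1)
        rep = start + int(np.argmin(dist))
        dev = start + int(np.argmax(dist))
        selected.extend(sorted({rep, dev}))
    return selected
```

When a shot contains no within-shot change, both choices coincide and only one frame is kept, so the frame budget goes to shots where deviations actually occur.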
ML · May 29, 2025 · Code
Multilook Coherent Imaging: Theoretical Guarantees and Algorithms
Xi Chen, Soham Jana, Christopher A. Metzler et al.
Multilook coherent imaging is a widely used technique in applications such as digital holography, ultrasound imaging, and synthetic aperture radar. A central challenge in these systems is the presence of multiplicative noise, commonly known as speckle, which degrades image quality. Despite the widespread use of coherent imaging systems, their theoretical foundations remain relatively underexplored. In this paper, we study both the theoretical and algorithmic aspects of likelihood-based approaches for multilook coherent imaging, providing a rigorous framework for analysis and method development. Our theoretical contributions include establishing the first theoretical upper bound on the Mean Squared Error (MSE) of the maximum likelihood estimator under the deep image prior hypothesis. Our results capture the dependence of MSE on the number of parameters in the deep image prior, the number of looks, the signal dimension, and the number of measurements per look. On the algorithmic side, we employ projected gradient descent (PGD) as an efficient method for computing the maximum likelihood solution. Furthermore, we introduce two key ideas to enhance the practical performance of PGD. First, we incorporate the Newton-Schulz algorithm to compute matrix inverses within the PGD iterations, significantly reducing computational complexity. Second, we develop a bagging strategy to mitigate projection errors introduced during PGD updates. We demonstrate that combining these techniques with PGD yields state-of-the-art performance. Our code is available at https://github.com/Computational-Imaging-RU/Bagged-DIP-Speckle.
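As a reference point for the likelihood-based approach, the following sketch evaluates the negative log-likelihood under an assumed multilook model y_l = A diag(x) w_l + z_l, with speckle w_l ~ CN(0, I) and noise z_l ~ CN(0, sigma2 I), so that each look is distributed as CN(0, B(x)) with B(x) = A diag(x^2) A^H + sigma2 I. This standard formulation is assumed here for illustration (see the paper for the exact model), and the function names are hypothetical.

```python
import numpy as np


def covariance(A, x, sigma2):
    """B(x) = A diag(x^2) A^H + sigma2 * I, the covariance of every look."""
    return (A * x**2) @ A.conj().T + sigma2 * np.eye(A.shape[0])


def neg_log_likelihood(A, x, Y, sigma2):
    """NLL of looks Y (shape L x m), up to additive constants:
    L * log det B(x) + sum_l y_l^H B(x)^{-1} y_l."""
    B = covariance(A, x, sigma2)
    _, logdet = np.linalg.slogdet(B)                  # B is Hermitian PD
    quad = np.real(np.sum(Y.conj().T * np.linalg.solve(B, Y.T)))
    return Y.shape[0] * logdet + quad
```

The dense solve and log-determinant cost O(m^3) per evaluation; this is the kind of matrix-inversion bottleneck that the Newton-Schulz iterations mentioned above are meant to reduce.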
CV · Feb 10 · Code
Monte Carlo Maximum Likelihood Reconstruction for Digital Holography with Speckle
Xi Chen, Arian Maleki, Shirin Jalali
In coherent imaging, speckle is statistically modeled as multiplicative noise, posing a fundamental challenge for image reconstruction. While maximum likelihood estimation (MLE) provides a principled framework for speckle mitigation, its application to coherent imaging systems such as digital holography with finite apertures is hindered by the prohibitive cost of high-dimensional matrix inversion, especially at high resolutions. This computational burden has prevented the use of MLE-based reconstruction with physically accurate aperture modeling. In this work, we propose a randomized linear algebra approach that enables scalable MLE optimization without explicit matrix inversions in gradient computation. By exploiting the structural properties of the sensing matrix and using conjugate gradient for likelihood gradient evaluation, the proposed algorithm supports accurate aperture modeling without the simplifying assumptions commonly imposed for tractability. We term the resulting method projected gradient descent with Monte Carlo estimation (PGD-MC). The proposed PGD-MC framework (i) demonstrates robustness to diverse and physically accurate aperture models, (ii) achieves substantial improvements in reconstruction quality and computational efficiency, and (iii) scales effectively to high-resolution digital holography. Extensive experiments incorporating three representative denoisers as regularizers show that PGD-MC provides a flexible and effective MLE-based reconstruction framework for digital holography with finite apertures, consistently outperforming prior Plug-and-Play model-based iterative reconstruction methods in both accuracy and speed. Our code is available at: https://github.com/Computational-Imaging-RU/MC_Maximum_Likelihood_Digital_Holography_Speckle.
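The idea of evaluating likelihood-gradient terms without explicit inversion can be illustrated with a generic Monte Carlo (Hutchinson) trace estimator whose linear solves are done by conjugate gradient. Quantities of the form tr(B^{-1} C) arise, for example, in the derivative of log det B(x). This sketches that randomized-linear-algebra technique under the assumption that B is Hermitian positive definite and available only through matrix-vector products; it is not the PGD-MC implementation, and the function names are hypothetical.

```python
import numpy as np


def conjugate_gradient(matvec, b, tol=1e-10, max_iter=500):
    """Solve B u = b for Hermitian positive-definite B, using only u -> B u."""
    u = np.zeros_like(b)
    r = b - matvec(u)
    p = r.copy()
    rs_old = np.vdot(r, r).real
    if rs_old == 0:
        return u
    for _ in range(max_iter):
        Bp = matvec(p)
        alpha = rs_old / np.vdot(p, Bp).real
        u = u + alpha * p
        r = r - alpha * Bp
        rs_new = np.vdot(r, r).real
        if np.sqrt(rs_new) < tol * np.linalg.norm(b):
            break
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return u


def hutchinson_trace_inv(matvec_B, matvec_C, n, num_probes, rng):
    """Monte Carlo estimate of tr(B^{-1} C) with Rademacher probes z, using
    E[z^T B^{-1} C z] = tr(B^{-1} C); B^{-1} (C z) is obtained by CG."""
    total = 0.0
    for _ in range(num_probes):
        z = rng.choice([-1.0, 1.0], size=n)
        total += np.vdot(z, conjugate_gradient(matvec_B, matvec_C(z)))
    return total / num_probes
```

Each probe costs one CG solve, that is, a handful of matrix-vector products, so structured sensing operators never need to be formed or inverted explicitly; the Monte Carlo error shrinks as 1/sqrt(num_probes).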
IV · Jun 15, 2025 · Code
Zero-shot denoising via neural compression: Theoretical and algorithmic framework
Ali Zafari, Xi Chen, Shirin Jalali
Zero-shot denoising aims to denoise observations without access to training samples or clean reference images. This setting is particularly relevant in practical imaging scenarios involving specialized domains such as medical imaging or biology. In this work, we propose the Zero-Shot Neural Compression Denoiser (ZS-NCD), a novel denoising framework based on neural compression. ZS-NCD treats a neural compression network as an untrained model, optimized directly on patches extracted from a single noisy image. The final reconstruction is then obtained by aggregating the outputs of the trained model over overlapping patches. Thanks to the built-in entropy constraints of compression architectures, our method naturally avoids overfitting and does not require manual regularization or early stopping. Through extensive experiments, we show that ZS-NCD achieves state-of-the-art performance among zero-shot denoisers for both Gaussian and Poisson noise, and generalizes well to both natural and non-natural images. Additionally, we provide new finite-sample theoretical results that characterize upper bounds on the achievable reconstruction error of general maximum-likelihood compression-based denoisers. These results further establish the theoretical foundations of compression-based denoising. Our code is available at: github.com/Computational-Imaging-RU/ZS-NCDenoiser.
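The final aggregation step (averaging model outputs over overlapping patches) can be sketched in NumPy as below. It assumes a single-channel image, uniform averaging weights, and a stride no larger than the patch size; the patch size, stride, and function names are hypothetical, and ZS-NCD's actual patching scheme may differ.

```python
import numpy as np


def _starts(length, size, stride):
    """Patch start positions along one axis, always including the last border."""
    starts = list(range(0, length - size + 1, stride))
    if starts[-1] != length - size:
        starts.append(length - size)
    return starts


def extract_patches(image, size, stride):
    """Overlapping size x size patches of a 2-D image and their top-left corners."""
    assert stride <= size, "a stride larger than the patch size leaves gaps"
    H, W = image.shape
    corners = [(i, j) for i in _starts(H, size, stride) for j in _starts(W, size, stride)]
    patches = np.stack([image[i:i + size, j:j + size] for i, j in corners])
    return patches, corners


def aggregate_patches(patches, corners, shape):
    """Average (possibly processed) patches back into an image of the given shape."""
    acc = np.zeros(shape)
    count = np.zeros(shape)
    size = patches.shape[1]
    for patch, (i, j) in zip(patches, corners):
        acc[i:i + size, j:j + size] += patch
        count[i:i + size, j:j + size] += 1
    return acc / count
```

Passing the patches through the trained model before calling aggregate_patches gives the denoised estimate; with the identity map in place of the model, the original image is recovered exactly, which is a convenient sanity check.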
IT · Feb 23, 2024
Bagged Deep Image Prior for Recovering Images in the Presence of Speckle Noise
Xi Chen, Zhewen Hou, Christopher A. Metzler et al.
We investigate both the theoretical and algorithmic aspects of likelihood-based methods for recovering a complex-valued signal from multiple sets of measurements, referred to as looks, affected by speckle (multiplicative) noise. Our theoretical contributions include establishing the first theoretical upper bound on the Mean Squared Error (MSE) of the maximum likelihood estimator under the deep image prior hypothesis. Our theoretical results capture the dependence of the MSE on the number of parameters in the deep image prior, the number of looks, the signal dimension, and the number of measurements per look. On the algorithmic side, we introduce the concept of bagged Deep Image Priors (Bagged-DIP) and integrate them with projected gradient descent (PGD). Furthermore, we show how employing the Newton-Schulz algorithm to calculate matrix inverses within the PGD iterations reduces the computational complexity of the algorithm. We show that this method achieves state-of-the-art performance.
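For reference, a minimal NumPy sketch of the Newton-Schulz iteration, which approximates a matrix inverse using matrix products only. The warm-start argument X0 illustrates why the method fits inside PGD: if x changes little between iterations, the inverse from the previous iteration is already a good starting point. That motivation is our reading of the abstract; the paper's exact scheme may differ, and the function name is hypothetical.

```python
import numpy as np


def newton_schulz_inverse(B, num_iter=30, X0=None):
    """Approximate B^{-1} with the Newton-Schulz iteration X <- X (2I - B X).

    The iteration converges quadratically once ||I - B X|| < 1. Without a warm
    start, X0 = B^H / (||B||_1 ||B||_inf) is a classical choice that guarantees
    convergence for any nonsingular B (possibly after many slow early steps).
    """
    I = np.eye(B.shape[0], dtype=B.dtype)
    if X0 is None:
        X0 = B.conj().T / (np.linalg.norm(B, 1) * np.linalg.norm(B, np.inf))
    X = X0
    for _ in range(num_iter):
        X = X @ (2 * I - B @ X)   # matrix products only, no explicit inversion
    return X
```

A warm start from the inverse of a nearby matrix converges in a handful of iterations, while a cold start needs more.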
IT · Jan 21, 2025
Bayesian Despeckling of Structured Sources
Ali Zafari, Shirin Jalali
Speckle noise is a fundamental challenge in coherent imaging systems, significantly degrading image quality. Over the past decades, numerous despeckling algorithms have been developed for applications such as Synthetic Aperture Radar (SAR) and digital holography. In this paper, we aim to establish a theoretically grounded approach to despeckling. We propose a method applicable to general structured stationary stochastic sources and demonstrate its effectiveness on piecewise constant sources. Additionally, we theoretically derive a lower bound on the despeckling performance for such sources. Applied to 1-Markov structured sources, the proposed despeckler achieves better reconstruction performance without strong simplifying assumptions on the ground-truth signal model or the speckle noise.
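For readers unfamiliar with the source class, here is a minimal simulation sketch of a piecewise constant first-order Markov (1-Markov) source and its speckled observation. The jump probability, the uniform level distribution, and the exponential multiplicative speckle on intensities are illustrative assumptions, not the paper's exact model; the function names are hypothetical.

```python
import numpy as np


def piecewise_constant_markov(n, p_jump, low, high, rng):
    """1-Markov source: with probability p_jump the value jumps to a fresh
    Uniform(low, high) draw; otherwise it repeats the previous value."""
    x = np.empty(n)
    x[0] = rng.uniform(low, high)
    for i in range(1, n):
        x[i] = rng.uniform(low, high) if rng.random() < p_jump else x[i - 1]
    return x


def add_speckle(x, rng):
    """Multiplicative speckle on intensities: y_i = x_i * w_i with w_i ~ Exp(1)."""
    return x * rng.exponential(1.0, size=x.shape)
```

Because the source repeats its value for runs of about 1/p_jump samples on average, a despeckler that exploits this structure can pool many noisy samples per level, which is the kind of structure a Bayesian method for such sources can use.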
IV · Mar 27, 2025
DeCompress: Denoising via Neural Compression
Ali Zafari, Xi Chen, Shirin Jalali
Learning-based denoising algorithms achieve state-of-the-art performance across various denoising tasks. However, training such models relies on access to large training datasets consisting of clean and noisy image pairs. In many imaging applications, such as microscopy, collecting ground truth images is often infeasible. To address this challenge, researchers have recently developed algorithms that can be trained without access to ground truth data. However, training such models remains computationally challenging and still requires a large number of noisy training samples. In this work, inspired by compression-based denoising and recent advances in neural compression, we propose a new compression-based denoising algorithm, which we name DeCompress, that i) does not require access to ground truth images, ii) does not require a large training dataset (a single noisy image is sufficient), iii) is robust to overfitting, and iv) achieves superior performance compared with zero-shot or unsupervised learning-based denoisers.
IV · Jan 8, 2025
Comprehensive Examination of Unrolled Networks for Solving Linear Inverse Problems
Eric Chen, Xi Chen, Arian Maleki et al.
Unrolled networks have become prevalent in various computer vision and imaging tasks. Although they have demonstrated remarkable efficacy in solving specific computer vision and computational imaging tasks, their adaptation to other applications presents considerable challenges. This is primarily due to the multitude of design decisions that practitioners working on new applications must navigate, each potentially affecting the network's overall performance. These decisions include selecting the optimization algorithm, defining the loss function, and determining the number of convolutional layers, among others. Compounding the issue, evaluating each design choice requires time-consuming simulations to train the network, fine-tune it, and optimize its performance. As a result, exploring multiple options and identifying the optimal configuration is slow and computationally demanding. The main objectives of this paper are (1) to unify some ideas and methodologies used in unrolled networks to reduce the number of design choices a user has to make, and (2) to report a comprehensive ablation study to discuss the impact of each of the choices involved in designing unrolled networks and present practical recommendations based on our findings. We anticipate that this study will help scientists and engineers design unrolled networks for their applications and diagnose problems within their networks efficiently.
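As a concrete instance of the design space described above, here is a minimal sketch of the forward pass of an unrolled ISTA network for y = A x + noise, one common choice of underlying optimization algorithm (the paper considers a broader set of designs). Per-layer step sizes and thresholds are the kind of parameters an unrolled network would learn; here they are simply passed in, and all names are hypothetical.

```python
import numpy as np


def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)


def unrolled_ista(y, A, step_sizes, thresholds):
    """K-layer unrolled ISTA: layer k computes
    x <- soft_threshold(x - eta_k * A^T (A x - y), tau_k)."""
    x = np.zeros(A.shape[1])
    for eta, tau in zip(step_sizes, thresholds):
        x = soft_threshold(x - eta * A.T @ (A @ x - y), tau)
    return x
```

Each (step size, threshold) pair is one layer; choosing the base algorithm, which quantities are learned per layer, and how many layers to unroll are examples of the design decisions listed above.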
CV · Jun 6, 2024
Untrained Neural Nets for Snapshot Compressive Imaging: Theory and Algorithms
Mengyu Zhao, Xi Chen, Xin Yuan et al.
Snapshot compressive imaging (SCI) recovers high-dimensional (3D) data cubes from a single 2D measurement, enabling diverse applications like video and hyperspectral imaging to go beyond standard techniques in terms of acquisition speed and efficiency. In this paper, we focus on SCI recovery algorithms that employ untrained neural networks (UNNs), such as deep image prior (DIP), to model source structure. Such UNN-based methods are appealing as they have the potential of avoiding the computationally intensive retraining required for different source models and different measurement scenarios. We first develop a theoretical framework for characterizing the performance of such UNN-based methods. The theoretical framework, on the one hand, enables us to optimize the parameters of data-modulating masks, and on the other hand, provides a fundamental connection between the number of data frames that can be recovered from a single measurement and the parameters of the untrained NN. We also employ the recently proposed bagged-deep-image-prior (bagged-DIP) idea to develop SCI Bagged Deep Video Prior (SCI-BDVP) algorithms that address the common challenges faced by standard UNN solutions. Our experimental results show that in video SCI our proposed solution achieves state-of-the-art performance among UNN methods, and in the case of noisy measurements, it even outperforms supervised solutions.
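The single-snapshot measurement described above is commonly modeled for video SCI as y = sum_t M_t ⊙ X_t + z, where the M_t are the data-modulating masks. The sketch below implements this forward operator and its adjoint (noise omitted), which iterative UNN-based recovery methods typically need; the binary masks and function names are illustrative assumptions.

```python
import numpy as np


def sci_forward(frames, masks):
    """Video SCI measurement y = sum_t M_t * X_t (elementwise), collapsing a
    (T, H, W) data cube into a single (H, W) snapshot."""
    return np.sum(masks * frames, axis=0)


def sci_adjoint(y, masks):
    """Adjoint operator: back-project the (H, W) snapshot through each mask."""
    return masks * y[None, :, :]
```

The number of frames T collapsed into one snapshot is the compression ratio; the theory summarized above connects the number of frames that can be recovered from one snapshot to the parameters of the untrained network.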
LG · Feb 15, 2019
Efficient Deep Learning of GMMs
Shirin Jalali, Carl Nuzman, Iraj Saniee
We show that a collection of Gaussian mixture models (GMMs) in $\mathbb{R}^{n}$ can be optimally classified using $O(n)$ neurons in a neural network with two hidden layers (deep neural network), whereas a neural network with a single hidden layer (shallow neural network) would require at least $\Omega(\exp(n))$ neurons or possibly exponentially large coefficients. Given the universality of the Gaussian distribution in the feature spaces of data, e.g., in speech, image and text, our result sheds light on the observed efficiency of deep neural networks in practical classification problems.
LG · Dec 19, 2017
Linear Time Clustering for High Dimensional Mixtures of Gaussian Clouds
Dan Kushnir, Shirin Jalali, Iraj Saniee
Clustering mixtures of Gaussian distributions is a fundamental and challenging problem that is ubiquitous in various high-dimensional data processing tasks. While state-of-the-art work on learning Gaussian mixture models has focused primarily on improving separation bounds and their generalization to arbitrary classes of mixture models, less attention has been paid to the practical computational efficiency of the proposed solutions. In this paper, we propose a novel and highly efficient clustering algorithm for $n$ points drawn from a mixture of two arbitrary Gaussian distributions in $\mathbb{R}^p$. The algorithm performs random 1-dimensional projections until it finds a direction that yields a user-specified clustering error $e$. For a 1-dimensional separation parameter $\gamma = Q^{-1}(e)$, the expected number of such projections is shown to be bounded by $o(\ln p)$ when $\gamma \leq c\sqrt{\ln\ln p}$, where $c$ is the separability parameter of the two Gaussians in $\mathbb{R}^p$. Consequently, the expected overall running time of the algorithm is linear in $n$ and quasi-linear in $p$, namely $o(\ln p)\,O(np)$, and the sample complexity is independent of $p$. This result stands in contrast to prior works, whose running times are polynomial (at best quadratic) in $p$ and $n$. We show that our bound on the expected number of 1-dimensional projections extends to the case of three or more Gaussian components, and we present a generalization of our results to mixture distributions beyond the Gaussian model.
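The projection loop can be sketched as follows. This is a hypothetical illustration under our own simplifications, not the paper's algorithm: the stopping rule estimates a 1-D separation gamma_hat from an optimal two-means split of the projected points and stops once gamma_hat >= Q^{-1}(e), which for two equal-variance 1-D Gaussians split at the midpoint corresponds to a misclassification rate of e. The function names are hypothetical.

```python
from statistics import NormalDist

import numpy as np


def best_1d_split(z):
    """Optimal 2-cluster split of 1-D values (minimum within-cluster SSE).

    After sorting, every candidate split is a prefix/suffix pair, and prefix
    sums give each split's SSE in O(1).
    """
    zs = np.sort(z)
    n = zs.size
    csum, csq = np.cumsum(zs), np.cumsum(zs ** 2)
    k = np.arange(1, n)                          # size of the left cluster
    left_sse = csq[k - 1] - csum[k - 1] ** 2 / k
    right_sum = csum[-1] - csum[k - 1]
    right_sse = (csq[-1] - csq[k - 1]) - right_sum ** 2 / (n - k)
    cut = int(np.argmin(left_sse + right_sse)) + 1
    return zs[:cut], zs[cut:]


def random_projection_cluster(X, target_error, rng, max_projections=1000):
    """Cluster the rows of X (n x p) into two groups via random 1-D projections.

    Hypothetical stopping rule: gamma_hat = |mean_R - mean_L| / (std_L + std_R)
    must reach Q^{-1}(target_error), with Q the standard Gaussian tail function.
    Returns (labels, direction, number_of_projections), or None on failure.
    """
    gamma_target = NormalDist().inv_cdf(1.0 - target_error)
    for t in range(1, max_projections + 1):
        u = rng.standard_normal(X.shape[1])
        u /= np.linalg.norm(u)
        z = X @ u                                # O(n p) per projection
        left, right = best_1d_split(z)
        gamma_hat = abs(right.mean() - left.mean()) / (left.std() + right.std())
        if gamma_hat >= gamma_target:
            threshold = 0.5 * (left.max() + right.min())
            return (z > threshold).astype(int), u, t
    return None
```

Each projection costs O(np) for the product X u; the sort used here adds O(n log n), so a strictly linear-time implementation would need a different 1-D splitting step.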