CV · Jun 27, 2022
Deep Optical Coding Design in Computational Imaging
Henry Arguello, Jorge Bacca, Hasindu Kariyawasam et al. · cmu
Computational optical imaging (COI) systems leverage optical coding elements (CE) in their setups to encode a high-dimensional scene in a single or multiple snapshots and decode it by using computational algorithms. The performance of COI systems depends strongly on the design of their main components: the CE pattern and the computational method used to perform a given task. Conventional approaches rely on random patterns or analytical designs to set the distribution of the CE. However, the available data and algorithm capabilities of deep neural networks (DNNs) have opened a new horizon in CE data-driven designs that jointly consider the optical encoder and computational decoder. Specifically, by modeling the COI measurements through a fully differentiable image formation model that considers the physics-based propagation of light and its interaction with the CEs, the parameters that define the CE and the computational decoder can be optimized in an end-to-end (E2E) manner. Moreover, by optimizing just the CEs in the same framework, inference tasks can be performed purely in optics. This work surveys the recent advances in CE data-driven design and provides guidelines on how to parametrize different optical elements to include them in the E2E framework. Since the E2E framework can handle different inference applications by changing the loss function and the DNN, we present low-level tasks such as spectral imaging reconstruction and high-level tasks such as privacy-preserving pose estimation, enhanced by using optimal task-based optical architectures. Finally, we illustrate classification and 3D object recognition applications performed at the speed of light using all-optics DNNs.
CV · Nov 5, 2022
Mixture-Net: Low-Rank Deep Image Prior Inspired by Mixture Models for Spectral Image Recovery
Tatiana Gelvez-Barrera, Jorge Bacca, Henry Arguello
This paper proposes a non-data-driven deep neural network for spectral image recovery problems such as denoising, single hyperspectral image super-resolution, and compressive spectral imaging reconstruction. Unlike previous methods, the proposed approach, dubbed Mixture-Net, implicitly learns the prior information through the network. Mixture-Net consists of a deep generative model whose layers are inspired by the linear and non-linear low-rank mixture models, where the recovered image is a weighted sum of the linear and non-linear decompositions. Mixture-Net also provides a low-rank decomposition interpreted as the spectral image abundances and endmembers, helpful in achieving remote sensing tasks without running additional routines. Experiments show that Mixture-Net outperforms state-of-the-art methods in recovery quality, with the added advantage of architecture interpretability.
IV · May 16, 2022
JR2net: A Joint Non-Linear Representation and Recovery Network for Compressive Spectral Imaging
Brayan Monroy, Jorge Bacca, Henry Arguello
Deep learning models are state-of-the-art in compressive spectral imaging (CSI) recovery. These methods use a deep neural network (DNN) as an image generator to learn a non-linear mapping from compressed measurements to the spectral image. For instance, the deep spectral prior approach uses a convolutional autoencoder network (CAE) in the optimization algorithm to recover the spectral image by using a non-linear representation. However, the CAE training is detached from the recovery problem, which does not guarantee an optimal representation of the spectral images for the CSI problem. This work proposes a joint non-linear representation and recovery network (JR2net), linking the representation and recovery tasks into a single optimization problem. JR2net consists of an optimization-inspired network following an ADMM formulation that learns a non-linear low-dimensional representation and simultaneously performs the spectral image recovery, trained end-to-end. Experimental results show the superiority of the proposed method, with improvements of up to 2.57 dB in PSNR and running around 2000 times faster than state-of-the-art methods.
LG · May 27, 2022
Deep Coding Patterns Design for Compressive Near-Infrared Spectral Classification
Jorge Bacca, Alejandra Hernandez-Rojas, Henry Arguello
Compressive spectral imaging (CSI) has emerged as an attractive compression and sensing technique, primarily for sensing spectral regions where traditional systems are highly costly, such as the near-infrared spectrum. Recently, it has been shown that spectral classification can be performed directly in the compressive domain, considering the amount of spectral information embedded in the measurements, skipping the reconstruction step. Consequently, the classification quality directly depends on the set of coding patterns employed in the sensing step. Therefore, this work proposes an end-to-end approach to jointly design the coding patterns used in CSI and the network parameters to perform spectral classification directly from the embedded near-infrared compressive measurements. Extensive simulations on the three-dimensional coded aperture snapshot spectral imaging (3D-CASSI) system validate that the proposed design outperforms traditional and random designs by up to 10% in classification accuracy.
IV · May 24, 2022
D$^\text{2}$UF: Deep Coded Aperture Design and Unrolling Algorithm for Compressive Spectral Image Fusion
Roman Jacome, Jorge Bacca, Henry Arguello
Compressive spectral imaging (CSI) has attracted significant attention since it employs synthetic apertures to codify spatial and spectral information, sensing only 2D projections of the 3D spectral image. However, these optical architectures suffer from a trade-off between the spatial and spectral resolution of the reconstructed image due to technology limitations. To overcome this issue, compressive spectral image fusion (CSIF) employs the projected measurements of two CSI architectures with different resolutions to estimate a high-spatial high-spectral resolution image. This work presents the fusion of the compressive measurements of a low-spatial high-spectral resolution coded aperture snapshot spectral imager (CASSI) architecture and a high-spatial low-spectral resolution multispectral color filter array (MCFA) system. Unlike previous CSIF works, this paper proposes joint optimization of the sensing architectures and a reconstruction network in an end-to-end (E2E) manner. The trainable optical parameters are the coded aperture (CA) in the CASSI and the colored coded aperture in the MCFA system, employing a sigmoid activation function and a regularization function to encourage binary values on the trainable variables for implementation purposes. Additionally, an unrolling-based network inspired by the alternating direction method of multipliers (ADMM) optimization is formulated to address the reconstruction step and the acquisition systems design jointly. Finally, a spatial-spectral inspired loss function is employed at the end of each unrolling layer to improve the convergence of the unrolling network. The proposed method outperforms previous CSIF methods, and experimental results validate the method with real measurements.
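As a rough illustration of the binarization idea mentioned in this abstract, the sketch below passes unconstrained trainable weights through a sigmoid and penalizes aperture values that sit away from 0 and 1. The penalty c²(1 − c)² and all function names are assumptions chosen for illustration; the abstract does not state which regularizer the authors use.

```python
import math

def sigmoid(w):
    """Map an unconstrained weight to the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-w))

def coded_aperture(weights):
    """Relaxed coded aperture: element-wise sigmoid of the trainable weights."""
    return [[sigmoid(w) for w in row] for row in weights]

def binary_regularizer(aperture):
    """Hypothetical penalty: zero only when every entry is exactly 0 or 1."""
    return sum((c ** 2) * ((1.0 - c) ** 2) for row in aperture for c in row)

# Weights near zero give a gray (non-binary) aperture and a large penalty;
# weights with large magnitude give a nearly binary aperture and a tiny penalty.
gray = coded_aperture([[0.0, 0.0], [0.0, 0.0]])
nearly_binary = coded_aperture([[10.0, -10.0], [-10.0, 10.0]])
print(binary_regularizer(gray), binary_regularizer(nearly_binary))
```

In an E2E setting a term like this would be added to the reconstruction loss with a weighting factor, so that the learned aperture can be thresholded for fabrication with little loss.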
DS · Sep 4, 2024
Hadamard Row-Wise Generation Algorithm
Brayan Monroy, Jorge Bacca
In this paper, we introduce an efficient algorithm for generating specific Hadamard rows, addressing the memory demands of pre-computing the entire matrix. Leveraging Sylvester's recursive construction, our method generates the required $i$-th row on demand, significantly reducing computational resources. The algorithm uses the Kronecker product to construct the desired row from the binary representation of the index, without creating the full matrix. This approach is particularly useful for single-pixel imaging systems that need only one row at a time.
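The idea described in this abstract can be sketched in a few lines, assuming the Sylvester ordering in which entry (i, j) equals (−1) raised to the number of set bits shared by i and j. The function names are illustrative and this is not the authors' reference implementation.

```python
def kron(a, b):
    """Kronecker product of two 1-D sequences."""
    return [x * y for x in a for y in b]

def hadamard_row(i, k):
    """Return row i of the 2**k x 2**k Sylvester Hadamard matrix.

    The row is built as a Kronecker product of k length-2 factors, one per
    bit of i (most significant bit first): [1, 1] for a 0 bit and [1, -1]
    for a 1 bit. The full matrix is never formed.
    """
    if not 0 <= i < 2 ** k:
        raise ValueError("row index out of range")
    row = [1]
    for bit_pos in range(k - 1, -1, -1):
        bit = (i >> bit_pos) & 1
        row = kron(row, [1, -1] if bit else [1, 1])
    return row

print(hadamard_row(3, 2))  # [1, -1, -1, 1]
```

Memory use is proportional to the row length 2**k rather than to the 4**k entries of the full matrix, which is the saving the abstract refers to.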
IV · Mar 26
Learning to Recorrupt: Noise Distribution Agnostic Self-Supervised Image Denoising
Brayan Monroy, Jorge Bacca, Julián Tachella
Self-supervised image denoising methods have traditionally relied on either architectural constraints or specialized loss functions that require prior knowledge of the noise distribution to avoid the trivial identity mapping. Among these, approaches such as Noisier2Noise or Recorrupted2Recorrupted create training pairs by adding synthetic noise to the noisy images. While effective, these recorruption-based approaches require precise knowledge of the noise distribution, which is often not available. We present Learning to Recorrupt (L2R), a noise distribution-agnostic denoising technique that eliminates the need for knowledge of the noise distribution. Our method introduces a learnable monotonic neural network that learns the recorruption process through a min-max saddle-point objective. The proposed method achieves state-of-the-art performance across unconventional and heavy-tailed noise distributions, such as log-gamma, Laplace, and spatially correlated noise, as well as signal-dependent noise models such as Poisson-Gaussian noise.
IV · Jan 30
Scale Equivariance Regularization and Feature Lifting in High Dynamic Range Modulo Imaging
Brayan Monroy, Jorge Bacca
Modulo imaging enables high dynamic range (HDR) acquisition by cyclically wrapping saturated intensities, but accurate reconstruction remains challenging due to ambiguities between natural image edges and artificial wrap discontinuities. This work proposes a learning-based HDR restoration framework that incorporates two key strategies: (i) a scale-equivariant regularization that enforces consistency under exposure variations, and (ii) a feature lifting input design combining the raw modulo image, wrapped finite differences, and a closed-form initialization. Together, these components enhance the network's ability to distinguish true structure from wrapping artifacts, yielding state-of-the-art performance across perceptual and linear HDR quality metrics.
LG · May 21, 2025
Projection-Based Correction for Enhancing Deep Inverse Networks
Jorge Bacca
Deep learning-based models have demonstrated remarkable success in solving ill-posed inverse problems; however, many fail to strictly adhere to the physical constraints imposed by the measurement process. In this work, we introduce a projection-based correction method to enhance the inference of deep inverse networks by ensuring consistency with the forward model. Specifically, given an initial estimate from a learned reconstruction network, we apply a projection step that constrains the solution to lie within the valid solution space of the inverse problem. We theoretically demonstrate that if the recovery model is a well-trained deep inverse network, the solution can be decomposed into range-space and null-space components, where the projection-based correction reduces to an identity transformation. Extensive simulations and experiments validate the proposed method, demonstrating improved reconstruction accuracy across diverse inverse problems and deep network architectures.
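A minimal sketch of one such projection step is given below, for the noiseless linear model y = A x. It assumes, purely to keep the example dependency-free, that the rows of A are mutually orthogonal with squared norm n (as for a subset of Hadamard rows), so the pseudo-inverse is Aᵀ/n. The function names and toy data are hypothetical, and the paper's actual correction may differ.

```python
def matvec(A, x):
    """Compute A @ x for a matrix stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def rmatvec(A, r):
    """Compute A.T @ r."""
    return [sum(A[m][j] * r[m] for m in range(len(A))) for j in range(len(A[0]))]

def project_onto_measurements(x0, A, y):
    """Project x0 onto {x : A x = y}, assuming A @ A.T = n * I.

    Under that assumption the pseudo-inverse is A.T / n, so the update is
    x = x0 - A.T @ (A @ x0 - y) / n. Only the range-space part of x0 moves;
    its null-space part is left untouched.
    """
    n = len(A[0])
    residual = [p - q for p, q in zip(matvec(A, x0), y)]
    correction = rmatvec(A, residual)
    return [xi - ci / n for xi, ci in zip(x0, correction)]

A = [[1, 1, 1, 1], [1, -1, 1, -1]]   # two orthogonal rows, n = 4
x_true = [1.0, 2.0, 3.0, 4.0]
y = matvec(A, x_true)                # noiseless measurements
x_net = [1.5, 1.5, 3.5, 3.0]         # stand-in for a network estimate
x_proj = project_onto_measurements(x_net, A, y)
print(x_proj, matvec(A, x_proj))
```

If the input already satisfies A x = y, the residual is zero and the step returns the input unchanged, which mirrors the identity behavior the abstract describes for a well-trained network.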
CV · Apr 11, 2025
High Dynamic Range Modulo Imaging for Robust Object Detection in Autonomous Driving
Kebin Contreras, Brayan Monroy, Jorge Bacca
Object detection precision is crucial for ensuring the safety and efficacy of autonomous driving systems. The quality of acquired images directly influences the ability of autonomous driving systems to correctly recognize and respond to other vehicles, pedestrians, and obstacles in real time. However, real environments present extreme variations in lighting, causing saturation problems and resulting in the loss of crucial details for detection. Traditionally, High Dynamic Range (HDR) images have been preferred for their ability to capture a broad spectrum of light intensities, but the need for multiple captures to construct HDR images is inefficient for real-time applications in autonomous vehicles. To address these issues, this work introduces the use of modulo sensors for robust object detection. The modulo sensor allows pixels to "reset/wrap" upon reaching the saturation level, acquiring an irradiance-encoding image that can then be recovered using unwrapping algorithms. The applied reconstruction techniques enable HDR recovery of color intensity and image details, ensuring better visual quality even under extreme lighting conditions at the cost of extra time. Experiments with the YOLOv10 model demonstrate that images processed using modulo images achieve performance comparable to HDR images and significantly surpass saturated images in terms of object detection accuracy. Moreover, the proposed modulo imaging step combined with HDR image reconstruction takes less time than conventional HDR image acquisition.
IV · Apr 5, 2025
Autoregressive High-Order Finite Difference Modulo Imaging: High-Dynamic Range for Computer Vision Applications
Brayan Monroy, Kebin Contreras, Jorge Bacca
High dynamic range (HDR) imaging is vital for capturing the full range of light tones in scenes, essential for computer vision tasks such as autonomous driving. Standard commercial imaging systems face limitations in well-depth capacity and quantization precision, hindering their HDR capabilities. Modulo imaging, based on unlimited sampling (US) theory, addresses these limitations by using a modulo analog-to-digital approach that resets signals upon saturation, enabling estimation of pixel resets through neighboring pixel intensities. Despite the effectiveness of US algorithms in one-dimensional signals, their optimization problem for two-dimensional signals remains unclear. This work formulates the US framework as an autoregressive $\ell_2$ phase unwrapping problem, providing computationally efficient solutions in the discrete cosine domain jointly with a stride removal algorithm also based on spatial differences. By leveraging higher-order finite differences for two-dimensional images, our approach enhances HDR image reconstruction from modulo images, demonstrating its efficacy in improving object detection in autonomous driving scenes without retraining.
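To make the role of neighboring-pixel differences concrete, here is a deliberately simplified sketch: a one-dimensional signal, first-order differences only, and the assumptions that consecutive samples differ by less than half the saturation level and that the first sample is not wrapped. It is not the autoregressive high-order DCT-domain method of the paper, and the names are illustrative.

```python
def wrap(signal, lam):
    """Modulo sensor model: each sample resets whenever it reaches lam."""
    return [v % lam for v in signal]

def centered_mod(d, lam):
    """Fold a difference into the interval [-lam/2, lam/2)."""
    return (d + lam / 2.0) % lam - lam / 2.0

def unwrap_1d(wrapped, lam):
    """Recover the signal by integrating the folded first-order differences.

    If true neighboring differences are smaller than lam/2 in magnitude,
    folding the wrapped differences returns exactly the true differences.
    """
    out = [wrapped[0]]  # assumes the first sample is below lam
    for prev, cur in zip(wrapped, wrapped[1:]):
        out.append(out[-1] + centered_mod(cur - prev, lam))
    return out

lam = 1.0  # saturation level of the hypothetical sensor
scene = [0.2, 0.5, 0.9, 1.3, 1.6, 2.0, 2.3, 2.1, 1.7, 1.3, 0.9]
measured = wrap(scene, lam)
recovered = unwrap_1d(measured, lam)
print(max(abs(a - b) for a, b in zip(scene, recovered)))
```

The smoothness condition is what fails at sharp image edges, which is one motivation for the higher-order differences and the optimization-based formulation described in the abstract.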
CV · May 3, 2024
Designed Dithering Sign Activation for Binary Neural Networks
Brayan Monroy, Juan Estupiñan, Tatiana Gelvez-Barrera et al.
Binary Neural Networks emerged as a cost-effective and energy-efficient solution for computer vision tasks by binarizing either network weights or activations. However, common binary activations, such as the Sign activation function, abruptly binarize the values with a single threshold, losing fine-grained details in the feature outputs. This work proposes an activation that applies multiple thresholds following dithering principles, shifting the Sign activation function for each pixel according to a spatially periodic threshold kernel. Unlike existing methods, the shifting is defined jointly for a set of adjacent pixels, taking advantage of spatial correlations. Experiments on the classification task demonstrate the effectiveness of the designed dithering Sign activation function (DeSign) as an alternative activation for binary neural networks, without increasing the computational cost. Further, DeSign balances the preservation of details with the efficiency of binary operations.
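The mechanism described in this abstract, a Sign activation whose threshold varies periodically across pixels, can be sketched as follows. The 2x2 kernel values are made up for illustration; in the paper the kernel is designed, and the function name is hypothetical.

```python
def dithered_sign(feature_map, threshold_kernel):
    """Binarize a 2-D feature map with a spatially periodic threshold.

    The kernel is tiled over the map, so pixel (i, j) is compared against
    threshold_kernel[i % kh][j % kw] instead of a single global threshold.
    """
    kh, kw = len(threshold_kernel), len(threshold_kernel[0])
    return [
        [1 if v >= threshold_kernel[i % kh][j % kw] else -1
         for j, v in enumerate(row)]
        for i, row in enumerate(feature_map)
    ]

# A flat region with value 0.1: a plain Sign with threshold 0 would output
# all +1, while the dithered version encodes the level as a spatial pattern.
flat = [[0.1] * 4 for _ in range(4)]
kernel = [[-0.3, 0.3], [0.2, -0.2]]  # illustrative values only
for row in dithered_sign(flat, kernel):
    print(row)
```

The output is still strictly binary, so downstream binary operations are unchanged; only the comparison threshold differs from pixel to pixel.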
CV · Oct 6, 2025
See the past: Time-Reversed Scene Reconstruction from Thermal Traces Using Visual Language Models
Kebin Contreras, Luis Toscano-Palomino, Mauro Dalla Mura et al.
Recovering the past from present observations is an intriguing challenge with potential applications in forensics and scene analysis. Thermal imaging, operating in the infrared range, provides access to otherwise invisible information. Since humans are typically warmer (37 °C / 98.6 °F) than their surroundings, interactions such as sitting, touching, or leaning leave residual heat traces. These fading imprints serve as passive temporal codes, allowing for the inference of recent events that exceed the capabilities of RGB cameras. This work proposes a time-reversed reconstruction framework that uses paired RGB and thermal images to recover scene states from a few seconds earlier. The proposed approach couples Visual-Language Models (VLMs) with a constrained diffusion process, where one VLM generates scene descriptions and another guides image reconstruction, ensuring semantic and structural consistency. The method is evaluated in three controlled scenarios, demonstrating the feasibility of reconstructing plausible past frames up to 120 seconds earlier, providing a first step toward time-reversed imaging from thermal traces.