Karen Egiazarian

CV
h-index: 7
Papers: 15
Citations: 380
Novelty: 49%
AI Score: 44

15 Papers

OPTICS · Mar 3, 2022
Unfolding-Aided Bootstrapped Phase Retrieval in Optical Imaging

Samuel Pinilla, Kumar Vijay Mishra, Igor Shevkunov et al.

Phase retrieval in optical imaging refers to the recovery of a complex signal from phaseless data acquired in the form of its diffraction patterns. These patterns are acquired through a system with a coherent light source that employs a diffractive optical element (DOE) to modulate the scene, resulting in coded diffraction patterns at the sensor. Recently, the hybrid approach of model-driven networks, or deep unfolding, has emerged as an effective alternative to conventional model-based and learning-based phase retrieval techniques because it allows the complexity of algorithms to be bounded while retaining their efficacy. Additionally, such hybrid approaches have shown promise in improving the design of DOEs that follow theoretical uniqueness conditions. There are opportunities to exploit novel experimental setups and to address even more complex DOE phase retrieval applications. This paper presents an overview of algorithms and applications of deep unfolding for bootstrapped phase retrieval, regardless of whether the measurements are acquired in the near, middle, or far zone.
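
The coded-diffraction measurement model described above can be sketched in a few lines. This is a minimal illustration, not the authors' code. It makes three assumptions. The signal is a 1-D complex vector. Far-zone propagation is modelled by a plain DFT; the near and middle zones would need a different propagation operator. The random-phase masks are hypothetical stand-ins for DOE patterns. The sketch also shows why this is a retrieval problem: a global phase shift of the signal leaves every measurement unchanged.

```python
import cmath
import math


def dft(x):
    """Naive O(n^2) 1-D discrete Fourier transform; adequate for a tiny example."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def coded_diffraction_patterns(x, masks):
    """Return one intensity pattern |DFT(d * x)|^2 per mask d.

    Assumption: far-zone propagation is modelled by a plain DFT.
    """
    patterns = []
    for d in masks:
        modulated = [d_t * x_t for d_t, x_t in zip(d, x)]  # DOE modulates the field
        patterns.append([abs(v) ** 2 for v in dft(modulated)])  # sensor keeps intensity only
    return patterns


# Hypothetical complex field and two unit-modulus phase masks.
field = [1 + 0j, 0.5j, -0.25 + 0.5j, 0.75 + 0j]
masks = [[1, 1j, -1, -1j], [1j, -1, 1, 1]]
y = coded_diffraction_patterns(field, masks)

# A global phase factor exp(1j * theta) produces exactly the same data.
shifted = [cmath.exp(0.7j) * v for v in field]
y_shifted = coded_diffraction_patterns(shifted, masks)
print(all(math.isclose(a, b, abs_tol=1e-9)
          for p, q in zip(y, y_shifted) for a, b in zip(p, q)))
```

Because the masks have unit modulus, each pattern also satisfies Parseval's relation: its total energy is n times the energy of the field.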

CV · Apr 14, 2022
Residual Swin Transformer Channel Attention Network for Image Demosaicing

Wenzhu Xing, Karen Egiazarian

Image demosaicing is the problem of interpolating full-resolution color images from raw sensor (color filter array) data. During the last decade, deep neural networks have been widely used in image restoration, and in particular in demosaicing, attaining significant performance improvements. In recent years, vision transformers have been designed and successfully used in various computer vision applications. One recent image restoration method based on the Swin Transformer (ST), SwinIR, demonstrates state-of-the-art performance with fewer parameters than neural network-based methods. Inspired by the success of SwinIR, we propose in this paper a novel Swin Transformer-based network for image demosaicing, called RSTCANet. To extract image features, RSTCANet stacks several residual Swin Transformer Channel Attention blocks (RSTCAB), introducing channel attention for every two successive ST blocks. Extensive experiments demonstrate that RSTCANet outperforms state-of-the-art image demosaicing methods and has fewer parameters.
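
The abstract does not spell out the attention block. The following sketch therefore assumes the common squeeze-and-excitation form of channel attention: global average pooling, a bottleneck of two fully connected layers, and a sigmoid gate that rescales each channel. The weights and the reduction ratio are hypothetical. The Swin Transformer blocks themselves are not modelled.

```python
import math


def channel_attention(features, w_down, w_up):
    """Squeeze-and-excitation style channel attention (generic sketch, no biases).

    features: list of C channels, each an H x W list of lists.
    w_down:   (C // r) x C weights of the reduction layer (hypothetical).
    w_up:     C x (C // r) weights of the expansion layer (hypothetical).
    """
    # Squeeze: global average pooling yields one descriptor per channel.
    pooled = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in features]
    # Excitation: FC -> ReLU -> FC -> sigmoid gives one weight in (0, 1) per channel.
    hidden = [max(0.0, sum(w * p for w, p in zip(row, pooled))) for row in w_down]
    scales = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
              for row in w_up]
    # Rescale: multiply every value of a channel by that channel's weight.
    rescaled = [[[s * v for v in row] for row in ch] for s, ch in zip(scales, features)]
    return rescaled, scales


features = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[0.0, 0.0], [0.0, 1.0]],
    [[2.0, 2.0], [2.0, 2.0]],
    [[-1.0, 0.5], [0.5, -1.0]],
]
w_down = [[0.5, -0.2, 0.1, 0.3], [-0.4, 0.6, 0.2, -0.1]]  # C // r = 2 rows, C = 4 columns
w_up = [[0.7, -0.3], [0.2, 0.9], [-0.5, 0.4], [0.1, 0.1]]  # C = 4 rows, C // r = 2 columns
rescaled, scales = channel_attention(features, w_down, w_up)
print(scales)
```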

CV · Mar 4
DM-CFO: A Diffusion Model for Compositional 3D Tooth Generation with Collision-Free Optimization

Yan Tian, Pengcheng Xue, Weiping Ding et al.

The automatic design of a 3D tooth model plays a crucial role in dental digitization. However, current approaches face challenges in compositional 3D tooth generation because both the layouts and shapes of missing teeth need to be optimized. In addition, collision conflicts are often ignored in 3D Gaussian-based compositional 3D generation, where objects may intersect with each other due to the absence of explicit geometric information on the object surfaces. Motivated by graph generation through diffusion models and collision detection using 3D Gaussians, we propose an approach named DM-CFO for compositional tooth generation, in which the layout of missing teeth is progressively restored during the denoising phase under both text and graph constraints. Then, the Gaussian parameters of each layout-guided tooth and of the entire jaw are alternately updated using score distillation sampling (SDS). Furthermore, a regularization term based on the distances between the 3D Gaussians of neighboring teeth and the anchor tooth is introduced to penalize tooth intersections. Experimental results on three tooth-design datasets demonstrate that our approach significantly improves the multiview consistency and realism of the generated teeth compared with existing methods. Project page: https://amateurc.github.io/CF-3DTeeth/.
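
The abstract does not give the exact form of the intersection regularizer. As a hypothetical stand-in, the sketch below applies a squared hinge to every pair of Gaussian centers from two neighboring teeth that are closer than an assumed clearance `min_dist`. It ignores Gaussian scales and orientations, which a real collision term would likely need to account for.

```python
import math


def intersection_penalty(centers_a, centers_b, min_dist):
    """Squared-hinge penalty on Gaussian-center pairs closer than min_dist.

    centers_a, centers_b: (x, y, z) centers of the 3-D Gaussians of two teeth.
    min_dist: hypothetical clearance below which a pair counts as colliding.
    """
    penalty = 0.0
    for p in centers_a:
        for q in centers_b:
            d = math.dist(p, q)  # Euclidean distance between the two centers
            penalty += max(0.0, min_dist - d) ** 2  # zero when the pair is far enough apart
    return penalty


# Hypothetical centers: the first pair overlaps, the second is well separated.
tooth_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
tooth_b = [(1.5, 0.0, 0.0), (5.0, 0.0, 0.0)]
print(intersection_penalty(tooth_a, tooth_b, 1.0))
```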

CV · Mar 30
SVGS: Single-View to 3D Object Editing via Gaussian Splatting

Pengcheng Xue, Yan Tian, Qiutao Song et al.

Text-driven 3D scene editing has attracted considerable interest due to its convenience and user-friendliness. However, methods that rely on implicit 3D representations, such as Neural Radiance Fields (NeRF), while effective in rendering complex scenes, are hindered by slow processing speeds and limited control over specific regions of the scene. Moreover, existing approaches, including Instruct-NeRF2NeRF and GaussianEditor, which utilize multi-view editing strategies, frequently produce inconsistent results across different views when executing text instructions. This inconsistency can adversely affect the overall performance of the model, complicating the task of balancing the consistency of editing results with editing efficiency. To address these challenges, we propose a novel method termed Single-View to 3D Object Editing via Gaussian Splatting (SVGS), which is a single-view text-driven editing technique based on 3D Gaussian Splatting (3DGS). Specifically, in response to text instructions, we introduce a single-view editing strategy grounded in multi-view diffusion models, which reconstructs 3D scenes by leveraging only those views that yield consistent editing results. Additionally, we employ sparse 3D Gaussian Splatting as the 3D representation, which significantly enhances editing efficiency. We conducted a comparative analysis of SVGS against existing baseline methods across various scene settings, and the results indicate that SVGS outperforms its counterparts in both editing capability and processing speed, representing a significant advancement in 3D editing technology. For further details, please visit our project page at: https://amateurc.github.io/svgs.github.io.

CV · Feb 14, 2024
Towards Realistic Landmark-Guided Facial Video Inpainting Based on GANs

Fatemeh Ghorbani Lohesara, Karen Egiazarian, Sebastian Knorr

Facial video inpainting plays a crucial role in a wide range of applications, including but not limited to the removal of obstructions in video conferencing and telemedicine, enhancement of facial expression analysis, privacy protection, integration of graphical overlays, and virtual makeup. This domain presents serious challenges due to the intricate nature of facial features and the inherent human familiarity with faces, which heighten the need for accurate and convincing completions. Addressing challenges specifically related to occlusion removal in this context, we focus on generating complete images from facial data covered by masks while ensuring both spatial and temporal coherence. Our study introduces a network for expression-based video inpainting that employs generative adversarial networks (GANs) to handle static and moving occlusions across all frames. By using facial landmarks and an occlusion-free reference image, our model maintains the user's identity consistently across frames. We further enhance emotional preservation through a customized facial expression recognition (FER) loss function, ensuring detailed inpainted outputs. Our framework adaptively removes occlusions from facial videos, whether they are static or moving across frames, while producing realistic and coherent results.

CV · Jan 25, 2024
Expression-aware video inpainting for HMD removal in XR applications

Fatemeh Ghorbani Lohesara, Karen Egiazarian, Sebastian Knorr

Head-mounted displays (HMDs) serve as indispensable devices for observing extended reality (XR) environments and virtual content. However, HMDs present an obstacle to external recording techniques as they block the upper face of the user. This limitation significantly affects social XR applications, specifically teleconferencing, where facial features and eye gaze information play a vital role in creating an immersive user experience. In this study, we propose a new network for expression-aware video inpainting for HMD removal (EVI-HRnet) based on generative adversarial networks (GANs). Our model effectively fills in missing information with regard to facial landmarks and a single occlusion-free reference image of the user. The framework and its components ensure the preservation of the user's identity across frames using the reference frame. To further improve the level of realism of the inpainted output, we introduce a novel facial expression recognition (FER) loss function for emotion preservation. Our results demonstrate the remarkable capability of the proposed framework to remove HMDs from facial videos while maintaining the subject's facial expression and identity. Moreover, the outputs exhibit temporal consistency along the inpainted frames. This lightweight framework presents a practical approach for HMD occlusion removal, with the potential to enhance various collaborative XR applications without the need for additional hardware.

IV · Sep 24, 2021
Learning-based Noise Component Map Estimation for Image Denoising

Sheyda Ghanbaralizadeh Bahnemiri, Mykola Ponomarenko, Karen Egiazarian

This paper considers the problem of denoising images corrupted by non-stationary noise. Since in practice no a priori information about the noise is available, the noise statistics must be estimated before denoising. We propose a deep convolutional neural network (CNN)-based method for estimating a map of local, patch-wise standard deviations of the noise (a so-called sigma-map). It achieves state-of-the-art accuracy in estimating the sigma-map for non-stationary noise, as well as in estimating the noise variance for additive white Gaussian noise. Extensive denoising experiments using the estimated sigma-maps show that our method outperforms recent CNN-based blind image denoising methods by up to 6 dB in PSNR, and other state-of-the-art methods based on sigma-map estimation by up to 0.5 dB, while offering greater flexibility of use. Compared with the ideal case, in which denoising uses the ground-truth sigma-map, the PSNR difference is within 0.1-0.2 dB for most noise levels and does not exceed 0.6 dB.
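
The learned estimator cannot be reproduced from the abstract, but the quantity it predicts is easy to state. The sketch below computes an ideal sigma-map from a known noise realization, as one might for synthetic ground truth. Two details are assumptions, since the abstract does not give the patch layout: the patches are non-overlapping squares, and the standard deviation is the population one.

```python
import math


def sigma_map(noise, patch):
    """Patch-wise standard deviation of a known noise field.

    noise: H x W list of lists (noisy image minus clean image).
    patch: side of the non-overlapping square patches; H and W are
           assumed to be multiples of it.
    """
    h, w = len(noise), len(noise[0])
    result = []
    for i in range(0, h, patch):
        row = []
        for j in range(0, w, patch):
            vals = [noise[r][c] for r in range(i, i + patch) for c in range(j, j + patch)]
            mean = sum(vals) / len(vals)
            row.append(math.sqrt(sum((v - mean) ** 2 for v in vals) / len(vals)))
        result.append(row)
    return result


# Non-stationary toy noise: amplitude 1 in the left half, 3 in the right half.
noise = [[(1 if (r + c) % 2 == 0 else -1) * (1 if c < 2 else 3) for c in range(4)]
         for r in range(4)]
print(sigma_map(noise, 2))
```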

IV · Mar 2, 2020
Flashlight CNN Image Denoising

Pham Huu Thanh Binh, Cristóvão Cruz, Karen Egiazarian

This paper proposes a learning-based denoising method called FlashLight CNN (FLCNN) that implements a deep neural network for image denoising. The proposed approach is based on deep residual networks and inception networks, and it can leverage many more parameters than residual networks alone for denoising grayscale images corrupted by additive white Gaussian noise (AWGN). FlashLight CNN demonstrates state-of-the-art performance in quantitative and visual comparisons with current state-of-the-art image denoising methods.

OC · Oct 22, 2019
The Practicality of Stochastic Optimization in Imaging Inverse Problems

Junqi Tang, Karen Egiazarian, Mohammad Golbabaee et al.

In this work we investigate the practicality of stochastic gradient descent and its recently introduced variance-reduced variants for imaging inverse problems. In the machine learning literature, such algorithms have been shown to have optimal theoretical complexity and to deliver large empirical improvements over deterministic gradient methods. Surprisingly, in some tasks such as image deblurring, many of these methods fail to converge faster than accelerated deterministic gradient methods, even in terms of epoch counts. We investigate this phenomenon and propose a theory-inspired mechanism that lets practitioners efficiently determine whether an inverse problem benefits from being solved with stochastic optimization techniques. Using standard tools of numerical linear algebra, we derive conditions on the spectral structure of an inverse problem under which it is a suitable application for stochastic gradient methods. In particular, we show that, for an imaging inverse problem, stochastic gradient methods can be more advantageous than deterministic methods if and only if its Hessian matrix has a fast-decaying eigenspectrum. Our results also provide guidance on choosing appropriate minibatch partition schemes, showing that a good minibatch scheme typically has relatively low correlation within each minibatch. Finally, we propose an accelerated primal-dual SGD algorithm to tackle another key bottleneck of stochastic optimization: the heavy computation of proximal operators. The proposed method converges fast in practice and can efficiently handle non-smooth regularization terms that are coupled with linear operators.
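
For a least-squares data term (1/2)||Ax - b||^2 the Hessian is A^T A. When A is a circular blur, the eigenvalues of A^T A are the squared magnitudes of the DFT of the blur kernel. The sketch below uses that identity to compare the eigenspectrum of a small blur with that of the identity operator. The `top_k_fraction` decay measure is a hypothetical illustration, not the criterion derived in the paper.

```python
import cmath
import math


def hessian_eigenvalues_circular(kernel, n):
    """Eigenvalues (descending) of A^T A for an n-point circular convolution A.

    A circulant matrix is diagonalized by the DFT, so the eigenvalues of
    A^T A are |H_k|^2, where H is the DFT of the zero-padded kernel.
    """
    padded = list(kernel) + [0.0] * (n - len(kernel))
    eig = []
    for k in range(n):
        h_k = sum(padded[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        eig.append(abs(h_k) ** 2)
    return sorted(eig, reverse=True)


def top_k_fraction(eig, k):
    """Hypothetical decay measure: share of the trace held by the k largest eigenvalues."""
    return sum(eig[:k]) / sum(eig)


identity = hessian_eigenvalues_circular([1.0], 16)  # flat spectrum
blur = hessian_eigenvalues_circular([1 / 3, 1 / 3, 1 / 3], 9)  # fast-decaying spectrum
print(top_k_fraction(identity, 2), top_k_fraction(blur, 2))
```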

NA · Oct 4, 2019
Hyperspectral holography and spectroscopy: computational features of inverse discrete cosine transform

Vladimir Katkovnik, Igor Shevkunov, Karen Egiazarian

Broadband hyperspectral digital holography and Fourier transform spectroscopy are important instruments in various fields of science and applications. In digital hyperspectral holography and spectroscopy, the variables of interest are obtained as inverse discrete cosine transforms of the observed diffractive intensity patterns. In these notes, we provide a variety of algorithms for the inverse cosine transform, with proofs of perfect spectrum reconstruction, and we discuss and illustrate some nontrivial features of these algorithms.
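
The code below is a minimal sketch of perfect reconstruction with a cosine transform. It implements the orthonormal DCT-II and its inverse, then recovers a sparse "spectrum" from the cosine sum it generates. It assumes an idealized observation model with no bias term. Real interferograms and holograms involve DC terms and sampling issues that the notes address and this sketch does not. The function names are illustrative.

```python
import math


def dct_ii(x):
    """Orthonormal DCT-II of a real sequence."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * sum(x[t] * math.cos(math.pi * (t + 0.5) * k / n) for t in range(n)))
    return out


def idct_ii(c):
    """Inverse of the orthonormal DCT-II (an orthonormal DCT-III)."""
    n = len(c)
    out = []
    for t in range(n):
        total = math.sqrt(1.0 / n) * c[0]
        total += sum(math.sqrt(2.0 / n) * c[k] * math.cos(math.pi * (t + 0.5) * k / n)
                     for k in range(1, n))
        out.append(total)
    return out


# Hypothetical sparse spectrum with two lines; the "pattern" is its cosine sum.
spectrum = [0.0, 0.0, 3.0, 0.0, 1.0, 0.0, 0.0, 0.0]
pattern = idct_ii(spectrum)
recovered = dct_ii(pattern)
print([round(v, 6) for v in recovered])
```

Because the transform matrix is orthonormal, its inverse is its transpose, so reconstruction is exact up to floating-point rounding.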

IV · Mar 6, 2018
Nonlocality-Reinforced Convolutional Neural Networks for Image Denoising

Cristóvão Cruz, Alessandro Foi, Vladimir Katkovnik et al.

We introduce a paradigm for nonlocal-sparsity-reinforced deep convolutional neural network denoising. It combines local multiscale denoising by a convolutional neural network (CNN)-based denoiser with nonlocal denoising by a nonlocal filter (NLF) that exploits the mutual similarities between groups of patches. The CNN models are applied with noise levels that progressively decrease at every iteration of our framework, while their output is regularized by the nonlocal prior implicit in the NLF. Unlike complicated neural networks that embed the nonlocality prior within the layers of the network, our framework is modular: it uses standard pre-trained CNNs together with standard nonlocal filters. An instance of the proposed framework, called NN3D, is evaluated on large grayscale image datasets and shows state-of-the-art performance.

CV · Nov 29, 2017
Blind estimation of white Gaussian noise variance in highly textured images

Mykola Ponomarenko, Nikolay Gapon, Viacheslav Voronin et al.

In this paper, a new method for blind estimation of the noise variance in a single highly textured image is proposed. The input image is divided into 8x8 blocks, and the discrete cosine transform (DCT) is computed for each block. A subset of the 64 DCT coefficients with the lowest energy, calculated over all blocks, is selected for further analysis. For these DCT coefficients, a robust estimate of the noise variance is calculated. Based on this estimate, blocks with very large local variance, calculated only over the selected DCT coefficients, are excluded from further analysis. These two steps (estimation of the noise variance and exclusion of blocks) are repeated iteratively three times. To verify the proposed method, a new noise-free test image database, TAMPERE17, consisting of many highly textured images, was designed. It is shown for this database and for noise variances from the set {25, 49, 100, 225} that the proposed method provides an approximately two times lower root mean square estimation error than other methods.
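
The procedure above can be sketched as follows. These steps follow the abstract: the 8x8 block DCT, selection of the lowest-energy coefficients, robust estimation, block exclusion, and three iterations. The remaining choices are assumptions:

- half of the 64 coefficients are kept (`keep_fraction`);
- the robust estimator is the median absolute value divided by 0.6745, the MAD rule for zero-mean Gaussian data;
- a block is excluded when its local variance exceeds `var_factor` times the current estimate.

```python
import math
import random
import statistics


def dct_matrix(n=8):
    """Orthonormal DCT-II basis; row k holds the k-th cosine basis vector."""
    return [
        [(math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n))
         * math.cos(math.pi * (t + 0.5) * k / n) for t in range(n)]
        for k in range(n)
    ]


def block_dct(block, c):
    """2-D DCT of a square block, computed as C @ B @ C^T."""
    n = len(c)
    tmp = [[sum(c[k][t] * block[t][j] for t in range(n)) for j in range(n)] for k in range(n)]
    return [[sum(tmp[k][t] * c[q][t] for t in range(n)) for q in range(n)] for k in range(n)]


def estimate_noise_variance(img, keep_fraction=0.5, var_factor=3.0, iterations=3):
    """Blind AWGN variance estimate from 8x8 block DCT coefficients (sketch)."""
    c = dct_matrix(8)
    spectra = []  # 64 DCT coefficients of every non-overlapping 8x8 block
    for i in range(0, len(img) - 7, 8):
        for j in range(0, len(img[0]) - 7, 8):
            block = [row[j:j + 8] for row in img[i:i + 8]]
            spectra.append([v for row in block_dct(block, c) for v in row])

    # Select the lowest-energy coefficient positions, energy summed over all blocks.
    energy = [sum(s[m] ** 2 for s in spectra) for m in range(64)]
    n_keep = max(1, int(keep_fraction * 64))
    selected = sorted(range(64), key=lambda m: energy[m])[:n_keep]

    active = spectra
    variance = 0.0
    for _ in range(iterations):
        # Robust estimate: for zero-mean Gaussian data, median(|c|) ~= 0.6745 * sigma.
        sigma = statistics.median(abs(s[m]) for s in active for m in selected) / 0.6745
        variance = sigma ** 2
        # Exclude blocks whose local variance over the selected coefficients is too large.
        kept = [s for s in active
                if sum(s[m] ** 2 for m in selected) / n_keep <= var_factor * variance]
        active = kept or active  # never discard every block
    return variance


# Synthetic test: flat image plus AWGN with variance 25 (sigma = 5).
rng = random.Random(0)
noisy = [[128.0 + rng.gauss(0.0, 5.0) for _ in range(128)] for _ in range(128)]
print(estimate_noise_variance(noisy))
```

On this flat synthetic image the estimate should land near the true variance of 25. It may come out slightly below it, because keeping only the lowest-energy coefficients tends to bias the estimate downward.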

CV · Nov 2, 2017
Statistical evaluation of visual quality metrics for image denoising

Karen Egiazarian, Mykola Ponomarenko, Vladimir Lukin et al.

This paper studies the problem of full-reference visual quality assessment of denoised images, with special emphasis on images with low contrast and noise-like texture. For such images, noise removal often also causes loss or smoothing of image details. A new test image database, FLT, containing 75 noise-free "reference" images and 300 filtered ("distorted") images, is developed. Each reference image, corrupted by additive white Gaussian noise, is denoised by the BM3D filter with four different values of the threshold parameter (four levels of noise suppression). After a perceptual quality assessment of the distorted images, the mean opinion scores (MOS) are obtained and compared with the values of known full-reference quality metrics. The Spearman Rank Order Correlation Coefficient (SROCC) between PSNR values and MOS is close to zero, and the SROCC between the values of known full-reference visual quality metrics and MOS does not exceed 0.82 (the value reached by a new visual quality metric proposed in this paper). The FLT dataset is more complex than earlier datasets used to assess visual quality for image denoising. Thus, it can be effectively used to design new visual quality metrics for image denoising.
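
SROCC is the Pearson correlation of the ranks. The sketch below computes it, giving tied values the average of their ranks. The PSNR and MOS values are made up for illustration and are not taken from FLT.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def srocc(x, y):
    """Spearman rank-order correlation: Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Hypothetical scores for four denoised images (not FLT data).
psnr = [30.1, 29.5, 28.7, 27.9]
mos = [4.1, 4.3, 3.0, 3.2]
print(round(srocc(psnr, mos), 3))  # 0.6
```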

CV · Nov 1, 2017
Complex-valued image denoising based on group-wise complex-domain sparsity

Vladimir Katkovnik, Mykola Ponomarenko, Karen Egiazarian

This paper addresses phase imaging and wavefront reconstruction from noisy observations of a complex exponent. The problem is highly nonlinear because the exponent is a 2π-periodic function of the phase, which makes the reconstruction of phase and amplitude difficult. Even with additive Gaussian noise in the observations, the distributions of the noisy phase and amplitude components are signal-dependent and non-Gaussian. Further difficulties arise from the a priori unknown correlation between phase and amplitude in real-life scenarios. In this paper, we propose a new class of non-iterative and iterative complex-domain filters based on group-wise sparsity in the complex domain. This sparsity relies on the techniques implemented in Block-Matching 3D filtering (BM3D) and 3D/4D Higher-Order Singular Value Decomposition (HOSVD), which are exploited for spectrum design, analysis, and filtering. The introduced algorithms generalize the ideas used in the CD-BM3D algorithms presented in our previous publications. The algorithms are implemented as a MATLAB Toolbox, and their efficiency is demonstrated by simulation tests.

CV · Apr 13, 2017
Single Image Super-Resolution based on Wiener Filter in Similarity Domain

Cristóvão Cruz, Rakesh Mehta, Vladimir Katkovnik et al.

Single image super-resolution (SISR) is an ill-posed problem that aims to estimate a plausible high-resolution (HR) image from a single low-resolution (LR) image. Current state-of-the-art SISR methods are patch-based. They use either external data or internal self-similarity to learn a prior for an HR image. External-data-based methods use a large number of patches from the training data, while self-similarity-based approaches leverage one or more similar patches from the input image. In this paper we propose a self-similarity-based approach that can use large groups of similar patches extracted from the input image to solve the SISR problem. We introduce a novel prior leading to collaborative filtering of patch groups in a 1D similarity domain and couple it with an iterative back-projection framework. The performance of the proposed algorithm is evaluated on a number of SISR benchmark datasets. Without using any external data, the proposed approach outperforms the current non-CNN-based methods on the tested datasets for various scaling factors. On certain datasets, the gain is over 1 dB compared with the recent method A+. For a high scaling factor (x4), the proposed method performs similarly to very recent state-of-the-art deep convolutional network-based approaches.
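
The iterative back-projection loop mentioned above alternates a prior step with a correction that enforces consistency with the LR input. The 1-D sketch below rests on three assumptions. Block-average decimation is the imaging model. Replication upsampling carries the back-projection. A 3-tap smoothing filter is a hypothetical stand-in for the paper's prior; the paper's collaborative filtering in the similarity domain is not implemented here.

```python
def downsample(x, s):
    """Assumed imaging model: average each run of s samples."""
    return [sum(x[i:i + s]) / s for i in range(0, len(x), s)]


def upsample(y, s):
    """Replication upsampling, used to back-project LR residuals."""
    return [v for v in y for _ in range(s)]


def smooth3(x):
    """Hypothetical prior step: 3-tap moving average with edge replication."""
    n = len(x)
    return [(x[max(i - 1, 0)] + x[i] + x[min(i + 1, n - 1)]) / 3 for i in range(n)]


def iterative_back_projection(lr, s, iterations=10, prior=None):
    hr = upsample(lr, s)  # initial HR estimate
    for _ in range(iterations):
        if prior is not None:
            hr = prior(hr)  # regularize the HR estimate
        residual = [a - b for a, b in zip(lr, downsample(hr, s))]
        hr = [h + r for h, r in zip(hr, upsample(residual, s))]  # enforce LR consistency
    return hr


lr = [1.0, 4.0, 2.0]
hr = iterative_back_projection(lr, 2, iterations=5, prior=smooth3)
print(downsample(hr, 2))  # matches lr up to rounding
```

Since averaging a replicated residual returns that residual, each back-projection step restores exact consistency with the LR input. The prior only shapes the detail that the decimation discards.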