78.8LGMay 26Code
Balancing Fidelity and Diversity in Diffusion Models via Symmetric Attention Decomposition: Hopfield PerspectiveHyunmin Cho, Woo Kyoung Han, Kyong Hwan Jin
We characterize the pre-softmax attention matrix $\mathbf{QK^\top}$ in transformers as an associative memory matrix encoding pairwise associations between input features. By decomposing this matrix into its symmetric and skew-symmetric parts, we interpret the symmetric component as governing the structure of the energy landscape, and the skew-symmetric component as driving circulation on that landscape. Leveraging the energy formulation induced by the symmetric component, we derive Hopfield-style stability measures that quantify the stability of retrieved features. We observe meaningful correlations between Hopfield-style stability measures and the fidelity-diversity trade-offs in generation. Finally, we propose a controllable knob to modulate this trade-off by modifying the circulation of the underlying dynamics. Code is available at our GitHub (https://github.com/hyeon-cho/Attention-Symmetric-Decomposition).
IVSep 21, 2024Code
BurstM: Deep Burst Multi-scale SR using Fourier Space with Optical FlowEungGu Kang, Byeonghun Lee, Sunghoon Im et al.
Multi frame super-resolution(MFSR) achieves higher performance than single image super-resolution (SISR), because MFSR leverages abundant information from multiple frames. Recent MFSR approaches adapt the deformable convolution network (DCN) to align the frames. However, the existing MFSR suffers from misalignments between the reference and source frames due to the limitations of DCN, such as small receptive fields and the predefined number of kernels. From these problems, existing MFSR approaches struggle to represent high-frequency information. To this end, we propose Deep Burst Multi-scale SR using Fourier Space with Optical Flow (BurstM). The proposed method estimates the optical flow offset for accurate alignment and predicts the continuous Fourier coefficient of each frame for representing high-frequency textures. In addition, we have enhanced the network flexibility by supporting various super-resolution (SR) scale factors with the unimodel. We demonstrate that our method has the highest performance and flexibility than the existing MFSR methods. Our source code is available at https://github.com/Egkang-Luis/burstm
CVJul 5, 2022
Learning Local Implicit Fourier Representation for Image WarpingJaewon Lee, Kwang Pyo Choi, Kyong Hwan Jin
Image warping aims to reshape images defined on rectangular grids into arbitrary shapes. Recently, implicit neural functions have shown remarkable performances in representing images in a continuous manner. However, a standalone multi-layer perceptron suffers from learning high-frequency Fourier coefficients. In this paper, we propose a local texture estimator for image warping (LTEW) followed by an implicit neural representation to deform images into continuous shapes. Local textures estimated from a deep super-resolution (SR) backbone are multiplied by locally-varying Jacobian matrices of a coordinate transformation to predict Fourier responses of a warped image. Our LTEW-based neural function outperforms existing warping methods for asymmetric-scale SR and homography transform. Furthermore, our algorithm well generalizes arbitrary coordinate transformations, such as homography transform with a large magnification factor and equirectangular projection (ERP) perspective transform, which are not provided in training.
CVSep 20, 2023
BroadBEV: Collaborative LiDAR-camera Fusion for Broad-sighted Bird's Eye View Map ConstructionMinsu Kim, Giseop Kim, Kyong Hwan Jin et al.
A recent sensor fusion in a Bird's Eye View (BEV) space has shown its utility in various tasks such as 3D detection, map segmentation, etc. However, the approach struggles with inaccurate camera BEV estimation, and a perception of distant areas due to the sparsity of LiDAR points. In this paper, we propose a broad BEV fusion (BroadBEV) that addresses the problems with a spatial synchronization approach of cross-modality. Our strategy aims to enhance camera BEV estimation for a broad-sighted perception while simultaneously improving the completion of LiDAR's sparsity in the entire BEV space. Toward that end, we devise Point-scattering that scatters LiDAR BEV distribution to camera depth distribution. The method boosts the learning of depth estimation of the camera branch and induces accurate location of dense camera features in BEV space. For an effective BEV fusion between the spatially synchronized features, we suggest ColFusion that applies self-attention weights of LiDAR and camera BEV features to each other. Our extensive experiments demonstrate that BroadBEV provides a broad-sighted BEV perception with remarkable performance gains.
CVJan 19, 2025Code
BF-STVSR: B-Splines and Fourier-Best Friends for High Fidelity Spatial-Temporal Video Super-ResolutionEunjin Kim, Hyeonjin Kim, Kyong Hwan Jin et al.
While prior methods in Continuous Spatial-Temporal Video Super-Resolution (C-STVSR) employ Implicit Neural Representation (INR) for continuous encoding, they often struggle to capture the complexity of video data, relying on simple coordinate concatenation and pre-trained optical flow networks for motion representation. Interestingly, we find that adding position encoding, contrary to common observations, does not improve--and even degrades--performance. This issue becomes particularly pronounced when combined with pre-trained optical flow networks, which can limit the model's flexibility. To address these issues, we propose BF-STVSR, a C-STVSR framework with two key modules tailored to better represent spatial and temporal characteristics of video: 1) B-spline Mapper for smooth temporal interpolation, and 2) Fourier Mapper for capturing dominant spatial frequencies. Our approach achieves state-of-the-art in various metrics, including PSNR and SSIM, showing enhanced spatial details and natural temporal consistency. Our code is available https://github.com/Eunjnnn/bfstvsr.
IVApr 3, 2024Code
JDEC: JPEG Decoding via Enhanced Continuous Cosine CoefficientsWoo Kyoung Han, Sunghoon Im, Jaedeok Kim et al. · nvidia
We propose a practical approach to JPEG image decoding, utilizing a local implicit neural representation with continuous cosine formulation. The JPEG algorithm significantly quantizes discrete cosine transform (DCT) spectra to achieve a high compression rate, inevitably resulting in quality degradation while encoding an image. We have designed a continuous cosine spectrum estimator to address the quality degradation issue that restores the distorted spectrum. By leveraging local DCT formulations, our network has the privilege to exploit dequantization and upsampling simultaneously. Our proposed model enables decoding compressed images directly across different quality factors using a single pre-trained model without relying on a conventional JPEG decoder. As a result, our proposed network achieves state-of-the-art performance in flexible color image JPEG artifact removal tasks. Our source code is available at https://github.com/WooKyoungHan/JDEC.
CVAug 13, 2025Code
Dual Recursive Feedback on Generation and Appearance Latents for Pose-Robust Text-to-Image DiffusionJiwon Kim, Pureum Kim, SeonHwa Kim et al.
Recent advancements in controllable text-to-image (T2I) diffusion models, such as Ctrl-X and FreeControl, have demonstrated robust spatial and appearance control without requiring auxiliary module training. However, these models often struggle to accurately preserve spatial structures and fail to capture fine-grained conditions related to object poses and scene layouts. To address these challenges, we propose a training-free Dual Recursive Feedback (DRF) system that properly reflects control conditions in controllable T2I models. The proposed DRF consists of appearance feedback and generation feedback that recursively refines the intermediate latents to better reflect the given appearance information and the user's intent. This dual-update mechanism guides latent representations toward reliable manifolds, effectively integrating structural and appearance attributes. Our approach enables fine-grained generation even between class-invariant structure-appearance fusion, such as transferring human motion onto a tiger's form. Extensive experiments demonstrate the efficacy of our method in producing high-quality, semantically coherent, and structurally consistent image generations. Our source code is available at https://github.com/jwonkm/DRF.
IVJul 31, 2025Code
JPEG Processing Neural Operator for Backward-Compatible CodingWoo Kyoung Han, Yongjun Lee, Byeonghun Lee et al.
Despite significant advances in learning-based lossy compression algorithms, standardizing codecs remains a critical challenge. In this paper, we present the JPEG Processing Neural Operator (JPNeO), a next-generation JPEG algorithm that maintains full backward compatibility with the current JPEG format. Our JPNeO improves chroma component preservation and enhances reconstruction fidelity compared to existing artifact removal methods by incorporating neural operators in both the encoding and decoding stages. JPNeO achieves practical benefits in terms of reduced memory usage and parameter count. We further validate our hypothesis about the existence of a space with high mutual information through empirical evidence. In summary, the JPNeO functions as a high-performance out-of-the-box image compression pipeline without changing source coding's protocol. Our source code is available at https://github.com/WooKyoungHan/JPNeO.
CVFeb 28, 2025Code
Towards Lossless Implicit Neural Representation via Bit Plane DecompositionWoo Kyoung Han, Byeonghun Lee, Hyunmin Cho et al.
We quantify the upper bound on the size of the implicit neural representation (INR) model from a digital perspective. The upper bound of the model size increases exponentially as the required bit-precision increases. To this end, we present a bit-plane decomposition method that makes INR predict bit-planes, producing the same effect as reducing the upper bound of the model size. We validate our hypothesis that reducing the upper bound leads to faster convergence with constant model size. Our method achieves lossless representation in 2D image and audio fitting, even for high bit-depth signals, such as 16-bit, which was previously unachievable. We pioneered the presence of bit bias, which INR prioritizes as the most significant bit (MSB). We expand the application of the INR task to bit depth expansion, lossless image compression, and extreme network quantization. Our source code is available at https://github.com/WooKyoungHan/LosslessINR
CVSep 4, 2023Code
Implicit Neural Image StitchingMinsu Kim, Jaewon Lee, Byeonghun Lee et al.
Existing frameworks for image stitching often provide visually reasonable stitchings. However, they suffer from blurry artifacts and disparities in illumination, depth level, etc. Although the recent learning-based stitchings relax such disparities, the required methods impose sacrifice of image qualities failing to capture high-frequency details for stitched images. To address the problem, we propose a novel approach, implicit Neural Image Stitching (NIS) that extends arbitrary-scale super-resolution. Our method estimates Fourier coefficients of images for quality-enhancing warps. Then, the suggested model blends color mismatches and misalignment in the latent space and decodes the features into RGB values of stitched images. Our experiments show that our approach achieves improvement in resolving the low-definition imaging of the previous deep image stitching with favorable accelerated image-enhancing methods. Our source code is available at https://github.com/minshu-kim/NIS.
CVSep 4, 2023Code
Learning Residual Elastic Warps for Image Stitching under Dirichlet Boundary ConditionMinsu Kim, Yongjun Lee, Woo Kyoung Han et al.
Trendy suggestions for learning-based elastic warps enable the deep image stitchings to align images exposed to large parallax errors. Despite the remarkable alignments, the methods struggle with occasional holes or discontinuity between overlapping and non-overlapping regions of a target image as the applied training strategy mostly focuses on overlap region alignment. As a result, they require additional modules such as seam finder and image inpainting for hiding discontinuity and filling holes, respectively. In this work, we suggest Recurrent Elastic Warps (REwarp) that address the problem with Dirichlet boundary condition and boost performances by residual learning for recurrent misalign correction. Specifically, REwarp predicts a homography and a Thin-plate Spline (TPS) under the boundary constraint for discontinuity and hole-free image stitching. Our experiments show the favorable aligns and the competitive computational costs of REwarp compared to the existing stitching methods. Our source code is available at https://github.com/minshu-kim/REwarp.
CVMar 26, 2024
Self-Rectifying Diffusion Sampling with Perturbed-Attention GuidanceDonghoon Ahn, Hyoungwon Cho, Jaewon Min et al.
Recent studies have demonstrated that diffusion models are capable of generating high-quality samples, but their quality heavily depends on sampling guidance techniques, such as classifier guidance (CG) and classifier-free guidance (CFG). These techniques are often not applicable in unconditional generation or in various downstream tasks such as image restoration. In this paper, we propose a novel sampling guidance, called Perturbed-Attention Guidance (PAG), which improves diffusion sample quality across both unconditional and conditional settings, achieving this without requiring additional training or the integration of external modules. PAG is designed to progressively enhance the structure of samples throughout the denoising process. It involves generating intermediate samples with degraded structure by substituting selected self-attention maps in diffusion U-Net with an identity matrix, by considering the self-attention mechanisms' ability to capture structural information, and guiding the denoising process away from these degraded samples. In both ADM and Stable Diffusion, PAG surprisingly improves sample quality in conditional and even unconditional scenarios. Moreover, PAG significantly improves the baseline performance in various downstream tasks where existing guidances such as CG or CFG cannot be fully utilized, including ControlNet with empty prompts and image restoration such as inpainting and deblurring.
CVDec 5, 2024
A Noise is Worth Diffusion GuidanceDonghoon Ahn, Jiwon Kang, Sanghyun Lee et al.
Diffusion models excel in generating high-quality images. However, current diffusion models struggle to produce reliable images without guidance methods, such as classifier-free guidance (CFG). Are guidance methods truly necessary? Observing that noise obtained via diffusion inversion can reconstruct high-quality images without guidance, we focus on the initial noise of the denoising pipeline. By mapping Gaussian noise to `guidance-free noise', we uncover that small low-magnitude low-frequency components significantly enhance the denoising process, removing the need for guidance and thus improving both inference throughput and memory. Expanding on this, we propose \ours, a novel method that replaces guidance methods with a single refinement of the initial noise. This refined noise enables high-quality image generation without guidance, within the same diffusion pipeline. Our noise-refining model leverages efficient noise-space learning, achieving rapid convergence and strong performance with just 50K text-image pairs. We validate its effectiveness across diverse metrics and analyze how refined noise can eliminate the need for guidance. See our project page: https://cvlab-kaist.github.io/NoiseRefine/.
IVJul 14, 2025
IM-LUT: Interpolation Mixing Look-Up Tables for Image Super-ResolutionSejin Park, Sangmin Lee, Kyong Hwan Jin et al.
Super-resolution (SR) has been a pivotal task in image processing, aimed at enhancing image resolution across various applications. Recently, look-up table (LUT)-based approaches have attracted interest due to their efficiency and performance. However, these methods are typically designed for fixed scale factors, making them unsuitable for arbitrary-scale image SR (ASISR). Existing ASISR techniques often employ implicit neural representations, which come with considerable computational cost and memory demands. To address these limitations, we propose Interpolation Mixing LUT (IM-LUT), a novel framework that operates ASISR by learning to blend multiple interpolation functions to maximize their representational capacity. Specifically, we introduce IM-Net, a network trained to predict mixing weights for interpolation functions based on local image patterns and the target scale factor. To enhance efficiency of interpolation-based methods, IM-Net is transformed into IM-LUT, where LUTs are employed to replace computationally expensive operations, enabling lightweight and fast inference on CPUs while preserving reconstruction quality. Experimental results on several benchmark datasets demonstrate that IM-LUT consistently achieves a superior balance between image quality and efficiency compared to existing methods, highlighting its potential as a promising solution for resource-constrained applications.
CVFeb 27, 2025
Identity-preserving Distillation Sampling by Fixed-Point IteratorSeonHwa Kim, Jiwon Kim, Soobin Park et al.
Score distillation sampling (SDS) demonstrates a powerful capability for text-conditioned 2D image and 3D object generation by distilling the knowledge from learned score functions. However, SDS often suffers from blurriness caused by noisy gradients. When SDS meets the image editing, such degradations can be reduced by adjusting bias shifts using reference pairs, but the de-biasing techniques are still corrupted by erroneous gradients. To this end, we introduce Identity-preserving Distillation Sampling (IDS), which compensates for the gradient leading to undesired changes in the results. Based on the analysis that these errors come from the text-conditioned scores, a new regularization technique, called fixed-point iterative regularization (FPR), is proposed to modify the score itself, driving the preservation of the identity even including poses and structures. Thanks to a self-correction by FPR, the proposed method provides clear and unambiguous representations corresponding to the given prompts in image-to-image editing and editable neural radiance field (NeRF). The structural consistency between the source and the edited data is obviously maintained compared to other state-of-the-art methods.
CVOct 6, 2025
TAG:Tangential Amplifying Guidance for Hallucination-Resistant Diffusion SamplingHyunmin Cho, Donghoon Ahn, Susung Hong et al.
Recent diffusion models achieve the state-of-the-art performance in image generation, but often suffer from semantic inconsistencies or hallucinations. While various inference-time guidance methods can enhance generation, they often operate indirectly by relying on external signals or architectural modifications, which introduces additional computational overhead. In this paper, we propose Tangential Amplifying Guidance (TAG), a more efficient and direct guidance method that operates solely on trajectory signals without modifying the underlying diffusion model. TAG leverages an intermediate sample as a projection basis and amplifies the tangential components of the estimated scores with respect to this basis to correct the sampling trajectory. We formalize this guidance process by leveraging a first-order Taylor expansion, which demonstrates that amplifying the tangential component steers the state toward higher-probability regions, thereby reducing inconsistencies and enhancing sample quality. TAG is a plug-and-play, architecture-agnostic module that improves diffusion sampling fidelity with minimal computational addition, offering a new perspective on diffusion guidance.
CVNov 17, 2021
Local Texture Estimator for Implicit Representation FunctionJaewon Lee, Kyong Hwan Jin
Recent works with an implicit neural function shed light on representing images in arbitrary resolution. However, a standalone multi-layer perceptron shows limited performance in learning high-frequency components. In this paper, we propose a Local Texture Estimator (LTE), a dominant-frequency estimator for natural images, enabling an implicit function to capture fine details while reconstructing images in a continuous manner. When jointly trained with a deep super-resolution (SR) architecture, LTE is capable of characterizing image textures in 2D Fourier space. We show that an LTE-based neural function achieves favorable performance against existing deep SR methods within an arbitrary-scale factor. Furthermore, we demonstrate that our implementation takes the shortest running time compared to previous works.
CVNov 25, 2019
Reducing the Human Effort in Developing PET-CT RegistrationTeaghan O'Briain, Kyong Hwan Jin, Hongyoon Choi et al.
We aim to reduce the tedious nature of developing and evaluating methods for aligning PET-CT scans from multiple patient visits. Current methods for registration rely on correspondences that are created manually by medical experts with 3D manipulation, or assisted alignments done by utilizing mutual information across CT scans that may not be consistent when transferred to the PET images. Instead, we propose to label multiple key points across several 2D slices, which we then fit a key curve to. This removes the need for creating manual alignments in 3D and makes the labelling process easier. We use these key curves to define an error metric for the alignments that can be computed efficiently. While our metric is non-differentiable, we further show that we can utilize it during the training of our deep model via a novel method. Specifically, instead of relying on detailed geometric labels -- e.g., manual 3D alignments -- we use synthetically generated deformations of real data. To incorporate robustness to changes that occur between visits other than geometric changes, we enforce consistency across visits in the deep network's internal representations. We demonstrate the potential of our method via qualitative and quantitative experiments.
IVOct 3, 2019
Time-Dependent Deep Image Prior for Dynamic MRIJaejun Yoo, Kyong Hwan Jin, Harshit Gupta et al.
We propose a novel unsupervised deep-learning-based algorithm for dynamic magnetic resonance imaging (MRI) reconstruction. Dynamic MRI requires rapid data acquisition for the study of moving organs such as the heart. Existing reconstruction methods suffer from restrictions either in the model design or in the absence of ground-truth data, resulting in low image quality. We introduce a generalized version of the deep-image-prior approach, which optimizes the network weights to fit a sequence of sparsely acquired dynamic MRI measurements. Our method needs neither prior training nor additional data. In particular, for cardiac images, it does not require the marking of heartbeats or the reordering of spokes. The key ingredients of our method are threefold: 1) a fixed low-dimensional manifold that encodes the temporal variations of images; 2) a network that maps the manifold into a more expressive latent space; and 3) a convolutional neural network that generates a dynamic series of MRI images from the latent variables and that favors their consistency with the measurements in k-space. Our method outperforms the state-of-the-art methods quantitatively and qualitatively in both retrospective and real fetal cardiac datasets. To the best of our knowledge, this is the first unsupervised deep-learning-based method that can reconstruct the continuous variation of dynamic MRI sequences with high spatial resolution.
CVJan 14, 2019
Self-Supervised Deep Active Accelerated MRIKyong Hwan Jin, Michael Unser, Kwang Moo Yi
We propose to simultaneously learn to sample and reconstruct magnetic resonance images (MRI) to maximize the reconstruction quality given a limited sample budget, in a self-supervised setup. Unlike existing deep methods that focus only on reconstructing given data, thus being passive, we go beyond the current state of the art by considering both the data acquisition and the reconstruction process within a single deep-learning framework. As our network learns to acquire data, the network is active in nature. In order to do so, we simultaneously train two neural networks, one dedicated to reconstruction and the other to progressive sampling, each with an automatically generated supervision signal that links them together. The two supervision signals are created through Monte Carlo tree search (MCTS). MCTS returns a better sampling pattern than what the current sampling network can give and, thus, a better final reconstruction. The sampling network is trained to mimic the MCTS results using the previous sampling network, thus being enhanced. The reconstruction network is trained to give the highest reconstruction quality, given the MCTS sampling pattern. Through this framework, we are able to train the two networks without providing any direct supervision on sampling.
IVOct 11, 2017
A Review of Convolutional Neural Networks for Inverse Problems in ImagingMichael T. McCann, Kyong Hwan Jin, Michael Unser
In this survey paper, we review recent uses of convolution neural networks (CNNs) to solve inverse problems in imaging. It has recently become feasible to train deep CNNs on large databases of images, and they have shown outstanding performance on object classification and segmentation tasks. Motivated by these successes, researchers have begun to apply CNNs to the resolution of inverse problems such as denoising, deconvolution, super-resolution, and medical image reconstruction, and they have started to report improvements over state-of-the-art methods, including sparsity-based techniques such as compressed sensing. Here, we review the recent experimental work in these areas, with a focus on the critical design decisions: Where does the training data come from? What is the architecture of the CNN? and How is the learning problem formulated and solved? We also bring together a few key theoretical papers that offer perspective on why CNNs are appropriate for inverse problems and point to some next steps in the field.
CVSep 6, 2017
CNN-Based Projected Gradient Descent for Consistent Image ReconstructionHarshit Gupta, Kyong Hwan Jin, Ha Q. Nguyen et al.
We present a new method for image reconstruction which replaces the projector in a projected gradient descent (PGD) with a convolutional neural network (CNN). CNNs trained as high-dimensional (image-to-image) regressors have recently been used to efficiently solve inverse problems in imaging. However, these approaches lack a feedback mechanism to enforce that the reconstructed image is consistent with the measurements. This is crucial for inverse problems, and more so in biomedical imaging, where the reconstructions are used for diagnosis. In our scheme, the gradient descent enforces measurement consistency, while the CNN recursively projects the solution closer to the space of desired reconstruction images. We provide a formal framework to ensure that the classical PGD converges to a local minimizer of a non-convex constrained least-squares problem. When the projector is replaced with a CNN, we propose a relaxed PGD, which always converges. Finally, we propose a simple scheme to train a CNN to act like a projector. Our experiments on sparse view Computed Tomography (CT) reconstruction for both noiseless and noisy measurements show an improvement over the total-variation (TV) method and a recent CNN-based technique.
CVApr 20, 2017
Predicting Cognitive Decline with Deep Learning of Brain Metabolism and Amyloid ImagingHongyoon Choi, Kyong Hwan Jin
For effective treatment of Alzheimer disease (AD), it is important to identify subjects who are most likely to exhibit rapid cognitive decline. Herein, we developed a novel framework based on a deep convolutional neural network which can predict future cognitive decline in mild cognitive impairment (MCI) patients using flurodeoxyglucose and florbetapir positron emission tomography (PET). The architecture of the network only relies on baseline PET studies of AD and normal subjects as the training dataset. Feature extraction and complicated image preprocessing including nonlinear warping are unnecessary for our approach. Accuracy of prediction (84.2%) for conversion to AD in MCI patients outperformed conventional feature-based quantification approaches. ROC analyses revealed that performance of CNN-based approach was significantly higher than that of the conventional quantification methods (p < 0.05). Output scores of the network were strongly correlated with the longitudinal change in cognitive measurements. These results show the feasibility of deep learning as a tool for predicting disease outcome using brain images.
CVNov 11, 2016
Deep Convolutional Neural Network for Inverse Problems in ImagingKyong Hwan Jin, Michael T. McCann, Emmanuel Froustey et al.
In this paper, we propose a novel deep convolutional neural network (CNN)-based algorithm for solving ill-posed inverse problems. Regularized iterative algorithms have emerged as the standard approach to ill-posed inverse problems in the past few decades. These methods produce excellent results, but can be challenging to deploy in practice due to factors including the high computational cost of the forward and adjoint operators and the difficulty of hyper parameter selection. The starting point of our work is the observation that unrolled iterative methods have the form of a CNN (filtering followed by point-wise non-linearity) when the normal operator (H*H, the adjoint of H times H) of the forward model is a convolution. Based on this observation, we propose using direct inversion followed by a CNN to solve normal-convolutional inverse problems. The direct inversion encapsulates the physical model of the system, but leads to artifacts when the problem is ill-posed; the CNN combines multiresolution decomposition and residual learning in order to learn to remove these artifacts while preserving image structure. We demonstrate the performance of the proposed network in sparse-view reconstruction (down to 50 views) on parallel beam X-ray computed tomography in synthetic phantoms as well as in real experimental sinograms. The proposed network outperforms total variation-regularized iterative reconstruction for the more realistic phantoms and requires less than a second to reconstruct a 512 x 512 image on GPU.
CVOct 19, 2015
Sparse + Low Rank Decomposition of Annihilating Filter-based Hankel Matrix for Impulse Noise RemovalKyong Hwan Jin, Jong Chul Ye
Recently, so called annihilating filer-based low rank Hankel matrix (ALOHA) approach was proposed as a powerful image inpainting method. Based on the observation that smoothness or textures within an image patch corresponds to sparse spectral components in the frequency domain, ALOHA exploits the existence of annihilating filters and the associated rank-deficient Hankel matrices in the image domain to estimate the missing pixels. By extending this idea, here we propose a novel impulse noise removal algorithm using sparse + low rank decomposition of an annihilating filter-based Hankel matrix. The new approach, what we call the robust ALOHA, is motivated by the observation that an image corrupted with impulse noises has intact pixels; so the impulse noises can be modeled as sparse components, whereas the underlying image can be still modeled using a low-rank Hankel structured matrix. To solve the sparse + low rank decomposition problem, we propose an alternating direction method of multiplier (ADMM) method with initial factorized matrices coming from low rank matrix fitting (LMaFit) algorithm. To adapt the local image statistics that have distinct spectral distributions, the robust ALOHA is applied patch by patch. Experimental results from two types of impulse noises - random valued impulse noises and salt/pepper noises - for both single channel and multi-channel color images demonstrate that the robust ALOHA outperforms the existing algorithms up to 8dB in terms of the peak signal to noise ratio (PSNR).