CVNov 8, 2025Code
Light-Field Dataset for Disparity Based Depth EstimationSuresh Nehra, Aupendu Kar, Jayanta Mukhopadhyay et al.
A Light Field (LF) camera consists of an additional two-dimensional array of micro-lenses placed between the main lens and sensor, compared to a conventional camera. The sensor pixels under each micro-lens receive light from a sub-aperture of the main lens. This enables the image sensor to capture both spatial information and the angular resolution of a scene point. This additional angular information is used to estimate the depth of a 3-D scene. The continuum of virtual viewpoints in light field data enables efficient depth estimation using Epipolar Line Images (EPIs) with robust occlusion handling. However, the trade-off between angular information and spatial information is very critical and depends on the focal position of the camera. To design, develop, implement, and test novel disparity-based light field depth estimation algorithms, the availability of suitable light field image datasets is essential. In this paper, a publicly available light field image dataset is introduced and thoroughly described. We have also demonstrated the effect of focal position on the disparity of a 3-D point as well as the shortcomings of the currently available light field dataset. The proposed dataset contains 285 light field images captured using a Lytro Illum LF camera and 13 synthetic LF images. The proposed dataset also comprises a synthetic dataset with similar disparity characteristics to those of a real light field camera. A real and synthetic stereo light field dataset is also created by using a mechanical gantry system and Blender. The dataset is available at https://github.com/aupendu/light-field-dataset.
CVNov 7, 2025Code
Sharing the Learned Knowledge-base to Estimate Convolutional Filter Parameters for Continual Image RestorationAupendu Kar, Krishnendu Ghosh, Prabir Kumar Biswas
Continual learning is an emerging topic in the field of deep learning, where a model is expected to learn continuously for new upcoming tasks without forgetting previous experiences. This field has witnessed numerous advancements, but few works have been attempted in the direction of image restoration. Handling large image sizes and the divergent nature of various degradation poses a unique challenge in the restoration domain. However, existing works require heavily engineered architectural modifications for new task adaptation, resulting in significant computational overhead. Regularization-based methods are unsuitable for restoration, as different restoration challenges require different kinds of feature processing. In this direction, we propose a simple modification of the convolution layer to adapt the knowledge from previous restoration tasks without touching the main backbone architecture. Therefore, it can be seamlessly applied to any deep architecture without any structural modifications. Unlike other approaches, we demonstrate that our model can increase the number of trainable parameters without significantly increasing computational overhead or inference time. Experimental validation demonstrates that new restoration tasks can be introduced without compromising the performance of existing tasks. We also show that performance on new restoration tasks improves by adapting the knowledge from the knowledge base created by previous restoration tasks. The code is available at https://github.com/aupendu/continual-restore.
CVJul 25, 2022
Sub-Aperture Feature Adaptation in Single Image Super-resolution Model for Light Field ImagingAupendu Kar, Suresh Nehra, Jayanta Mukhopadhyay et al.
With the availability of commercial Light Field (LF) cameras, LF imaging has emerged as an up and coming technology in computational photography. However, the spatial resolution is significantly constrained in commercial microlens based LF cameras because of the inherent multiplexing of spatial and angular information. Therefore, it becomes the main bottleneck for other applications of light field cameras. This paper proposes an adaptation module in a pretrained Single Image Super Resolution (SISR) network to leverage the powerful SISR model instead of using highly engineered light field imaging domain specific Super Resolution models. The adaption module consists of a Sub aperture Shift block and a fusion block. It is an adaptation in the SISR network to further exploit the spatial and angular information in LF images to improve the super resolution performance. Experimental validation shows that the proposed method outperforms existing light field super resolution algorithms. It also achieves PSNR gains of more than 1 dB across all the datasets as compared to the same pretrained SISR models for scale factor 2, and PSNR gains 0.6 to 1 dB for scale factor 4.
IVMar 1, 2025Code
Artificially Generated Visual Scanpath Improves Multi-label Thoracic Disease Classification in Chest X-Ray ImagesAshish Verma, Aupendu Kar, Krishnendu Ghosh et al.
Expert radiologists visually scan Chest X-Ray (CXR) images, sequentially fixating on anatomical structures to perform disease diagnosis. An automatic multi-label classifier of diseases in CXR images can benefit by incorporating aspects of the radiologists' approach. Recorded visual scanpaths of radiologists on CXR images can be used for the said purpose. But, such scanpaths are not available for most CXR images, which creates a gap even for modern deep learning based classifiers. This paper proposes to mitigate this gap by generating effective artificial visual scanpaths using a visual scanpath prediction model for CXR images. Further, a multi-class multi-label classifier framework is proposed that uses a generated scanpath and visual image features to classify diseases in CXR images. While the scanpath predictor is based on a recurrent neural network, the multi-label classifier involves a novel iterative sequential model with an attention module. We show that our scanpath predictor generates human-like visual scanpaths. We also demonstrate that the use of artificial visual scanpaths improves multi-class multi-label disease classification results on CXR images. The above observations are made from experiments involving around 0.2 million CXR images from 2 widely-used datasets considering the multi-label classification of 14 pathological findings. Code link: https://github.com/ashishverma03/SDC
CVMar 1, 2025
Self-supervision via Controlled Transformation and Unpaired Self-conditioning for Low-light Image EnhancementAupendu Kar, Sobhan K. Dhara, Debashis Sen et al.
Real-world low-light images captured by imaging devices suffer from poor visibility and require a domain-specific enhancement to produce artifact-free outputs that reveal details. In this paper, we propose an unpaired low-light image enhancement network leveraging novel controlled transformation-based self-supervision and unpaired self-conditioning strategies. The model determines the required degrees of enhancement at the input image pixels, which are learned from the unpaired low-lit and well-lit images without any direct supervision. The self-supervision is based on a controlled transformation of the input image and subsequent maintenance of its enhancement in spite of the transformation. The self-conditioning performs training of the model on unpaired images such that it does not enhance an already-enhanced image or a well-lit input image. The inherent noise in the input low-light images is handled by employing low gradient magnitude suppression in a detail-preserving manner. In addition, our noise handling is self-conditioned by preventing the denoising of noise-free well-lit images. The training based on low-light image enhancement-specific attributes allows our model to avoid paired supervision without compromising significantly in performance. While our proposed self-supervision aids consistent enhancement, our novel self-conditioning facilitates adequate enhancement. Extensive experiments on multiple standard datasets demonstrate that our model, in general, outperforms the state-of-the-art both quantitatively and subjectively. Ablation studies show the effectiveness of our self-supervision and self-conditioning strategies, and the related loss functions.
CVSep 12, 2025
Event Camera Guided Visual Media Restoration & 3D Reconstruction: A SurveyAupendu Kar, Vishnu Raj, Guan-Ming Su
Event camera sensors are bio-inspired sensors which asynchronously capture per-pixel brightness changes and output a stream of events encoding the polarity, location and time of these changes. These systems are witnessing rapid advancements as an emerging field, driven by their low latency, reduced power consumption, and ultra-high capture rates. This survey explores the evolution of fusing event-stream captured with traditional frame-based capture, highlighting how this synergy significantly benefits various video restoration and 3D reconstruction tasks. The paper systematically reviews major deep learning contributions to image/video enhancement and restoration, focusing on two dimensions: temporal enhancement (such as frame interpolation and motion deblurring) and spatial enhancement (including super-resolution, low-light and HDR enhancement, and artifact reduction). This paper also explores how the 3D reconstruction domain evolves with the advancement of event driven fusion. Diverse topics are covered, with in-depth discussions on recent works for improving visual quality under challenging conditions. Additionally, the survey compiles a comprehensive list of openly available datasets, enabling reproducible research and benchmarking. By consolidating recent progress and insights, this survey aims to inspire further research into leveraging event camera systems, especially in combination with deep learning, for advanced visual media restoration and enhancement.
CVAug 4, 2020
Progressive Update Guided Interdependent Networks for Single Image DehazingAupendu Kar, Sobhan Kanti Dhara, Debashis Sen et al.
Images with haze of different varieties often pose a significant challenge to dehazing. Therefore, guidance by estimates of haze parameters related to the variety would be beneficial, and their progressive update jointly with haze reduction will allow effective dehazing. To this end, we propose a multi-network dehazing framework containing novel interdependent dehazing and haze parameter updater networks that operate in a progressive manner. The haze parameters, transmission map and atmospheric light, are first estimated using dedicated convolutional networks that allow color-cast handling. The estimated parameters are then used to guide our dehazing module, where the estimates are progressively updated by novel convolutional networks. The updating takes place jointly with progressive dehazing using a network that invokes inter-step dependencies. The joint progressive updating and dehazing gradually modify the haze parameter values toward achieving effective dehazing. Through different studies, our dehazing framework is shown to be more effective than image-to-image mapping and predefined haze formation model based dehazing. The framework is also found capable of handling a wide variety of hazy conditions wtih different types and amounts of haze and color casts. Our dehazing framework is qualitatively and quantitatively found to outperform the state-of-the-art on synthetic and real-world hazy images of multiple datasets with varied haze conditions.
CVMar 22, 2019
Fast Bayesian Uncertainty Estimation and Reduction of Batch Normalized Single Image Super-Resolution NetworkAupendu Kar, Prabir Kumar Biswas
Convolutional neural network (CNN) has achieved unprecedented success in image super-resolution tasks in recent years. However, the network's performance depends on the distribution of the training sets and degrades on out-of-distribution samples. This paper adopts a Bayesian approach for estimating uncertainty associated with output and applies it in a deep image super-resolution model to address the concern mentioned above. We use the uncertainty estimation technique using the batch-normalization layer, where stochasticity of the batch mean and variance generate Monte-Carlo (MC) samples. The MC samples, which are nothing but different super-resolved images using different stochastic parameters, reconstruct the image, and provide a confidence or uncertainty map of the reconstruction. We propose a faster approach for MC sample generation, and it allows the variable image size during testing. Therefore, it will be useful for image reconstruction domain. Our experimental findings show that this uncertainty map strongly relates to the quality of reconstruction generated by the deep CNN model and explains its limitation. Furthermore, this paper proposes an approach to reduce the model's uncertainty for an input image, and it helps to defend the adversarial attacks on the image super-resolution model. The proposed uncertainty reduction technique also improves the performance of the model for out-of-distribution test images. To the best of our knowledge, we are the first to propose an adversarial defense mechanism in any image reconstruction domain.
CVJan 17, 2019
UltraCompression: Framework for High Density Compression of Ultrasound Volumes using Physics Modeling Deep Neural NetworksDebarghya China, Francis Tom, Sumanth Nandamuri et al.
Ultrasound image compression by preserving speckle-based key information is a challenging task. In this paper, we introduce an ultrasound image compression framework with the ability to retain realism of speckle appearance despite achieving very high-density compression factors. The compressor employs a tissue segmentation method, transmitting segments along with transducer frequency, number of samples and image size as essential information required for decompression. The decompressor is based on a convolutional network trained to generate patho-realistic ultrasound images which convey essential information pertinent to tissue pathology visible in the images. We demonstrate generalizability of the building blocks using two variants to build the compressor. We have evaluated the quality of decompressed images using distortion losses as well as perception loss and compared it with other off the shelf solutions. The proposed method achieves a compression ratio of $725:1$ while preserving the statistical distribution of speckles. This enables image segmentation on decompressed images to achieve dice score of $0.89 \pm 0.11$, which evidently is not so accurately achievable when images are compressed with current standards like JPEG, JPEG 2000, WebP and BPG. We envision this frame work to serve as a roadmap for speckle image compression standards.
CVMay 17, 2018
Fully Convolutional Model for Variable Bit Length and Lossy High Density Compression of MammogramsAupendu Kar, Sri Phani Krishna Karri, Nirmalya Ghosh et al.
Early works on medical image compression date to the 1980's with the impetus on deployment of teleradiology systems for high-resolution digital X-ray detectors. Commercially deployed systems during the period could compress 4,096 x 4,096 sized images at 12 bpp to 2 bpp using lossless arithmetic coding, and over the years JPEG and JPEG2000 were imbibed reaching upto 0.1 bpp. Inspired by the reprise of deep learning based compression for natural images over the last two years, we propose a fully convolutional autoencoder for diagnostically relevant feature preserving lossy compression. This is followed by leveraging arithmetic coding for encapsulating high redundancy of features for further high-density code packing leading to variable bit length. We demonstrate performance on two different publicly available digital mammography datasets using peak signal-to-noise ratio (pSNR), structural similarity (SSIM) index and domain adaptability tests between datasets. At high density compression factors of >300x (~0.04 bpp), our approach rivals JPEG and JPEG2000 as evaluated through a Radiologist's visual Turing test.