Nazmul Karim

CV
h-index: 9
17 papers
409 citations
Novelty: 53%
AI Score: 33

17 Papers

CV Mar 28, 2022
UNICON: Combating Label Noise Through Uniform Selection and Contrastive Learning

Nazmul Karim, Mamshad Nayeem Rizve, Nazanin Rahnavard et al.

Supervised deep learning methods require a large repository of annotated data; hence, label noise is inevitable. Training with such noisy data negatively impacts the generalization performance of deep neural networks. To combat label noise, recent state-of-the-art methods employ some sort of sample selection mechanism to select a possibly clean subset of data. Next, an off-the-shelf semi-supervised learning method is used for training where rejected samples are treated as unlabeled data. Our comprehensive analysis shows that current selection methods disproportionately select samples from easy (fast learnable) classes while rejecting those from relatively harder ones. This creates class imbalance in the selected clean set and in turn, deteriorates performance under high label noise. In this work, we propose UNICON, a simple yet effective sample selection method which is robust to high label noise. To address the disproportionate selection of easy and hard samples, we introduce a Jensen-Shannon divergence based uniform selection mechanism which does not require any probabilistic modeling and hyperparameter tuning. We complement our selection method with contrastive learning to further combat the memorization of noisy labels. Extensive experimentation on multiple benchmark datasets demonstrates the effectiveness of UNICON; we obtain an 11.4% improvement over the current state-of-the-art on the CIFAR100 dataset with a 90% noise rate. Our code is publicly available.
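The class-balanced selection idea in this abstract can be illustrated with a minimal sketch. It assumes the Jensen-Shannon divergence is taken between each sample's predicted class probabilities and its one-hot given label, and that the same number of lowest-divergence samples is kept for every class. The function names, the fixed per-class quota, and the toy data are hypothetical simplifications, not the authors' implementation.

```python
import math
from collections import defaultdict


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2, so the value lies in [0, 1])."""
    def kl(a, b):
        # Terms with a_i == 0 contribute nothing to KL(a || b).
        return sum(ai * math.log2(ai / bi) for ai, bi in zip(a, b) if ai > 0)
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def uniform_select(probs, labels, num_classes, per_class):
    """Keep the `per_class` lowest-divergence samples for every given label.

    Hypothetical helper: the equal per-class quota is what keeps the
    selected subset class-balanced.
    """
    by_class = defaultdict(list)
    for idx, (p, y) in enumerate(zip(probs, labels)):
        one_hot = [1.0 if c == y else 0.0 for c in range(num_classes)]
        by_class[y].append((js_divergence(p, one_hot), idx))
    selected = []
    for y in sorted(by_class):
        ranked = sorted(by_class[y])
        selected.extend(idx for _, idx in ranked[:per_class])
    return sorted(selected)


# Toy predictions for 2 classes; samples 1 and 3 disagree with their labels.
probs = [[0.9, 0.1], [0.2, 0.8], [0.3, 0.7], [0.8, 0.2]]
labels = [0, 0, 1, 1]
clean = uniform_select(probs, labels, num_classes=2, per_class=1)
```

Ranking within each class, rather than applying one global threshold, is what prevents easy classes from dominating the selected set in this sketch.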

CV Mar 30, 2023
C-SFDA: A Curriculum Learning Aided Self-Training Framework for Efficient Source Free Domain Adaptation

Nazmul Karim, Niluthpol Chowdhury Mithun, Abhinav Rajvanshi et al.

Unsupervised domain adaptation (UDA) approaches focus on adapting models trained on a labeled source domain to an unlabeled target domain. UDA methods have a strong assumption that the source data is accessible during adaptation, which may not be feasible in many real-world scenarios due to privacy concerns and resource constraints of devices. In this regard, source-free domain adaptation (SFDA) excels as access to source data is no longer required during adaptation. Recent state-of-the-art (SOTA) methods on SFDA mostly focus on pseudo-label refinement based self-training which generally suffers from two issues: i) the inevitable occurrence of noisy pseudo-labels that could lead to early training time memorization, ii) the refinement process requires maintaining a memory bank which creates a significant burden in resource-constrained scenarios. To address these concerns, we propose C-SFDA, a curriculum learning aided self-training framework for SFDA that adapts efficiently and reliably to changes across domains based on selective pseudo-labeling. Specifically, we employ a curriculum learning scheme to promote learning from a restricted amount of pseudo labels selected based on their reliabilities. This simple yet effective step successfully prevents label noise propagation during different stages of adaptation and eliminates the need for costly memory-bank based label refinement. Our extensive experimental evaluations on both image recognition and semantic segmentation tasks confirm the effectiveness of our method. C-SFDA is readily applicable to online test-time domain adaptation and also outperforms previous SOTA methods in this task.

CV Apr 6, 2022
RODD: A Self-Supervised Approach for Robust Out-of-Distribution Detection

Umar Khalid, Ashkan Esmaeili, Nazmul Karim et al.

Recent studies have addressed the concern of detecting and rejecting the out-of-distribution (OOD) samples as a major challenge in the safe deployment of deep learning (DL) models. It is desired that the DL model should only be confident about the in-distribution (ID) data which reinforces the driving principle of the OOD detection. In this paper, we propose a simple yet effective generalized OOD detection method independent of out-of-distribution datasets. Our approach relies on self-supervised feature learning of the training samples, where the embeddings lie on a compact low-dimensional space. Motivated by the recent studies that show self-supervised adversarial contrastive learning helps robustify the model, we empirically show that a pre-trained model with self-supervised contrastive learning yields a better model for uni-dimensional feature learning in the latent space. The method proposed in this work, referred to as RODD, exceeds SOTA detection performance on an extensive suite of benchmark datasets on OOD detection tasks. On the CIFAR-100 benchmarks, RODD achieves a 26.97% lower false-positive rate (FPR@95) compared to SOTA methods.

CV Apr 21, 2022
CNLL: A Semi-supervised Approach For Continual Noisy Label Learning

Nazmul Karim, Umar Khalid, Ashkan Esmaeili et al.

The task of continual learning requires careful design of algorithms that can tackle catastrophic forgetting. However, the noisy label, which is inevitable in a real-world scenario, seems to exacerbate the situation. While very few studies have addressed the issue of continual learning under noisy labels, long training time and complicated training schemes limit their applications in most cases. In contrast, we propose a simple purification technique to effectively cleanse the online data stream that is both cost-effective and more accurate. After purification, we perform fine-tuning in a semi-supervised fashion that ensures the participation of all available samples. Training in this fashion helps us learn a better representation that results in state-of-the-art (SOTA) performance. Through extensive experimentation on 3 benchmark datasets, MNIST, CIFAR10 and CIFAR100, we show the effectiveness of our proposed approach. We achieve a 24.8% performance gain for CIFAR10 with 20% noise over previous SOTA methods. Our code is publicly available.

SP Apr 6, 2022
RF Signal Transformation and Classification using Deep Neural Networks

Umar Khalid, Nazmul Karim, Nazanin Rahnavard

Deep neural networks (DNNs) designed for computer vision and natural language processing tasks cannot be directly applied to the radio frequency (RF) datasets. To address this challenge, we propose to convert the raw RF data to data types that are suitable for off-the-shelf DNNs by introducing a convolutional transform technique. In addition, we propose a simple 5-layer convolutional neural network architecture (CONV-5) that can operate with raw RF I/Q data without any transformation. Further, we put forward an RF dataset, referred to as RF1024, to facilitate future RF research. RF1024 consists of 8 different RF modulation classes with each class having 1000/200 training/test samples. Each sample of the RF1024 dataset contains 1024 complex I/Q values. Lastly, the experiments are performed on the RadioML2016 and RF1024 datasets to demonstrate the improved classification performance.

CV Sep 1, 2024
Fisher Information guided Purification against Backdoor Attacks

Nazmul Karim, Abdullah Al Arafat, Adnan Siraj Rakin et al.

Studies on backdoor attacks in recent years suggest that an adversary can compromise the integrity of a deep neural network (DNN) by manipulating a small set of training samples. Our analysis shows that such manipulation can make the backdoor model converge to bad local minima, i.e., sharper minima than those of a benign model. Intuitively, the backdoor can be purified by re-optimizing the model to smoother minima. However, a naïve adoption of any optimization targeting smoother minima can lead to sub-optimal purification techniques hampering the clean test accuracy. Hence, to effectively obtain such re-optimization, inspired by our novel perspective establishing the connection between backdoor removal and loss smoothness, we propose Fisher Information guided Purification (FIP), a novel backdoor purification framework. FIP consists of a couple of novel regularizers that aid the model in suppressing the backdoor effects and retaining the acquired knowledge of the clean data distribution throughout the backdoor removal procedure by exploiting the knowledge of the Fisher Information Matrix (FIM). In addition, we introduce an efficient variant of FIP, dubbed Fast FIP, which reduces the number of tunable parameters significantly and obtains an impressive runtime gain of almost 5×. Extensive experiments show that the proposed method achieves state-of-the-art (SOTA) performance on a wide range of backdoor defense benchmarks: 5 different tasks (Image Recognition, Object Detection, Video Action Recognition, 3D Point Cloud, Language Generation); 11 different datasets including ImageNet, PASCAL VOC, UCF101; diverse model architectures spanning both CNN and vision transformer; 14 different backdoor attacks, e.g., Dynamic, WaNet, LIRA, ISSBA, etc.

CV Jun 30, 2023
Efficient Backdoor Removal Through Natural Gradient Fine-tuning

Nazmul Karim, Abdullah Al Arafat, Umar Khalid et al.

The success of a deep neural network (DNN) heavily relies on the details of the training scheme; e.g., training data, architectures, hyper-parameters, etc. Recent backdoor attacks suggest that an adversary can take advantage of such training details and compromise the integrity of a DNN. Our studies show that a backdoor model is usually optimized to bad local minima, i.e., sharper minima than those of a benign model. Intuitively, a backdoor model can be purified by reoptimizing the model to smoother minima through fine-tuning with a few clean validation data. However, fine-tuning all DNN parameters often requires huge computational costs and often results in sub-par clean test performance. To address this concern, we propose a novel backdoor purification technique, Natural Gradient Fine-tuning (NGF), which focuses on removing the backdoor by fine-tuning only one layer. Specifically, NGF utilizes a loss surface geometry-aware optimizer that can successfully overcome the challenge of reaching smooth minima under a one-layer optimization scenario. To enhance the generalization performance of our proposed method, we introduce a clean data distribution-aware regularizer based on the knowledge of the loss surface curvature matrix, i.e., the Fisher Information Matrix. Extensive experiments show that the proposed method achieves state-of-the-art performance on a wide range of backdoor defense benchmarks: four different datasets (CIFAR10, GTSRB, Tiny-ImageNet, and ImageNet) and 13 recent backdoor attacks, e.g., Blend, Dynamic, WaNet, ISSBA, etc.

CV Jul 14, 2024
Augmented Neural Fine-Tuning for Efficient Backdoor Purification

Nazmul Karim, Abdullah Al Arafat, Umar Khalid et al.

Recent studies have revealed the vulnerability of deep neural networks (DNNs) to various backdoor attacks, where the behavior of DNNs can be compromised by utilizing certain types of triggers or poisoning mechanisms. State-of-the-art (SOTA) defenses employ overly sophisticated mechanisms that require either a computationally expensive adversarial search module for reverse-engineering the trigger distribution or an over-sensitive hyper-parameter selection module. Moreover, they offer sub-par performance in challenging scenarios, e.g., limited validation data and strong attacks. In this paper, we propose Neural mask Fine-Tuning (NFT) with an aim to optimally re-organize the neuron activities in a way that the effect of the backdoor is removed. Utilizing a simple data augmentation like MixUp, NFT relaxes the trigger synthesis process and eliminates the requirement of the adversarial search module. Our study further reveals that direct weight fine-tuning under limited validation data results in poor post-purification clean test accuracy, primarily due to overfitting. To overcome this, we propose to fine-tune neural masks instead of model weights. In addition, a mask regularizer has been devised to further mitigate the model drift during the purification process. The distinct characteristics of NFT render it highly efficient in both runtime and sample usage, as it can remove the backdoor even when a single sample is available from each class. We validate the effectiveness of NFT through extensive experiments covering the tasks of image classification, object detection, video action recognition, 3D point cloud, and natural language processing. We evaluate our method against 14 different attacks (LIRA, WaNet, etc.) on 11 benchmark data sets such as ImageNet, UCF101, Pascal VOC, ModelNet, OpenSubtitles2012, etc.
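For readers unfamiliar with MixUp, the augmentation named in this abstract, the following is a minimal sketch of the generic technique only: two samples and their one-hot labels are blended with a weight drawn from a Beta distribution. How NFT uses the mixed samples inside its purification procedure is not specified in the abstract, so nothing here should be read as the authors' pipeline; the function name and toy inputs are hypothetical.

```python
import random


def mixup(x_a, y_a, x_b, y_b, alpha=1.0, rng=random):
    """Blend two samples and their one-hot labels with a Beta-distributed weight."""
    lam = rng.betavariate(alpha, alpha)
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return x, y, lam


# Toy example: two 3-feature samples from two different classes.
rng = random.Random(0)
x, y, lam = mixup([0.0, 1.0, 2.0], [1.0, 0.0], [2.0, 1.0, 0.0], [0.0, 1.0], rng=rng)
```

The mixed label `y` stays a valid probability vector because both inputs are one-hot and the two blend weights sum to one.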

IR Aug 15, 2022
A Survey of Recommender System Techniques and the Ecommerce Domain

Imran Hossain, Md Aminul Haque Palash, Anika Tabassum Sejuty et al.

In this big data era, it is hard for the current generation to find the right data from the huge amount of data contained within online platforms. In such a situation, there is a need for an information filtering system that might help them find the information they are looking for. In recent years, a research field has emerged known as recommender systems. Recommenders have become important as they have many real-life applications. This paper reviews the different techniques and developments of recommender systems in e-commerce, e-tourism, e-resources, e-government, e-learning, and e-library. By analyzing recent work on this topic, we will be able to provide a detailed overview of current developments and identify existing difficulties in recommendation systems. The final results give practitioners and researchers the necessary guidance and insights into the recommendation system and its application.

CV Dec 14, 2023
LatentEditor: Text Driven Local Editing of 3D Scenes

Umar Khalid, Hasan Iqbal, Nazmul Karim et al.

While neural fields have made significant strides in view synthesis and scene reconstruction, editing them poses a formidable challenge due to their implicit encoding of geometry and texture information from multi-view inputs. In this paper, we introduce LatentEditor, an innovative framework designed to empower users with the ability to perform precise and locally controlled editing of neural fields using text prompts. Leveraging denoising diffusion models, we successfully embed real-world scenes into the latent space, resulting in a faster and more adaptable NeRF backbone for editing compared to traditional methods. To enhance editing precision, we introduce a delta score to calculate the 2D mask in the latent space that serves as a guide for local modifications while preserving irrelevant regions. Our novel pixel-level scoring approach harnesses the power of InstructPix2Pix (IP2P) to discern the disparity between IP2P conditional and unconditional noise predictions in the latent space. The edited latents conditioned on the 2D masks are then iteratively updated in the training set to achieve 3D local editing. Our approach achieves faster editing speeds and superior output quality compared to existing 3D editing models, bridging the gap between textual instructions and high-quality 3D scene editing in latent space. We show the superiority of our approach on four benchmark 3D datasets, LLFF, IN2N, NeRFStudio and NeRF-Art. Project Page: https://latenteditor.github.io/

CV Dec 21, 2023
Free-Editor: Zero-shot Text-driven 3D Scene Editing

Nazmul Karim, Hasan Iqbal, Umar Khalid et al.

Text-to-Image (T2I) diffusion models have recently gained traction for their versatility and user-friendliness in 2D content generation and editing. However, training a diffusion model specifically for 3D scene editing is challenging due to the scarcity of large-scale datasets. Currently, editing 3D scenes necessitates either retraining the model to accommodate various 3D edits or developing specific methods tailored to each unique editing type. Moreover, state-of-the-art (SOTA) techniques require multiple synchronized edited images from the same scene to enable effective scene editing. Given the current limitations of T2I models, achieving consistent editing effects across multiple images remains difficult, leading to multi-view inconsistency in editing. This inconsistency undermines the performance of 3D scene editing when these images are utilized. In this study, we introduce a novel, training-free 3D scene editing technique called Free-Editor, which enables users to edit 3D scenes without the need for model retraining during the testing phase. Our method effectively addresses the issue of multi-view style inconsistency found in state-of-the-art (SOTA) methods through the implementation of a single-view editing scheme. Specifically, we demonstrate that editing a particular 3D scene can be achieved by modifying only a single view. To facilitate this, we present an Edit Transformer that ensures intra-view consistency and inter-view style transfer using self-view and cross-view attention mechanisms, respectively. By eliminating the need for model retraining and multi-view editing, our approach significantly reduces editing time and memory resource requirements, achieving runtimes approximately 20 times faster than SOTA methods. We have performed extensive experiments on various benchmark datasets, showcasing the diverse editing capabilities of our proposed technique.

CV Mar 14, 2025
PSF-4D: A Progressive Sampling Framework for View Consistent 4D Editing

Hasan Iqbal, Nazmul Karim, Umar Khalid et al.

Instruction-guided generative models, especially those using text-to-image (T2I) and text-to-video (T2V) diffusion frameworks, have advanced the field of content editing in recent years. To extend these capabilities to 4D scenes, we introduce a progressive sampling framework for 4D editing (PSF-4D) that ensures temporal and multi-view consistency by intuitively controlling the noise initialization during forward diffusion. For temporal coherence, we design a correlated Gaussian noise structure that links frames over time, allowing each frame to depend meaningfully on prior frames. Additionally, to ensure spatial consistency across views, we implement a cross-view noise model, which uses shared and independent noise components to balance commonalities and distinct details among different views. To further enhance spatial coherence, PSF-4D incorporates view-consistent iterative refinement, embedding view-aware information into the denoising process to ensure aligned edits across frames and views. Our approach enables high-quality 4D editing without relying on external models, addressing key challenges in previous methods. Through extensive evaluation on multiple benchmarks and multiple editing aspects (e.g., style transfer, multi-attribute editing, object removal, local editing, etc.), we show the effectiveness of our proposed method. Experimental results demonstrate that our proposed method outperforms state-of-the-art 4D editing methods in diverse benchmarks.
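The two noise structures described in this abstract can be illustrated with a minimal sketch. It assumes a first-order autoregressive form for the temporally correlated noise and a simple shared-plus-independent mixture for the cross-view noise; the abstract does not give the exact formulation, so the function names and the `rho` and `alpha` parameters are hypothetical.

```python
import math
import random


def temporal_noise(num_frames, dim, rho, rng):
    """AR(1) Gaussian noise: each frame mixes the previous frame's noise with
    fresh noise. The sqrt(1 - rho**2) factor keeps unit variance per frame."""
    frames = [[rng.gauss(0.0, 1.0) for _ in range(dim)]]
    scale = math.sqrt(1.0 - rho ** 2)
    for _ in range(num_frames - 1):
        prev = frames[-1]
        frames.append([rho * p + scale * rng.gauss(0.0, 1.0) for p in prev])
    return frames


def cross_view_noise(num_views, dim, alpha, rng):
    """Each view = shared component + view-specific component, unit variance."""
    shared = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    a, b = math.sqrt(alpha), math.sqrt(1.0 - alpha)
    return [[a * s + b * rng.gauss(0.0, 1.0) for s in shared]
            for _ in range(num_views)]


rng = random.Random(0)
frames = temporal_noise(num_frames=4, dim=2000, rho=0.9, rng=rng)
views = cross_view_noise(num_views=3, dim=2000, alpha=0.5, rng=rng)
```

In this sketch, `rho` sets how strongly consecutive frames are correlated and `alpha` sets the fraction of variance that all views share, while every frame and view still has approximately unit variance as a standard diffusion noise sample would.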

CV May 30, 2023
SAVE: Spectral-Shift-Aware Adaptation of Image Diffusion Models for Text-driven Video Editing

Nazmul Karim, Umar Khalid, Mohsen Joneidi et al.

Text-to-Image (T2I) diffusion models have achieved remarkable success in synthesizing high-quality images conditioned on text prompts. Recent methods have tried to replicate the success by either training text-to-video (T2V) models on a very large number of text-video pairs or adapting T2I models on text-video pairs independently. Although the latter is computationally less expensive, it still takes a significant amount of time for per-video adaptation. To address this issue, we propose SAVE, a novel spectral-shift-aware adaptation framework, in which we fine-tune the spectral shift of the parameter space instead of the parameters themselves. Specifically, we take the spectral decomposition of the pre-trained T2I weights and only update the singular values while freezing the corresponding singular vectors. In addition, we introduce a spectral shift regularizer aimed at placing tighter constraints on larger singular values compared to smaller ones. This form of regularization enables the model to grasp finer details within the video that align with the provided textual descriptions. We also offer theoretical justification for our proposed regularization technique. Since we are only dealing with spectral shifts, the proposed method reduces the adaptation time significantly (approx. 10 times) and has fewer resource constraints for training. Such attributes make SAVE more suitable for real-world applications, e.g., editing undesirable content during video streaming. We validate the effectiveness of SAVE with an extensive experimental evaluation under different settings, e.g., style transfer, object replacement, privacy preservation, etc.
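The idea of updating only singular values can be illustrated with a minimal NumPy sketch. It assumes a single small weight matrix stands in for a pre-trained layer, and it uses a singular-value-weighted quadratic penalty as one plausible way to constrain larger singular values more tightly; the abstract does not state the regularizer's actual form, so `shift_penalty`, `adapted_weight`, and `delta` are hypothetical names rather than the authors' code.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 3))  # stand-in for a pre-trained weight matrix

# Decompose once; U and Vt stay frozen during adaptation.
U, S, Vt = np.linalg.svd(W, full_matrices=False)

# The only trainable quantity: one shift per singular value.
delta = np.zeros_like(S)


def adapted_weight(delta):
    """Rebuild the weight from frozen singular vectors and shifted singular values."""
    return (U * (S + delta)) @ Vt


def shift_penalty(delta, S):
    """Hypothetical regularizer: weight each squared shift by its singular value,
    so shifts on larger singular values are penalized more."""
    return float(np.sum(S * delta ** 2))
```

With a zero shift the adapted weight reproduces the original matrix, and the number of trainable values drops from 12 (the entries of `W`) to 3 (the entries of `delta`), which is the source of the reduced adaptation cost in this toy setting.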

CV Oct 9, 2021
Adversarial Training for Face Recognition Systems using Contrastive Adversarial Learning and Triplet Loss Fine-tuning

Nazmul Karim, Umar Khalid, Nick Meeker et al.

Though much work has been done in the domain of improving the adversarial robustness of facial recognition systems, a surprisingly small percentage of it has focused on self-supervised approaches. In this work, we present an approach that combines Adversarial Pre-Training with Triplet Loss Adversarial Fine-Tuning. We compare our methods with the pre-trained ResNet50 model that forms the backbone of FaceNet, finetuned on our CelebA dataset. Through comparing adversarial robustness achieved without adversarial training, with triplet loss adversarial training, and our contrastive pre-training combined with triplet loss adversarial fine-tuning, we find that our method achieves comparable results with far fewer epochs required during fine-tuning. This seems promising; increasing the training time for fine-tuning should yield even better results. In addition to this, a modified semi-supervised experiment was conducted, which demonstrated the improvement of contrastive adversarial training with the introduction of small amounts of labels.

CV Jul 3, 2021
SPI-GAN: Towards Single-Pixel Imaging through Generative Adversarial Network

Nazmul Karim, Nazanin Rahnavard

Single-pixel imaging is a novel imaging scheme that has gained popularity due to its huge computational gain and potential for a low-cost alternative to imaging beyond the visible spectrum. The traditional reconstruction methods struggle to produce a clear recovery when one limits the number of illumination patterns from a spatial light modulator. As a remedy, several deep-learning-based solutions have been proposed which lack good generalization ability due to the architectural setup and loss functions. In this paper, we propose a generative adversarial network-based reconstruction framework for single-pixel imaging, referred to as SPI-GAN. Our method can reconstruct images with 17.92 dB PSNR and 0.487 SSIM, even if the sampling ratio drops to 5%. This facilitates much faster reconstruction making our method suitable for single-pixel video. Furthermore, our ResNet-like architecture for the generator leads to useful representation learning that allows us to reconstruct completely unseen objects. The experimental results demonstrate that SPI-GAN achieves significant performance gain, e.g. near 3dB PSNR gain, over the current state-of-the-art method.

LG Jul 2, 2021
RL-NCS: Reinforcement learning based data-driven approach for nonuniform compressed sensing

Nazmul Karim, Alireza Zaeemzadeh, Nazanin Rahnavard

A reinforcement-learning-based non-uniform compressed sensing (NCS) framework for time-varying signals is introduced. The proposed scheme, referred to as RL-NCS, aims to boost the performance of signal recovery through an optimal and adaptive distribution of sensing energy among two groups of coefficients of the signal, referred to as the region of interest (ROI) coefficients and non-ROI coefficients. The coefficients in ROI usually have greater importance and need to be reconstructed with higher accuracy compared to non-ROI coefficients. In order to accomplish this task, the ROI is predicted at each time step using two specific approaches. One of these approaches incorporates a long short-term memory (LSTM) network for the prediction. The other approach employs the previous ROI information for predicting the next step ROI. Using the exploration-exploitation technique, a Q-network learns to choose the best approach for designing the measurement matrix. Furthermore, a joint loss function is introduced for the efficient training of the Q-network as well as the LSTM network. The result indicates a significant performance gain for our proposed method, even for rapidly varying signals and a reduced number of measurements.

CV Jul 16, 2020
Odyssey: Creation, Analysis and Detection of Trojan Models

Marzieh Edraki, Nazmul Karim, Nazanin Rahnavard et al.

Along with the success of deep neural network (DNN) models come rising threats to the integrity of these models. A recent threat is the Trojan attack where an attacker interferes with the training pipeline by inserting triggers into some of the training samples and trains the model to act maliciously only for samples that contain the trigger. Since the knowledge of triggers is known only to the attacker, detection of Trojan networks is challenging. Existing Trojan detectors make strong assumptions about the types of triggers and attacks. We propose a detector that is based on the analysis of the intrinsic DNN properties that are affected by the Trojaning process. For a comprehensive analysis, we develop Odysseus, the most diverse dataset to date with over 3,000 clean and Trojan models. Odysseus covers a large spectrum of attacks, generated by leveraging the versatility in trigger designs and source to target class mappings. Our analysis results show that Trojan attacks affect the classifier margin and shape of decision boundary around the manifold of clean data. Exploiting these two factors, we propose an efficient Trojan detector that operates without any knowledge of the attack and significantly outperforms existing methods. Through a comprehensive set of experiments we demonstrate the efficacy of the detector on cross-model architectures, unseen triggers and regularized models.