SENov 6, 2025Code
PEFA-AI: Advancing Open-source LLMs for RTL generation using Progressive Error Feedback Agentic-AIAthma Narayanan, Mahesh Subedar, Omesh Tickoo
We present an agentic flow consisting of multiple agents that combine specialized LLMs and hardware simulation tools to collaboratively complete the complex task of Register Transfer Level (RTL) generation without human intervention. A key feature of the proposed flow is the progressive error feedback system of agents (PEFA), a self-correcting mechanism that leverages iterative error feedback to progressively increase the complexity of the approach. The generated RTL includes checks for compilation, functional correctness, and synthesizable constructs. To validate this adaptive approach to code generation, benchmarking is performed using two opensource natural language-to-RTL datasets. We demonstrate the benefits of the proposed approach implemented on an open source agentic framework, using both open- and closed-source LLMs, effectively bridging the performance gap between them. Compared to previously published methods, our approach sets a new benchmark, providing state-of-the-art pass rates while being efficient in token counts.
LGAug 24, 2022
A Low-Complexity Approach to Rate-Distortion Optimized Variable Bit-Rate Compression for Split DNN ComputingParual Datta, Nilesh Ahuja, V. Srinivasa Somayazulu et al.
Split computing has emerged as a recent paradigm for implementation of DNN-based AI workloads, wherein a DNN model is split into two parts, one of which is executed on a mobile/client device and the other on an edge-server (or cloud). Data compression is applied to the intermediate tensor from the DNN that needs to be transmitted, addressing the challenge of optimizing the rate-accuracy-complexity trade-off. Existing split-computing approaches adopt ML-based data compression, but require that the parameters of either the entire DNN model, or a significant portion of it, be retrained for different compression levels. This incurs a high computational and storage burden: training a full DNN model from scratch is computationally demanding, maintaining multiple copies of the DNN parameters increases storage requirements, and switching the full set of weights during inference increases memory bandwidth. In this paper, we present an approach that addresses all these challenges. It involves the systematic design and training of bottleneck units - simple, low-cost neural networks - that can be inserted at the point of split. Our approach is remarkably lightweight, both during training and inference, highly effective and achieves excellent rate-distortion performance at a small fraction of the compute and storage overhead compared to existing methods.
LGMar 20, 2022
Subspace Modeling for Fast Out-Of-Distribution and Anomaly DetectionIbrahima J. Ndiour, Nilesh A. Ahuja, Omesh Tickoo
This paper presents a fast, principled approach for detecting anomalous and out-of-distribution (OOD) samples in deep neural networks (DNN). We propose the application of linear statistical dimensionality reduction techniques on the semantic features produced by a DNN, in order to capture the low-dimensional subspace truly spanned by said features. We show that the "feature reconstruction error" (FRE), which is the $\ell_2$-norm of the difference between the original feature in the high-dimensional space and the pre-image of its low-dimensional reduced embedding, is highly effective for OOD and anomaly detection. To generalize to intermediate features produced at any given layer, we extend the methodology by applying nonlinear kernel-based methods. Experiments using standard image datasets and DNN architectures demonstrate that our method meets or exceeds best-in-class quality performance, but at a fraction of the computational and memory cost required by the state of the art. It can be trained and run very efficiently, even on a traditional CPU.
CVNov 23, 2022
FRE: A Fast Method For Anomaly Detection And SegmentationIbrahima Ndiour, Nilesh Ahuja, Utku Genc et al.
This paper presents a fast and principled approach for solving the visual anomaly detection and segmentation problem. In this setup, we have access to only anomaly-free training data and want to detect and identify anomalies of an arbitrary nature on test data. We propose the application of linear statistical dimensionality reduction techniques on the intermediate features produced by a pretrained DNN on the training data, in order to capture the low-dimensional subspace truly spanned by said features. We show that the \emph{feature reconstruction error} (FRE), which is the $\ell_2$-norm of the difference between the original feature in the high-dimensional space and the pre-image of its low-dimensional reduced embedding, is extremely effective for anomaly detection. Further, using the same feature reconstruction error concept on intermediate convolutional layers, we derive FRE maps that provide pixel-level spatial localization of the anomalies in the image (i.e. segmentation). Experiments using standard anomaly detection datasets and DNN architectures demonstrate that our method matches or exceeds best-in-class quality performance, but at a fraction of the computational and memory cost required by the state of the art. It can be trained and run very efficiently, even on a traditional CPU.
CVDec 9, 2022
Reliable Multimodal Trajectory Prediction via Error Aligned Uncertainty OptimizationNeslihan Kose, Ranganath Krishnan, Akash Dhamasia et al.
Reliable uncertainty quantification in deep neural networks is very crucial in safety-critical applications such as automated driving for trustworthy and informed decision-making. Assessing the quality of uncertainty estimates is challenging as ground truth for uncertainty estimates is not available. Ideally, in a well-calibrated model, uncertainty estimates should perfectly correlate with model error. We propose a novel error aligned uncertainty optimization method and introduce a trainable loss function to guide the models to yield good quality uncertainty estimates aligning with the model error. Our approach targets continuous structured prediction and regression tasks, and is evaluated on multiple datasets including a large-scale vehicle motion prediction task involving real-world distributional shifts. We demonstrate that our method improves average displacement error by 1.69% and 4.69%, and the uncertainty correlation with model error by 17.22% and 19.13% as quantified by Pearson correlation coefficient on two state-of-the-art baselines.
LGDec 13, 2024Code
CONCLAD: COntinuous Novel CLAss DetectorAmanda Rios, Ibrahima Ndiour, Parual Datta et al.
In the field of continual learning, relying on so-called oracles for novelty detection is commonplace albeit unrealistic. This paper introduces CONCLAD ("COntinuous Novel CLAss Detector"), a comprehensive solution to the under-explored problem of continual novel class detection in post-deployment data. At each new task, our approach employs an iterative uncertainty estimation algorithm to differentiate between known and novel class(es) samples, and to further discriminate between the different novel classes themselves. Samples predicted to be from a novel class with high-confidence are automatically pseudo-labeled and used to update our model. Simultaneously, a tiny supervision budget is used to iteratively query ambiguous novel class predictions, which are also used during update. Evaluation across multiple datasets, ablations and experimental settings demonstrate our method's effectiveness at separating novel and old class samples continuously. We will release our code upon acceptance.
LGDec 12, 2024Code
CUAL: Continual Uncertainty-aware Active LearnerAmanda Rios, Ibrahima Ndiour, Parual Datta et al.
AI deployed in many real-world use cases should be capable of adapting to novelties encountered after deployment. Here, we consider a challenging, under-explored and realistic continual adaptation problem: a deployed AI agent is continuously provided with unlabeled data that may contain not only unseen samples of known classes but also samples from novel (unknown) classes. In such a challenging setting, it has only a tiny labeling budget to query the most informative samples to help it continuously learn. We present a comprehensive solution to this complex problem with our model "CUAL" (Continual Uncertainty-aware Active Learner). CUAL leverages an uncertainty estimation algorithm to prioritize active labeling of ambiguous (uncertain) predicted novel class samples while also simultaneously pseudo-labeling the most certain predictions of each class. Evaluations across multiple datasets, ablations, settings and backbones (e.g. ViT foundation model) demonstrate our method's effectiveness. We will release our code upon acceptance.
CVMay 4
Synthetic Data Generation for Long-Tail Medical Image Classification: A Case Study in Skin LesionsJiaxiang Jiang, Mahesh Subedar, Omesh Tickoo
Long-tailed class distributions are pervasive in multi-class medical datasets and pose significant challenges for deep learning models which typically underperform on tail classes with limited samples. This limitation is particularly problematic in medical applications, where rare classes often correspond to severe or high-risk diseases and therefore require high diagnostic accuracy. Existing solutions-including specialized architectures, rebalanced loss functions, and handcrafted data augmentation-offer only marginal improvements and struggle to scale due to their limited and largely deterministic variability. To address these challenges, we introduce a diffusion-model-driven synthetic data augmentation pipeline tailored for medical long-tailed classification. Our approach features a novel inpainting diffusion model combined with an Out-of-Distribution (OOD) post-selection mechanism to ensure diverse, realistic, and clinically meaningful synthetic samples. Evaluated on the ISIC2019 skin lesion classification dataset, one of the largest and most imbalanced medical imaging benchmarks, our method yields substantial improvements in overall performance, with particularly pronounced gains on tail classes with more than $28\%$ improvement on the class with the fewest samples. These results demonstrate the effectiveness of diffusion-based augmentation in mitigating long-tail imbalance and enhancing medical classification robustness.
CLDec 3, 2024
Enhancing Trust in Large Language Models with Uncertainty-Aware Fine-TuningRanganath Krishnan, Piyush Khanna, Omesh Tickoo
Large language models (LLMs) have revolutionized the field of natural language processing with their impressive reasoning and question-answering capabilities. However, these models are sometimes prone to generating credible-sounding but incorrect information, a phenomenon known as LLM hallucinations. Reliable uncertainty estimation in LLMs is essential for fostering trust in their generated responses and serves as a critical tool for the detection and prevention of erroneous or hallucinated outputs. To achieve reliable and well-calibrated uncertainty quantification in open-ended and free-form natural language generation, we propose an uncertainty-aware fine-tuning approach for LLMs. This approach enhances the model's ability to provide reliable uncertainty estimates without compromising accuracy, thereby guiding them to produce more trustworthy responses. We introduce a novel uncertainty-aware causal language modeling loss function, grounded in the principles of decision theory. Through rigorous evaluation on multiple free-form question-answering datasets and models, we demonstrate that our uncertainty-aware fine-tuning approach yields better calibrated uncertainty estimates in natural language generation tasks than fine-tuning with the standard causal language modeling loss. Furthermore, the experimental results show that the proposed method significantly improves the model's ability to detect hallucinations and identify out-of-domain prompts.
CVOct 26, 2025
Windsock is Dancing: Adaptive Multimodal Retrieval-Augmented GenerationShu Zhao, Tianyi Shen, Nilesh Ahuja et al.
Multimodal Retrieval-Augmented Generation (MRAG) has emerged as a promising method to generate factual and up-to-date responses of Multimodal Large Language Models (MLLMs) by incorporating non-parametric knowledge from external knowledge bases. However, existing MRAG approaches suffer from static retrieval strategies, inflexible modality selection, and suboptimal utilization of retrieved information, leading to three critical challenges: determining when to retrieve, what modality to incorporate, and how to utilize retrieved information effectively. To address these challenges, we introduce Windsock, a query-dependent module making decisions on retrieval necessity and modality selection, effectively reducing computational overhead and improving response quality. Additionally, we propose Dynamic Noise-Resistance (DANCE) Instruction Tuning, an adaptive training strategy that enhances MLLMs' ability to utilize retrieved information while maintaining robustness against noise. Moreover, we adopt a self-assessment approach leveraging knowledge within MLLMs to convert question-answering datasets to MRAG training datasets. Extensive experiments demonstrate that our proposed method significantly improves the generation quality by 17.07% while reducing 8.95% retrieval times.
LGSep 19, 2025
EigenTrack: Spectral Activation Feature Tracking for Hallucination and Out-of-Distribution Detection in LLMs and VLMsDavide Ettori, Nastaran Darabi, Sina Tayebati et al.
Large language models (LLMs) offer broad utility but remain prone to hallucination and out-of-distribution (OOD) errors. We propose EigenTrack, an interpretable real-time detector that uses the spectral geometry of hidden activations, a compact global signature of model dynamics. By streaming covariance-spectrum statistics such as entropy, eigenvalue gaps, and KL divergence from random baselines into a lightweight recurrent classifier, EigenTrack tracks temporal shifts in representation structure that signal hallucination and OOD drift before surface errors appear. Unlike black- and grey-box methods, it needs only a single forward pass without resampling. Unlike existing white-box detectors, it preserves temporal context, aggregates global signals, and offers interpretable accuracy-latency trade-offs.
LGDec 21, 2024
Uncertainty Quantification in Continual Open-World LearningAmanda S. Rios, Ibrahima J. Ndiour, Parual Datta et al.
AI deployed in the real-world should be capable of autonomously adapting to novelties encountered after deployment. Yet, in the field of continual learning, the reliance on novelty and labeling oracles is commonplace albeit unrealistic. This paper addresses a challenging and under-explored problem: a deployed AI agent that continuously encounters unlabeled data - which may include both unseen samples of known classes and samples from novel (unknown) classes - and must adapt to it continuously. To tackle this challenge, we propose our method COUQ "Continual Open-world Uncertainty Quantification", an iterative uncertainty estimation algorithm tailored for learning in generalized continual open-world multi-class settings. We rigorously apply and evaluate COUQ on key sub-tasks in the Continual Open-World: continual novelty detection, uncertainty guided active learning, and uncertainty guided pseudo-labeling for semi-supervised CL. We demonstrate the effectiveness of our method across multiple datasets, ablations, backbones and performance superior to state-of-the-art.
CVNov 21, 2025
Parts-Mamba: Augmenting Joint Context with Part-Level Scanning for Occluded Human SkeletonTianyi Shen, Huijuan Xu, Nilesh Ahuja et al.
Skeleton action recognition involves recognizing human action from human skeletons. The use of graph convolutional networks (GCNs) has driven major advances in this recognition task. In real-world scenarios, the captured skeletons are not always perfect or complete because of occlusions of parts of the human body or poor communication quality, leading to missing parts in skeletons or videos with missing frames. In the presence of such non-idealities, existing GCN models perform poorly due to missing local context. To address this limitation, we propose Parts-Mamba, a hybrid GCN-Mamba model designed to enhance the ability to capture and maintain contextual information from distant joints. The proposed Parts-Mamba model effectively captures part-specific information through its parts-specific scanning feature and preserves non-neighboring joint context via a parts-body fusion module. Our proposed model is evaluated on the NTU RGB+D 60 and NTU RGB+D 120 datasets under different occlusion settings, achieving up to 12.9% improvement in accuracy.
LGSep 13, 2021
Robust Contrastive Active Learning with Feature-guided Query StrategiesRanganath Krishnan, Nilesh Ahuja, Alok Sinha et al.
We introduce supervised contrastive active learning (SCAL) and propose efficient query strategies in active learning based on the feature similarity (featuresim) and principal component analysis based feature-reconstruction error (fre) to select informative data samples with diverse feature representations. We demonstrate our proposed method achieves state-of-the-art accuracy, model calibration and reduces sampling bias in an active learning setup for balanced and imbalanced datasets on image classification tasks. We also evaluate robustness of model to distributional shift derived from different query strategies in active learning setting. Using extensive experiments, we show that our proposed approach outperforms high performing compute-intensive methods by a big margin resulting in 9.9% lower mean corruption error, 7.2% lower expected calibration error under dataset shift and 8.9% higher AUROC for out-of-distribution detection.
LGSep 13, 2021
Mitigating Sampling Bias and Improving Robustness in Active LearningRanganath Krishnan, Alok Sinha, Nilesh Ahuja et al.
This paper presents simple and efficient methods to mitigate sampling bias in active learning while achieving state-of-the-art accuracy and model robustness. We introduce supervised contrastive active learning by leveraging the contrastive loss for active learning under a supervised setting. We propose an unbiased query strategy that selects informative data samples of diverse feature representations with our methods: supervised contrastive active learning (SCAL) and deep feature modeling (DFM). We empirically demonstrate our proposed methods reduce sampling bias, achieve state-of-the-art accuracy and model calibration in an active learning setup with the query computation 26x faster than Bayesian active learning by disagreement and 11x faster than CoreSet. The proposed SCAL method outperforms by a big margin in robustness to dataset shift and out-of-distribution.
CVSep 10, 2021
Partially-Supervised Novel Object Captioning Leveraging Context from Paired DataShashank Bujimalla, Mahesh Subedar, Omesh Tickoo
In this paper, we propose an approach to improve image captioning solution for images with novel objects that do not have caption labels in the training dataset. We refer to our approach as Partially-Supervised Novel Object Captioning (PS-NOC). PS-NOC is agnostic to model architecture, and primarily focuses on the training approach that uses existing fully paired image-caption data and the images with only the novel object detection labels (partially paired data). We create synthetic paired captioning data for novel objects by leveraging context from existing image-caption pairs. We then create pseudo-label captions for partially paired images with novel objects, and use this additional data to fine-tune the captioning model. We also propose a variant of SCST within PS-NOC, called SCST-F1, that directly optimizes the F1-score of novel objects. Using a popular captioning model (Up-Down) as baseline, PS-NOC sets new state-of-the-art results on held-out MS COCO out-of-domain test split, i.e., 85.9 F1-score and 103.8 CIDEr. This is an improvement of 85.9 and 34.1 points respectively compared to baseline model that does not use partially paired data during training. We also perform detailed ablation studies to demonstrate the effectiveness of our approach.
CLJun 10, 2021
Data augmentation to improve robustness of image captioning solutionsShashank Bujimalla, Mahesh Subedar, Omesh Tickoo
In this paper, we study the impact of motion blur, a common quality flaw in real world images, on a state-of-the-art two-stage image captioning solution, and notice a degradation in solution performance as blur intensity increases. We investigate techniques to improve the robustness of the solution to motion blur using training data augmentation at each or both stages of the solution, i.e., object detection and captioning, and observe improved results. In particular, augmenting both the stages reduces the CIDEr-D degradation for high motion blur intensity from 68.7 to 11.7 on MS COCO dataset, and from 22.4 to 6.8 on Vizwiz dataset.
LGMay 7, 2021
Energy-Based Anomaly Detection and LocalizationErgin Utku Genc, Nilesh Ahuja, Ibrahima J Ndiour et al.
This brief sketches initial progress towards a unified energy-based solution for the semi-supervised visual anomaly detection and localization problem. In this setup, we have access to only anomaly-free training data and want to detect and identify anomalies of an arbitrary nature on test data. We employ the density estimates from the energy-based model (EBM) as normalcy scores that can be used to discriminate normal images from anomalous ones. Further, we back-propagate the gradients of the energy score with respect to the image in order to generate a gradient map that provides pixel-level spatial localization of the anomalies in the image. In addition to the spatial localization, we show that simple processing of the gradient map can also provide alternative normalcy scores that either match or surpass the detection performance obtained with the energy value. To quantitatively validate the performance of the proposed method, we conduct experiments on the MVTec industrial dataset. Though still preliminary, our results are very promising and reveal the potential of EBMs for simultaneously detecting and localizing unforeseen anomalies in images.
LGDec 14, 2020
Improving model calibration with accuracy versus uncertainty optimizationRanganath Krishnan, Omesh Tickoo
Obtaining reliable and accurate quantification of uncertainty estimates from deep neural networks is important in safety-critical applications. A well-calibrated model should be accurate when it is certain about its prediction and indicate high uncertainty when it is likely to be inaccurate. Uncertainty calibration is a challenging problem as there is no ground truth available for uncertainty estimates. We propose an optimization method that leverages the relationship between accuracy and uncertainty as an anchor for uncertainty calibration. We introduce a differentiable accuracy versus uncertainty calibration (AvUC) loss function that allows a model to learn to provide well-calibrated uncertainties, in addition to improved accuracy. We also demonstrate the same methodology can be extended to post-hoc uncertainty calibration on pretrained models. We illustrate our approach with mean-field stochastic variational inference and compare with state-of-the-art methods. Extensive experiments demonstrate our approach yields better model calibration than existing methods on large-scale image classification tasks under distributional shift.
LGDec 8, 2020
Out-Of-Distribution Detection With Subspace Techniques And Probabilistic Modeling Of FeaturesIbrahima Ndiour, Nilesh Ahuja, Omesh Tickoo
This paper presents a principled approach for detecting out-of-distribution (OOD) samples in deep neural networks (DNN). Modeling probability distributions on deep features has recently emerged as an effective, yet computationally cheap method to detect OOD samples in DNN. However, the features produced by a DNN at any given layer do not fully occupy the corresponding high-dimensional feature space. We apply linear statistical dimensionality reduction techniques and nonlinear manifold-learning techniques on the high-dimensional features in order to capture the true subspace spanned by the features. We hypothesize that such lower-dimensional feature embeddings can mitigate the curse of dimensionality, and enhance any feature-based method for more efficient and effective performance. In the context of uncertainty estimation and OOD, we show that the log-likelihood score obtained from the distributions learnt on this lower-dimensional subspace is more discriminative for OOD detection. We also show that the feature reconstruction error, which is the $L_2$-norm of the difference between the original feature and the pre-image of its embedding, is highly effective for OOD detection and in some cases superior to the log-likelihood scores. The benefits of our approach are demonstrated on image features by detecting OOD images, using popular DNN architectures on commonly used image datasets such as CIFAR10, CIFAR100, and SVHN.
CYNov 15, 2020
Uncertainty as a Form of Transparency: Measuring, Communicating, and Using UncertaintyUmang Bhatt, Javier Antorán, Yunfeng Zhang et al.
Algorithmic transparency entails exposing system properties to various stakeholders for purposes that include understanding, improving, and contesting predictions. Until now, most research into algorithmic transparency has predominantly focused on explainability. Explainability attempts to provide reasons for a machine learning model's behavior to stakeholders. However, understanding a model's specific behavior alone might not be enough for stakeholders to gauge whether the model is wrong or lacks sufficient knowledge to solve the task at hand. In this paper, we argue for considering a complementary form of transparency by estimating and communicating the uncertainty associated with model predictions. First, we discuss methods for assessing uncertainty. Then, we characterize how uncertainty can be used to mitigate model unfairness, augment decision-making, and build trustworthy systems. Finally, we outline methods for displaying uncertainty to stakeholders and recommend how to collect information required for incorporating uncertainty into existing ML pipelines. This work constitutes an interdisciplinary review drawn from literature spanning machine learning, visualization/HCI, design, decision-making, and fairness. We aim to encourage researchers and practitioners to measure, communicate, and use uncertainty as a form of transparency.
LGApr 6, 2020
B-SCST: Bayesian Self-Critical Sequence Training for Image CaptioningShashank Bujimalla, Mahesh Subedar, Omesh Tickoo
Bayesian deep neural networks (DNNs) can provide a mathematically grounded framework to quantify uncertainty in predictions from image captioning models. We propose a Bayesian variant of policy-gradient based reinforcement learning training technique for image captioning models to directly optimize non-differentiable image captioning quality metrics such as CIDEr-D. We extend the well-known Self-Critical Sequence Training (SCST) approach for image captioning models by incorporating Bayesian inference, and refer to it as B-SCST. The "baseline" for the policy-gradients in B-SCST is generated by averaging predictive quality metrics (CIDEr-D) of the captions drawn from the distribution obtained using a Bayesian DNN model. We infer this predictive distribution using Monte Carlo (MC) dropout approximate variational inference. We show that B-SCST improves CIDEr-D scores on Flickr30k, MS COCO and VizWiz image captioning datasets, compared to the SCST approach. We also provide a study of uncertainty quantification for the predicted captions, and demonstrate that it correlates well with the CIDEr-D scores. To our knowledge, this is the first such analysis, and it can improve the interpretability of image captioning model outputs, which is critical for practical applications.
MLDec 18, 2019
Tree pyramidal adaptive importance samplingJavier Felip, Nilesh Ahuja, Omesh Tickoo
This paper introduces Tree-Pyramidal Adaptive Importance Sampling (TP-AIS), a novel iterated sampling method that outperforms state-of-the-art approaches like deterministic mixture population Monte Carlo (DM-PMC), mixture population Monte Carlo (M-PMC) and layered adaptive importance sampling (LAIS). TP-AIS iteratively builds a proposal distribution parameterized by a tree pyramid, where each tree leaf spans a subspace that represents its importance density. After each new sample operation, a set of tree leaves are subdivided for improving the approximation of the proposal distribution to the target density. Unlike the rest of the methods in the literature, TP-AIS is parameter free and requires no tuning to achieve its best performance. We evaluate TP-AIS with different complexity randomized target probability density functions (PDF) and also analyze its application to different dimensions. The results are compared to state-of-the-art iterative importance sampling approaches and other baseline MCMC approaches using Normalized Effective Sample Size (N-ESS), Jensen-Shannon Divergence, and time complexity.
LGDec 3, 2019
Deep Probabilistic Models to Detect Data Poisoning AttacksMahesh Subedar, Nilesh Ahuja, Ranganath Krishnan et al.
Data poisoning attacks compromise the integrity of machine-learning models by introducing malicious training samples to influence the results during test time. In this work, we investigate backdoor data poisoning attack on deep neural networks (DNNs) by inserting a backdoor pattern in the training images. The resulting attack will misclassify poisoned test samples while maintaining high accuracies for the clean test-set. We present two approaches for detection of such poisoned samples by quantifying the uncertainty estimates associated with the trained models. In the first approach, we model the outputs of the various layers (deep features) with parametric probability distributions learnt from the clean held-out dataset. At inference, the likelihoods of deep features w.r.t these distributions are calculated to derive uncertainty estimates. In the second approach, we use Bayesian deep neural networks trained with mean-field variational inference to estimate model uncertainty associated with the predictions. The uncertainty estimates from these methods are used to discriminate clean from the poisoned samples.
MLSep 25, 2019
Probabilistic Modeling of Deep Features for Out-of-Distribution and Adversarial DetectionNilesh A. Ahuja, Ibrahima Ndiour, Trushant Kalyanpur et al.
We present a principled approach for detecting out-of-distribution (OOD) and adversarial samples in deep neural networks. Our approach consists in modeling the outputs of the various layers (deep features) with parametric probability distributions once training is completed. At inference, the likelihoods of the deep features w.r.t the previously learnt distributions are calculated and used to derive uncertainty estimates that can discriminate in-distribution samples from OOD samples. We explore the use of two classes of multivariate distributions for modeling the deep features - Gaussian and Gaussian mixture - and study the trade-off between accuracy and computational complexity. We demonstrate benefits of our approach on image features by detecting OOD images and adversarially-generated images, using popular DNN architectures on MNIST and CIFAR10 datasets. We show that more precise modeling of the feature distributions result in significantly improved detection of OOD and adversarial samples; up to 12 percentage points in AUPR and AUROC metrics. We further show that our approach remains extremely effective when applied to video data and associated spatio-temporal features by detecting adversarial samples on activity classification tasks using UCF101 dataset, and the C3D network. To our knowledge, our methodology is the first one reported for reliably detecting white-box adversarial framing, a state-of-the-art adversarial attack for video classifiers.
NEJun 12, 2019
Specifying Weight Priors in Bayesian Deep Neural Networks with Empirical BayesRanganath Krishnan, Mahesh Subedar, Omesh Tickoo
Stochastic variational inference for Bayesian deep neural network (DNN) requires specifying priors and approximate posterior distributions over neural network weights. Specifying meaningful weight priors is a challenging problem, particularly for scaling variational inference to deeper architectures involving high dimensional weight space. We propose MOdel Priors with Empirical Bayes using DNN (MOPED) method to choose informed weight priors in Bayesian neural networks. We formulate a two-stage hierarchical modeling, first find the maximum likelihood estimates of weights with DNN, and then set the weight priors using empirical Bayes approach to infer the posterior with variational inference. We empirically evaluate the proposed approach on real-world tasks including image classification, video activity recognition and audio classification with varying complex neural network architectures. We also evaluate our proposed approach on diabetic retinopathy diagnosis task and benchmark with the state-of-the-art Bayesian deep learning techniques. We demonstrate MOPED method enables scalable variational inference and provides reliable uncertainty quantification.
CVMay 22, 2019
Real-time Approximate Bayesian Computation for Scene UnderstandingJavier Felip, Nilesh Ahuja, David Gómez-Gutiérrez et al.
Consider scene understanding problems such as predicting where a person is probably reaching, or inferring the pose of 3D objects from depth images, or inferring the probable street crossings of pedestrians at a busy intersection. This paper shows how to solve these problems using Approximate Bayesian Computation. The underlying generative models are built from realistic simulation software, wrapped in a Bayesian error model for the gap between simulation outputs and real data. The simulators are drawn from off-the-shelf computer graphics, video game, and traffic simulation code. The paper introduces two techniques for speeding up inference that can be used separately or in combination. The first is to train neural surrogates of the simulators, using a simple form of domain randomization to make the surrogates more robust to the gap between the simulation and reality. The second is to adaptively discretize the latent variables using a Tree-pyramid approach adapted from computer graphics. This paper also shows performance and accuracy measurements on real-world problems, establishing that it is feasible to solve these problems in real-time.
NENov 27, 2018
Uncertainty aware audiovisual activity recognition using deep Bayesian variational inferenceMahesh Subedar, Ranganath Krishnan, Paulo Lopez Meyer et al.
Deep neural networks (DNNs) provide state-of-the-art results for a multitude of applications, but the approaches using DNNs for multimodal audiovisual applications do not consider predictive uncertainty associated with individual modalities. Bayesian deep learning methods provide principled confidence and quantify predictive uncertainty. Our contribution in this work is to propose an uncertainty aware multimodal Bayesian fusion framework for activity recognition. We demonstrate a novel approach that combines deterministic and variational layers to scale Bayesian DNNs to deeper architectures. Our experiments using in- and out-of-distribution samples selected from a subset of Moments-in-Time (MiT) dataset show a more reliable confidence measure as compared to the non-Bayesian baseline and the Monte Carlo dropout (MC dropout) approximate Bayesian inference. We also demonstrate the uncertainty estimates obtained from the proposed framework can identify out-of-distribution data on the UCF101 and MiT datasets. In the multimodal setting, the proposed framework improved precision-recall AUC by 10.2% on the subset of MiT dataset as compared to non-Bayesian baseline.
NENov 8, 2018
BAR: Bayesian Activity Recognition using variational inferenceRanganath Krishnan, Mahesh Subedar, Omesh Tickoo
Uncertainty estimation in deep neural networks is essential for designing reliable and robust AI systems. Applications such as video surveillance for identifying suspicious activities are designed with deep neural networks (DNNs), but DNNs do not provide uncertainty estimates. Capturing reliable uncertainty estimates in safety and security critical applications will help to establish trust in the AI system. Our contribution is to apply Bayesian deep learning framework to visual activity recognition application and quantify model uncertainty along with principled confidence. We utilize the stochastic variational inference technique while training the Bayesian DNNs to infer the approximate posterior distribution around model parameters and perform Monte Carlo sampling on the posterior of model parameters to obtain the predictive distribution. We show that the Bayesian inference applied to DNNs provide reliable confidence measures for visual activity recognition task as compared to conventional DNNs. We also show that our method improves the visual activity recognition precision-recall AUC by 6.2% compared to non-Bayesian baseline. We evaluate our models on Moments-In-Time (MiT) activity recognition dataset by selecting a subset of in- and out-of-distribution video samples.
CVAug 30, 2017
A Greedy Part Assignment Algorithm for Real-time Multi-person 2D Pose EstimationSrenivas Varadarajan, Parual Datta, Omesh Tickoo
Human pose-estimation in a multi-person image involves detection of various body parts and grouping them into individual person clusters. While the former task is challenging due to mutual occlusions, the combinatorial complexity of the latter task is very high. We propose a greedy part assignment algorithm that exploits the inherent structure of the human body to achieve a lower complexity, compared to any of the prior published works. This is accomplished by (i) reducing the number of part-candidates using the estimated number of people in the image, (ii) doing a greedy sequential assignment of part-classes, following the kinematic chain from head to ankle (iii) doing a greedy assignment of parts in each part-class set, to person-clusters (iv) limiting the candidate person clusters to the most proximal clusters using human anthropometric data and (v) using only a specific subset of pre-assigned parts for establishing pairwise structural constraints. We show that, these steps result in a sparse body parts relationship graph and reduces the complexity. We also propose methods for improving the accuracy of pose-estimation by (i) spawning person-clusters from any unassigned significant body part and (ii) suppressing hallucinated parts. On the MPII multi-person pose database, pose-estimation using the proposed method takes only 0.14 seconds per image. We show that, our proposed algorithm, by using a large spatial and structural context, achieves the state-of-the-art accuracy on both MPII and WAF multi-person pose datasets, demonstrating the robustness of our approach.