Pradeep Shenoy

LG
h-index117
22papers
3,545citations
Novelty57%
AI Score43

22 Papers

LGDec 14, 2022
Interactive Concept Bottleneck Models

Kushal Chauhan, Rishabh Tiwari, Jan Freyberg et al. · berkeley

Concept bottleneck models (CBMs) are interpretable neural networks that first predict labels for human-interpretable concepts relevant to the prediction task, and then predict the final label based on the concept label predictions. We extend CBMs to interactive prediction settings where the model can query a human collaborator for the label to some concepts. We develop an interaction policy that, at prediction time, chooses which concepts to request a label for so as to maximally improve the final prediction. We demonstrate that a simple policy combining concept prediction uncertainty and influence of the concept on the final prediction achieves strong performance and outperforms static approaches as well as active feature acquisition methods proposed in the literature. We show that the interactive CBM can achieve accuracy gains of 5-10% with only 5 interactions over competitive baselines on the Caltech-UCSD Birds, CheXpert and OAI datasets.

LGOct 28, 2023
Using Early Readouts to Mediate Featural Bias in Distillation

Rishabh Tiwari, Durga Sivasubramanian, Anmol Mekala et al. · berkeley

Deep networks tend to learn spurious feature-label correlations in real-world supervised learning tasks. This vulnerability is aggravated in distillation, where a student model may have lesser representational capacity than the corresponding teacher model. Often, knowledge of specific spurious correlations is used to reweight instances & rebalance the learning process. We propose a novel early readout mechanism whereby we attempt to predict the label using representations from earlier network layers. We show that these early readouts automatically identify problem instances or groups in the form of confident, incorrect predictions. Leveraging these signals to modulate the distillation loss on an instance level allows us to substantially improve not only group fairness measures across benchmark datasets, but also overall accuracy of the student model. We also provide secondary analyses that bring insight into the role of feature learning in supervision and distillation.

LGOct 7, 2022
Sample-Efficient Personalization: Modeling User Parameters as Low Rank Plus Sparse Components

Soumyabrata Pal, Prateek Varshney, Prateek Jain et al.

Personalization of machine learning (ML) predictions for individual users/domains/enterprises is critical for practical recommendation systems. Standard personalization approaches involve learning a user/domain specific embedding that is fed into a fixed global model which can be limiting. On the other hand, personalizing/fine-tuning model itself for each user/domain -- a.k.a meta-learning -- has high storage/infrastructure cost. Moreover, rigorous theoretical studies of scalable personalization approaches have been very limited. To address the above issues, we propose a novel meta-learning style approach that models network weights as a sum of low-rank and sparse components. This captures common information from multiple individuals/users together in the low-rank part while sparse part captures user-specific idiosyncrasies. We then study the framework in the linear setting, where the problem reduces to that of estimating the sum of a rank-$r$ and a $k$-column sparse matrix using a small number of linear measurements. We propose a computationally efficient alternating minimization method with iterative hard thresholding -- AMHT-LRS -- to learn the low-rank and sparse part. Theoretically, for the realizable Gaussian data setting, we show that AMHT-LRS solves the problem efficiently with nearly optimal sample complexity. Finally, a significant challenge in personalization is ensuring privacy of each user's sensitive data. We alleviate this problem by proposing a differentially private variant of our method that also is equipped with strong generalization guarantees.

LGJun 12, 2022
Matching options to tasks using Option-Indexed Hierarchical Reinforcement Learning

Kushal Chauhan, Soumya Chatterjee, Akash Reddy et al.

The options framework in Hierarchical Reinforcement Learning breaks down overall goals into a combination of options or simpler tasks and associated policies, allowing for abstraction in the action space. Ideally, these options can be reused across different higher-level goals; indeed, such reuse is necessary to realize the vision of a continual learning agent that can effectively leverage its prior experience. Previous approaches have only proposed limited forms of transfer of prelearned options to new task settings. We propose a novel option indexing approach to hierarchical learning (OI-HRL), where we learn an affinity function between options and the items present in the environment. This allows us to effectively reuse a large library of pretrained options, in zero-shot generalization at test time, by restricting goal-directed learning to only those options relevant to the task at hand. We develop a meta-training loop that learns the representations of options and environments over a series of HRL problems, by incorporating feedback about the relevance of retrieved options to the higher-level goal. We evaluate OI-HRL in two simulated settings - the CraftWorld and AI2THOR environments - and show that we achieve performance competitive with oracular baselines, and substantial gains over a baseline that has the entire option pool available for learning the hierarchical policy.

LGDec 12, 2022
Instance-Conditional Timescales of Decay for Non-Stationary Learning

Nishant Jain, Pradeep Shenoy

Slow concept drift is a ubiquitous, yet under-studied problem in practical machine learning systems. In such settings, although recent data is more indicative of future data, naively prioritizing recent instances runs the risk of losing valuable information from the past. We propose an optimization-driven approach towards balancing instance importance over large training windows. First, we model instance relevance using a mixture of multiple timescales of decay, allowing us to capture rich temporal trends. Second, we learn an auxiliary scorer model that recovers the appropriate mixture of timescales as a function of the instance itself. Finally, we propose a nested optimization objective for learning the scorer, by which it maximizes forward transfer for the learned model. Experiments on a large real-world dataset of 39M photos over a 9 year period show upto 15% relative gains in accuracy compared to other robust learning baselines. We replicate our gains on two collections of real-world datasets for non-stationary learning, and extend our work to continual learning settings where, too, we beat SOTA methods by large margins.

LGJan 30, 2023
Overcoming Simplicity Bias in Deep Networks using a Feature Sieve

Rishabh Tiwari, Pradeep Shenoy

Simplicity bias is the concerning tendency of deep networks to over-depend on simple, weakly predictive features, to the exclusion of stronger, more complex features. This is exacerbated in real-world applications by limited training data and spurious feature-label correlations, leading to biased, incorrect predictions. We propose a direct, interventional method for addressing simplicity bias in DNNs, which we call the feature sieve. We aim to automatically identify and suppress easily-computable spurious features in lower layers of the network, thereby allowing the higher network levels to extract and utilize richer, more meaningful representations. We provide concrete evidence of this differential suppression & enhancement of relevant features on both controlled datasets and real-world images, and report substantial gains on many real-world debiasing benchmarks (11.4% relative gain on Imagenet-A; 3.2% on BAR, etc). Crucially, we do not depend on prior knowledge of spurious attributes or features, and in fact outperform many baselines that explicitly incorporate such information. We believe that our feature sieve work opens up exciting new research directions in automated adversarial feature extraction and representation learning for deep networks.

CVNov 28, 2023
Rescuing referral failures during automated diagnosis of domain-shifted medical images

Anuj Srivastava, Karm Patel, Pradeep Shenoy et al.

The success of deep learning models deployed in the real world depends critically on their ability to generalize well across diverse data domains. Here, we address a fundamental challenge with selective classification during automated diagnosis with domain-shifted medical images. In this scenario, models must learn to avoid making predictions when label confidence is low, especially when tested with samples far removed from the training set (covariate shift). Such uncertain cases are typically referred to the clinician for further analysis and evaluation. Yet, we show that even state-of-the-art domain generalization approaches fail severely during referral when tested on medical images acquired from a different demographic or using a different technology. We examine two benchmark diagnostic medical imaging datasets exhibiting strong covariate shifts: i) diabetic retinopathy prediction with retinal fundus images and ii) multilabel disease prediction with chest X-ray images. We show that predictive uncertainty estimates do not generalize well under covariate shifts leading to non-monotonic referral curves, and severe drops in performance (up to 50%) at high referral rates (>70%). We evaluate novel combinations of robust generalization and post hoc referral approaches, that rescue these failures and achieve significant performance improvements, typically >10%, over baseline methods. Our study identifies a critical challenge with referral in domain-shifted medical images and finds key applications in reliable, automated disease diagnosis.

CVNov 14, 2022
Robustifying Deep Vision Models Through Shape Sensitization

Aditay Tripathi, Rishubh Singh, Anirban Chakraborty et al.

Recent work has shown that deep vision models tend to be overly dependent on low-level or "texture" features, leading to poor generalization. Various data augmentation strategies have been proposed to overcome this so-called texture bias in DNNs. We propose a simple, lightweight adversarial augmentation technique that explicitly incentivizes the network to learn holistic shapes for accurate prediction in an object classification setting. Our augmentations superpose edgemaps from one image onto another image with shuffled patches, using a randomly determined mixing proportion, with the image label of the edgemap image. To classify these augmented images, the model needs to not only detect and focus on edges but distinguish between relevant and spurious edges. We show that our augmentations significantly improve classification accuracy and robustness measures on a range of datasets and neural architectures. As an example, for ViT-S, We obtain absolute gains on classification accuracy gains up to 6%. We also obtain gains of up to 28% and 8.5% on natural adversarial and out-of-distribution datasets like ImageNet-A (for ViT-B) and ImageNet-R (for ViT-S), respectively. Analysis using a range of probe datasets shows substantially increased shape sensitivity in our trained models, explaining the observed improvement in robustness and classification accuracy.

LGAug 29, 2022
Shaken, and Stirred: Long-Range Dependencies Enable Robust Outlier Detection with PixelCNN++

Barath Mohan Umapathi, Kushal Chauhan, Pradeep Shenoy et al.

Reliable outlier detection is critical for real-world deployment of deep learning models. Although extensively studied, likelihoods produced by deep generative models have been largely dismissed as being impractical for outlier detection. First, deep generative model likelihoods are readily biased by low-level input statistics. Second, many recent solutions for correcting these biases are computationally expensive, or do not generalize well to complex, natural datasets. Here, we explore outlier detection with a state-of-the-art deep autoregressive model: PixelCNN++. We show that biases in PixelCNN++ likelihoods arise primarily from predictions based on local dependencies. We propose two families of bijective transformations -- ``stirring'' and ``shaking'' -- which ameliorate low-level biases and isolate the contribution of long-range dependencies to PixelCNN++ likelihoods. These transformations are inexpensive and readily computed at evaluation time. We test our approaches extensively with five grayscale and six natural image datasets and show that they achieve or exceed state-of-the-art outlier detection, particularly on datasets with complex, natural images. We also show that our solutions work well with other types of generative models (generative flows and variational autoencoders) and that their efficacy is governed by each model's reliance on local dependencies. In sum, lightweight remedies suffice to achieve robust outlier detection on image data with deep generative models.

LGDec 12, 2022
Selective classification using a robust meta-learning approach

Nishant Jain, Karthikeyan Shanmugam, Pradeep Shenoy

Predictive uncertainty-a model's self awareness regarding its accuracy on an input-is key for both building robust models via training interventions and for test-time applications such as selective classification. We propose a novel instance-conditioned reweighting approach that captures predictive uncertainty using an auxiliary network and unifies these train- and test-time applications. The auxiliary network is trained using a meta-objective in a bilevel optimization framework. A key contribution of our proposal is the meta-objective of minimizing the dropout variance, an approximation of Bayesian Predictive uncertainty. We show in controlled experiments that we effectively capture the diverse specific notions of uncertainty through this meta-objective, while previous approaches only capture certain aspects. These results translate to significant gains in real-world settings-selective classification, label noise, domain adaptation, calibration-and across datasets-Imagenet, Cifar100, diabetic retinopathy, Camelyon, WILDs, Imagenet-C,-A,-R, Clothing1M, etc. For Diabetic Retinopathy, we see upto 3.4%/3.3% accuracy and AUC gains over SOTA in selective classification. We also improve upon large-scale pretrained models such as PLEX.

CLJul 7, 2025
Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities

Gheorghe Comanici, Eric Bieber, Mike Schaekermann et al. · amazon-science, baidu

In this report, we introduce the Gemini 2.X model family: Gemini 2.5 Pro and Gemini 2.5 Flash, as well as our earlier Gemini 2.0 Flash and Flash-Lite models. Gemini 2.5 Pro is our most capable model yet, achieving SoTA performance on frontier coding and reasoning benchmarks. In addition to its incredible coding and reasoning skills, Gemini 2.5 Pro is a thinking model that excels at multimodal understanding and it is now able to process up to 3 hours of video content. Its unique combination of long context, multimodal and reasoning capabilities can be combined to unlock new agentic workflows. Gemini 2.5 Flash provides excellent reasoning abilities at a fraction of the compute and latency requirements and Gemini 2.0 Flash and Flash-Lite provide high performance at low latency and cost. Taken together, the Gemini 2.X model generation spans the full Pareto frontier of model capability vs cost, allowing users to explore the boundaries of what is possible with complex agentic problem solving.

CVNov 5, 2024Code
OLAF: A Plug-and-Play Framework for Enhanced Multi-object Multi-part Scene Parsing

Pranav Gupta, Rishubh Singh, Pradeep Shenoy et al.

Multi-object multi-part scene segmentation is a challenging task whose complexity scales exponentially with part granularity and number of scene objects. To address the task, we propose a plug-and-play approach termed OLAF. First, we augment the input (RGB) with channels containing object-based structural cues (fg/bg mask, boundary edge mask). We propose a weight adaptation technique which enables regular (RGB) pre-trained models to process the augmented (5-channel) input in a stable manner during optimization. In addition, we introduce an encoder module termed LDF to provide low-level dense feature guidance. This assists segmentation, particularly for smaller parts. OLAF enables significant mIoU gains of $\mathbf{3.3}$ (Pascal-Parts-58), $\mathbf{3.5}$ (Pascal-Parts-108) over the SOTA model. On the most challenging variant (Pascal-Parts-201), the gain is $\mathbf{4.0}$. Experimentally, we show that OLAF's broad applicability enables gains across multiple architectures (CNN, U-Net, Transformer) and datasets. The code is available at olafseg.github.io

CLFeb 12, 2025
Universal Model Routing for Efficient LLM Inference

Wittawat Jitkrittum, Harikrishna Narasimhan, Ankit Singh Rawat et al.

Model routing is a simple technique for reducing the inference cost of large language models (LLMs), wherein one maintains a pool of candidate LLMs, and learns to route each prompt to the smallest feasible LLM. Existing works focus on learning a router for a fixed pool of LLMs. In this paper, we consider the problem of dynamic routing, where new, previously unobserved LLMs are available at test time. We propose UniRoute, a new approach to this problem that relies on representing each LLM as a feature vector, derived based on predictions on a set of representative prompts. Based on this, we detail two effective instantiations of UniRoute, relying on cluster-based routing and a learned cluster map respectively. We show that these are estimates of a theoretically optimal routing rule, and quantify their errors via an excess risk bound. Experiments on a range of public benchmarks show the effectiveness of UniRoute in routing amongst more than 30 unseen LLMs.

LGMar 18, 2024
Improving Generalization via Meta-Learning on Hard Samples

Nishant Jain, Arun S. Suggala, Pradeep Shenoy

Learned reweighting (LRW) approaches to supervised learning use an optimization criterion to assign weights for training instances, in order to maximize performance on a representative validation dataset. We pose and formalize the problem of optimized selection of the validation set used in LRW training, to improve classifier generalization. In particular, we show that using hard-to-classify instances in the validation set has both a theoretical connection to, and strong empirical evidence of generalization. We provide an efficient algorithm for training this meta-optimized model, as well as a simple train-twice heuristic for careful comparative study. We demonstrate that LRW with easy validation data performs consistently worse than LRW with hard validation data, establishing the validity of our meta-optimization problem. Our proposed algorithm outperforms a wide range of baselines on a range of datasets and domain shift challenges (Imagenet-1K, CIFAR-100, Clothing-1M, CAMELYON, WILDS, etc.), with ~1% gains using VIT-B on Imagenet. We also show that using naturally hard examples for validation (Imagenet-R / Imagenet-A) in LRW training for Imagenet improves performance on both clean and naturally hard test instances by 1-2%. Secondary analyses show that using hard validation data in an LRW framework improves margins on test data, hinting at the mechanism underlying our empirical gains. We believe this work opens up new research directions for the meta-optimization of meta-learning in a supervised learning context.

CVFeb 1, 2025
Masked Generative Nested Transformers with Decode Time Scaling

Sahil Goyal, Debapriya Tula, Gagan Jain et al.

Recent advances in visual generation have made significant strides in producing content of exceptional quality. However, most methods suffer from a fundamental problem - a bottleneck of inference computational efficiency. Most of these algorithms involve multiple passes over a transformer model to generate tokens or denoise inputs. However, the model size is kept consistent throughout all iterations, which makes it computationally expensive. In this work, we aim to address this issue primarily through two key ideas - (a) not all parts of the generation process need equal compute, and we design a decode time model scaling schedule to utilize compute effectively, and (b) we can cache and reuse some of the computation. Combining these two ideas leads to using smaller models to process more tokens while large models process fewer tokens. These different-sized models do not increase the parameter size, as they share parameters. We rigorously experiment with ImageNet256$\times$256 , UCF101, and Kinetics600 to showcase the efficacy of the proposed method for image/video generation and frame prediction. Our experiments show that with almost $3\times$ less compute than baseline, our model obtains competitive performance.

LGMay 18, 2023
STREAMLINE: Streaming Active Learning for Realistic Multi-Distributional Settings

Nathan Beck, Suraj Kothawade, Pradeep Shenoy et al.

Deep neural networks have consistently shown great performance in several real-world use cases like autonomous vehicles, satellite imaging, etc., effectively leveraging large corpora of labeled training data. However, learning unbiased models depends on building a dataset that is representative of a diverse range of realistic scenarios for a given task. This is challenging in many settings where data comes from high-volume streams, with each scenario occurring in random interleaved episodes at varying frequencies. We study realistic streaming settings where data instances arrive in and are sampled from an episodic multi-distributional data stream. Using submodular information measures, we propose STREAMLINE, a novel streaming active learning framework that mitigates scenario-driven slice imbalance in the working labeled data via a three-step procedure of slice identification, slice-aware budgeting, and data selection. We extensively evaluate STREAMLINE on real-world streaming scenarios for image classification and object detection tasks. We observe that STREAMLINE improves the performance on infrequent yet critical slices of the data over current baselines by up to $5\%$ in terms of accuracy on our image classification tasks and by up to $8\%$ in terms of mAP on our object detection tasks.

CVMar 30, 2022
FLOAT: Factorized Learning of Object Attributes for Improved Multi-object Multi-part Scene Parsing

Rishubh Singh, Pranav Gupta, Pradeep Shenoy et al.

Multi-object multi-part scene parsing is a challenging task which requires detecting multiple object classes in a scene and segmenting the semantic parts within each object. In this paper, we propose FLOAT, a factorized label space framework for scalable multi-object multi-part parsing. Our framework involves independent dense prediction of object category and part attributes which increases scalability and reduces task complexity compared to the monolithic label space counterpart. In addition, we propose an inference-time 'zoom' refinement technique which significantly improves segmentation quality, especially for smaller objects/parts. Compared to state of the art, FLOAT obtains an absolute improvement of 2.0% for mean IOU (mIOU) and 4.8% for segmentation quality IOU (sqIOU) on the Pascal-Part-58 dataset. For the larger Pascal-Part-108 dataset, the improvements are 2.1% for mIOU and 3.9% for sqIOU. We incorporate previously excluded part attributes and other minor parts of the Pascal-Part dataset to create the most comprehensive and challenging version which we dub Pascal-Part-201. FLOAT obtains improvements of 8.6% for mIOU and 7.5% for sqIOU on the new dataset, demonstrating its parsing effectiveness across a challenging diversity of objects and parts. The code and datasets are available at floatseg.github.io.

LGFeb 7, 2022
Adaptive Mixing of Auxiliary Losses in Supervised Learning

Durga Sivasubramanian, Ayush Maheshwari, Pradeep Shenoy et al.

In several supervised learning scenarios, auxiliary losses are used in order to introduce additional information or constraints into the supervised learning objective. For instance, knowledge distillation aims to mimic outputs of a powerful teacher model; similarly, in rule-based approaches, weak labeling information is provided by labeling functions which may be noisy rule-based approximations to true labels. We tackle the problem of learning to combine these losses in a principled manner. Our proposal, AMAL, uses a bi-level optimization criterion on validation data to learn optimal mixing weights, at an instance level, over the training data. We describe a meta-learning approach towards solving this bi-level objective and show how it can be applied to different scenarios in supervised learning. Experiments in a number of knowledge distillation and rule-denoising domains show that AMAL provides noticeable gains over competitive baselines in those domains. We empirically analyze our method and share insights into the mechanisms through which it provides performance gains.

LGNov 18, 2021
GCR: Gradient Coreset Based Replay Buffer Selection For Continual Learning

Rishabh Tiwari, Krishnateja Killamsetty, Rishabh Iyer et al.

Continual learning (CL) aims to develop techniques by which a single model adapts to an increasing number of tasks encountered sequentially, thereby potentially leveraging learnings across tasks in a resource-efficient manner. A major challenge for CL systems is catastrophic forgetting, where earlier tasks are forgotten while learning a new task. To address this, replay-based CL approaches maintain and repeatedly retrain on a small buffer of data selected across encountered tasks. We propose Gradient Coreset Replay (GCR), a novel strategy for replay buffer selection and update using a carefully designed optimization criterion. Specifically, we select and maintain a "coreset" that closely approximates the gradient of all the data seen so far with respect to current model parameters, and discuss key strategies needed for its effective application to the continual learning setting. We show significant gains (2%-4% absolute) over the state-of-the-art in the well-studied offline continual learning setting. Our findings also effectively transfer to online / streaming CL settings, showing upto 5% gains over existing approaches. Finally, we demonstrate the value of supervised contrastive loss for continual learning, which yields a cumulative gain of up to 5% accuracy when combined with our subset selection strategy.

LGAug 19, 2021
Robust outlier detection by de-biasing VAE likelihoods

Kushal Chauhan, Barath Mohan U, Pradeep Shenoy et al.

Deep networks often make confident, yet, incorrect, predictions when tested with outlier data that is far removed from their training distributions. Likelihoods computed by deep generative models (DGMs) are a candidate metric for outlier detection with unlabeled data. Yet, previous studies have shown that DGM likelihoods are unreliable and can be easily biased by simple transformations to input data. Here, we examine outlier detection with variational autoencoders (VAEs), among the simplest of DGMs. We propose novel analytical and algorithmic approaches to ameliorate key biases with VAE likelihoods. Our bias corrections are sample-specific, computationally inexpensive, and readily computed for various decoder visible distributions. Next, we show that a well-known image pre-processing technique -- contrast stretching -- extends the effectiveness of bias correction to further improve outlier detection. Our approach achieves state-of-the-art accuracies with nine grayscale and natural image datasets, and demonstrates significant advantages -- both with speed and performance -- over four recent, competing approaches. In summary, lightweight remedies suffice to achieve robust outlier detection with VAEs.

AIJun 6, 2021
End-to-End Neuro-Symbolic Architecture for Image-to-Image Reasoning Tasks

Ananye Agarwal, Pradeep Shenoy, Mausam

Neural models and symbolic algorithms have recently been combined for tasks requiring both perception and reasoning. Neural models ground perceptual input into a conceptual vocabulary, on which a classical reasoning algorithm is applied to generate output. A key limitation is that such neural-to-symbolic models can only be trained end-to-end for tasks where the output space is symbolic. In this paper, we study neural-symbolic-neural models for reasoning tasks that require a conversion from an image input (e.g., a partially filled sudoku) to an image output (e.g., the image of the completed sudoku). While designing such a three-step hybrid architecture may be straightforward, the key technical challenge is end-to-end training -- how to backpropagate without intermediate supervision through the symbolic component. We propose NSNnet, an architecture that combines an image reconstruction loss with a novel output encoder to generate a supervisory signal, develops update algorithms that leverage policy gradient methods for supervision, and optimizes loss using a novel subsampling heuristic. We experiment on problem settings where symbolic algorithms are easily specified: a visual maze solving task and a visual Sudoku solver where the supervision is in image form. Experiments show high accuracy with significantly less data compared to purely neural approaches.

AIDec 9, 2020
Model-agnostic Fits for Understanding Information Seeking Patterns in Humans

Soumya Chatterjee, Pradeep Shenoy

In decision making tasks under uncertainty, humans display characteristic biases in seeking, integrating, and acting upon information relevant to the task. Here, we reexamine data from previous carefully designed experiments, collected at scale, that measured and catalogued these biases in aggregate form. We design deep learning models that replicate these biases in aggregate, while also capturing individual variation in behavior. A key finding of our work is that paucity of data collected from each individual subject can be overcome by sampling large numbers of subjects from the population, while still capturing individual differences. In addition, we can predict human behavior with high accuracy without making any assumptions about task goals, reward structure, or individual biases, thus providing a model-agnostic fit to human behavior in the task. Such an approach can sidestep potential limitations in modeler-specified inductive biases, and has implications for computational modeling of human cognitive function in general, and of human-AI interfaces in particular.