OCMay 13
Stochastic global optimization of continuous functions via random walks on GrassmanniansKartik Gupta, Stephen D. Miller, Pradeep Ravikumar et al.
We introduce a stochastic global optimization method based on random walks on Grassmannian manifolds. To minimize a continuous objective $\ell:\mathbb{R}^d\rightarrow\mathbb{R}$, the method repeatedly samples random $k$-dimensional linear subspaces (with $k\ll d$), solves the resulting low-dimensional restrictions of these problems to these subspaces using an arbitrary black-box optimizer, and updates the iterate (which monotonically improves upon the previous iterate). Unlike classical optimization analyses that rely on convexity, smoothness, Lipschitz bounds, or Polyak-Lojasiewicz-type conditions, our convergence guarantees depend only on the geometric distribution of restricted minima across the $k$-dimensional subspaces passing through a given point in $\mathbb{R}^d$. We identify a gap parameter -- an analogue of a spectral gap for random walks -- that controls the rate at which the iterates approach the global minimum value. Finally, we argue that the same analysis yields a blind-spot robustness property: sufficiently narrow, deep dips of the loss function (small-measure regions where $\ell$ spikes downward) have limited influence on the algorithm's trajectory, since they are unlikely to be encountered by random subspace sampling.
LGDec 22, 2022
Understanding and Improving the Role of Projection Head in Self-Supervised LearningKartik Gupta, Thalaiyasingam Ajanthan, Anton van den Hengel et al.
Self-supervised learning (SSL) aims to produce useful feature representations without access to any human-labeled data annotations. Due to the success of recent SSL methods based on contrastive learning, such as SimCLR, this problem has gained popularity. Most current contrastive learning approaches append a parametrized projection head to the end of some backbone network to optimize the InfoNCE objective and then discard the learned projection head after training. This raises a fundamental question: Why is a learnable projection head required if we are to discard it after training? In this work, we first perform a systematic study on the behavior of SSL training focusing on the role of the projection head layers. By formulating the projection head as a parametric component for the InfoNCE objective rather than a part of the network, we present an alternative optimization scheme for training contrastive learning based SSL frameworks. Our experimental study on multiple image classification datasets demonstrates the effectiveness of the proposed approach over alternatives in the SSL literature.
LGJun 22, 2022
Quantization Robust Federated Learning for Efficient Inference on Heterogeneous DevicesKartik Gupta, Marios Fournarakis, Matthias Reisser et al.
Federated Learning (FL) is a machine learning paradigm to distributively learn machine learning models from decentralized data that remains on-device. Despite the success of standard Federated optimization methods, such as Federated Averaging (FedAvg) in FL, the energy demands and hardware induced constraints for on-device learning have not been considered sufficiently in the literature. Specifically, an essential demand for on-device learning is to enable trained models to be quantized to various bit-widths based on the energy needs and heterogeneous hardware designs across the federation. In this work, we introduce multiple variants of federated averaging algorithm that train neural networks robust to quantization. Such networks can be quantized to various bit-widths with only limited reduction in full precision model accuracy. We perform extensive experiments on standard FL benchmarks to evaluate our proposed FedAvg variants for quantization robustness and provide a convergence analysis for our Quantization-Aware variants in FL. Our results demonstrate that integrating quantization robustness results in FL models that are significantly more robust to different bit-widths during quantized on-device inference.
CVNov 9, 2023
Reducing the Side-Effects of Oscillations in Training of Quantized YOLO NetworksKartik Gupta, Akshay Asthana
Quantized networks use less computational and memory resources and are suitable for deployment on edge devices. While quantization-aware training QAT is the well-studied approach to quantize the networks at low precision, most research focuses on over-parameterized networks for classification with limited studies on popular and edge device friendly single-shot object detection and semantic segmentation methods like YOLO. Moreover, majority of QAT methods rely on Straight-through Estimator (STE) approximation which suffers from an oscillation phenomenon resulting in sub-optimal network quantization. In this paper, we show that it is difficult to achieve extremely low precision (4-bit and lower) for efficient YOLO models even with SOTA QAT methods due to oscillation issue and existing methods to overcome this problem are not effective on these models. To mitigate the effect of oscillation, we first propose Exponentially Moving Average (EMA) based update to the QAT model. Further, we propose a simple QAT correction method, namely QC, that takes only a single epoch of training after standard QAT procedure to correct the error induced by oscillating weights and activations resulting in a more accurate quantized model. With extensive evaluation on COCO dataset using various YOLO5 and YOLO7 variants, we show that our correction method improves quantized YOLO networks consistently on both object detection and segmentation tasks at low-precision (4-bit and 3-bit).
CLFeb 22, 2025Code
Fine-Tuning Qwen 2.5 3B for Realistic Movie Dialogue GenerationKartik Gupta
The Qwen 2.5 3B base model was fine-tuned to generate contextually rich and engaging movie dialogue, leveraging the Cornell Movie-Dialog Corpus, a curated dataset of movie conversations. Due to the limitations in GPU computing and VRAM, the training process began with the 0.5B model progressively scaling up to the 1.5B and 3B versions as efficiency improvements were implemented. The Qwen 2.5 series, developed by Alibaba Group, stands at the forefront of small open-source pre-trained models, particularly excelling in creative tasks compared to alternatives like Meta's Llama 3.2 and Google's Gemma. Results demonstrate the ability of small models to produce high-quality, realistic dialogue, offering a promising approach for real-time, context-sensitive conversation generation.
LGJun 23, 2020Code
Calibration of Neural Networks using SplinesKartik Gupta, Amir Rahimi, Thalaiyasingam Ajanthan et al.
Calibrating neural networks is of utmost importance when employing them in safety-critical applications where the downstream decision making depends on the predicted probabilities. Measuring calibration error amounts to comparing two empirical distributions. In this work, we introduce a binning-free calibration measure inspired by the classical Kolmogorov-Smirnov (KS) statistical test in which the main idea is to compare the respective cumulative probability distributions. From this, by approximating the empirical cumulative distribution using a differentiable function via splines, we obtain a recalibration function, which maps the network outputs to actual (calibrated) class assignment probabilities. The spine-fitting is performed using a held-out calibration set and the obtained recalibration function is evaluated on an unseen test set. We tested our method against existing calibration approaches on various image classification datasets and our spline-based recalibration approach consistently outperforms existing methods on KS error as well as other commonly used calibration measures. Our Code is available at https://github.com/kartikgupta-at-anu/spline-calibration.
CVMar 30, 2020Code
Improved Gradient based Adversarial Attacks for Quantized NetworksKartik Gupta, Thalaiyasingam Ajanthan
Neural network quantization has become increasingly popular due to efficient memory consumption and faster computation resulting from bitwise operations on the quantized networks. Even though they exhibit excellent generalization capabilities, their robustness properties are not well-understood. In this work, we systematically study the robustness of quantized networks against gradient based adversarial attacks and demonstrate that these quantized models suffer from gradient vanishing issues and show a fake sense of robustness. By attributing gradient vanishing to poor forward-backward signal propagation in the trained network, we introduce a simple temperature scaling approach to mitigate this issue while preserving the decision boundary. Despite being a simple modification to existing gradient based adversarial attacks, experiments on multiple image classification datasets with multiple network architectures demonstrate that our temperature scaled attacks obtain near-perfect success rate on quantized networks while outperforming original attacks on adversarially trained models as well as floating-point networks. Code is available at https://github.com/kartikgupta-at-anu/attack-bnn.
LGOct 18, 2019Code
Mirror Descent View for Neural Network QuantizationThalaiyasingam Ajanthan, Kartik Gupta, Philip H. S. Torr et al.
Quantizing large Neural Networks (NN) while maintaining the performance is highly desirable for resource-limited devices due to reduced memory and time complexity. It is usually formulated as a constrained optimization problem and optimized via a modified version of gradient descent. In this work, by interpreting the continuous parameters (unconstrained) as the dual of the quantized ones, we introduce a Mirror Descent (MD) framework for NN quantization. Specifically, we provide conditions on the projections (i.e., mapping from continuous to quantized ones) which would enable us to derive valid mirror maps and in turn the respective MD updates. Furthermore, we present a numerically stable implementation of MD that requires storing an additional set of auxiliary variables (unconstrained), and show that it is strikingly analogous to the Straight Through Estimator (STE) based method which is typically viewed as a "trick" to avoid vanishing gradients issue. Our experiments on CIFAR-10/100, TinyImageNet, and ImageNet classification datasets with VGG-16, ResNet-18, and MobileNetV2 architectures show that our MD variants obtain quantized networks with state-of-the-art performance. Code is available at https://github.com/kartikgupta-at-anu/md-bnn.
CVSep 30, 2019Code
CullNet: Calibrated and Pose Aware Confidence Scores for Object Pose EstimationKartik Gupta, Lars Petersson, Richard Hartley
We present a new approach for a single view, image-based object pose estimation. Specifically, the problem of culling false positives among several pose proposal estimates is addressed in this paper. Our proposed approach targets the problem of inaccurate confidence values predicted by CNNs which is used by many current methods to choose a final object pose prediction. We present a network called CullNet, solving this task. CullNet takes pairs of pose masks rendered from a 3D model and cropped regions in the original image as input. This is then used to calibrate the confidence scores of the pose proposals. This new set of confidence scores is found to be significantly more reliable for accurate object pose estimation as shown by our results. Our experimental results on multiple challenging datasets (LINEMOD and Occlusion LINEMOD) reflects the utility of our proposed method. Our overall pose estimation pipeline outperforms state-of-the-art object pose estimation methods on these standard object pose estimation datasets. Our code is publicly available on https://github.com/kartikgupta-at-anu/CullNet.
CVNov 10, 2022
DrawMon: A Distributed System for Detection of Atypical Sketch Content in Concurrent Pictionary GamesNikhil Bansal, Kartik Gupta, Kiruthika Kannan et al.
Pictionary, the popular sketch-based guessing game, provides an opportunity to analyze shared goal cooperative game play in restricted communication settings. However, some players occasionally draw atypical sketch content. While such content is occasionally relevant in the game context, it sometimes represents a rule violation and impairs the game experience. To address such situations in a timely and scalable manner, we introduce DrawMon, a novel distributed framework for automatic detection of atypical sketch content in concurrently occurring Pictionary game sessions. We build specialized online interfaces to collect game session data and annotate atypical sketch content, resulting in AtyPict, the first ever atypical sketch content dataset. We use AtyPict to train CanvasNet, a deep neural atypical content detection network. We utilize CanvasNet as a core component of DrawMon. Our analysis of post deployment game session data indicates DrawMon's effectiveness for scalable monitoring and atypical sketch content detection. Beyond Pictionary, our contributions also serve as a design guide for customized atypical content response systems involving shared and interactive whiteboards. Code and datasets are available at https://drawm0n.github.io.
CVFeb 1, 2024
Towards Optimal Feature-Shaping Methods for Out-of-Distribution DetectionQinyu Zhao, Ming Xu, Kartik Gupta et al.
Feature shaping refers to a family of methods that exhibit state-of-the-art performance for out-of-distribution (OOD) detection. These approaches manipulate the feature representation, typically from the penultimate layer of a pre-trained deep learning model, so as to better differentiate between in-distribution (ID) and OOD samples. However, existing feature-shaping methods usually employ rules manually designed for specific model architectures and OOD datasets, which consequently limit their generalization ability. To address this gap, we first formulate an abstract optimization framework for studying feature-shaping methods. We then propose a concrete reduction of the framework with a simple piecewise constant shaping function and show that existing feature-shaping methods approximate the optimal solution to the concrete optimization problem. Further, assuming that OOD data is inaccessible, we propose a formulation that yields a closed-form solution for the piecewise constant shaping function, utilizing solely the ID data. Through extensive experiments, we show that the feature-shaping function optimized by our method improves the generalization ability of OOD detection across a large variety of datasets and model architectures.
IVSep 27, 2024
Learning-Based Image Compression for MachinesKartik Gupta, Kimberley Faria, Vikas Mehta
While learning based compression techniques for images have outperformed traditional methods, they have not been widely adopted in machine learning pipelines. This is largely due to lack of standardization and lack of retention of salient features needed for such tasks. Decompression of images have taken a back seat in recent years while the focus has shifted to an image's utility in performing machine learning based analysis on top of them. Thus the demand for compression pipelines that incorporate such features from images has become ever present. The methods outlined in the report build on the recent work done on learning based image compression techniques to incorporate downstream tasks in them. We propose various methods of finetuning and enhancing different parts of pretrained compression encoding pipeline and present the results of our investigation regarding the performance of vision tasks using compression based pipelines.
LGNov 25, 2024
Adaptive Circuit Behavior and Generalization in Mechanistic InterpretabilityJatin Nainani, Sankaran Vaidyanathan, AJ Yeung et al.
Mechanistic interpretability aims to understand the inner workings of large neural networks by identifying circuits, or minimal subgraphs within the model that implement algorithms responsible for performing specific tasks. These circuits are typically discovered and analyzed using a narrowly defined prompt format. However, given the abilities of large language models (LLMs) to generalize across various prompt formats for the same task, it remains unclear how well these circuits generalize. For instance, it is unclear whether the models generalization results from reusing the same circuit components, the components behaving differently, or the use of entirely different components. In this paper, we investigate the generality of the indirect object identification (IOI) circuit in GPT-2 small, which is well-studied and believed to implement a simple, interpretable algorithm. We evaluate its performance on prompt variants that challenge the assumptions of this algorithm. Our findings reveal that the circuit generalizes surprisingly well, reusing all of its components and mechanisms while only adding additional input edges. Notably, the circuit generalizes even to prompt variants where the original algorithm should fail; we discover a mechanism that explains this which we term S2 Hacking. Our findings indicate that circuits within LLMs may be more flexible and general than previously recognized, underscoring the importance of studying circuit generalization to better understand the broader capabilities of these models.
CVOct 14, 2024
Can We Predict Performance of Large Models across Vision-Language Tasks?Qinyu Zhao, Ming Xu, Kartik Gupta et al.
Evaluating large vision-language models (LVLMs) is very expensive, due to high computational cost and the wide variety of tasks. The good news is that if we already have some observed performance scores, we may be able to infer unknown ones. In this study, we propose a new framework for predicting unknown performance scores based on observed ones from other LVLMs or tasks. We first formulate the performance prediction as a matrix completion task. Specifically, we construct a sparse performance matrix $\boldsymbol{R}$, where each entry $R_{mn}$ represents the performance score of the $m$-th model on the $n$-th dataset. By applying probabilistic matrix factorization (PMF) with Markov chain Monte Carlo (MCMC), we can complete the performance matrix, i.e., predict unknown scores. Additionally, we estimate the uncertainty of performance prediction based on MCMC. Practitioners can evaluate their models on untested tasks with higher uncertainty first, which quickly reduces the prediction errors. We further introduce several improvements to enhance PMF for scenarios with sparse observed performance scores. Our experiments demonstrate the accuracy of PMF in predicting unknown scores, the reliability of uncertainty estimates in ordering evaluations, and the effectiveness of our enhancements for handling sparse data.
CLFeb 22, 2025
Iterative Auto-Annotation for Scientific Named Entity Recognition Using BERT-Based ModelsKartik Gupta
This paper presents an iterative approach to performing Scientific Named Entity Recognition (SciNER) using BERT-based models. We leverage transfer learning to fine-tune pretrained models with a small but high-quality set of manually annotated data. The process is iteratively refined by using the fine-tuned model to auto-annotate a larger dataset, followed by additional rounds of fine-tuning. We evaluated two models, dslim/bert-large-NER and bert-largecased, and found that bert-large-cased consistently outperformed the former. Our approach demonstrated significant improvements in prediction accuracy and F1 scores, especially for less common entity classes. Future work could include pertaining with unlabeled data, exploring more powerful encoders like RoBERTa, and expanding the scope of manual annotations. This methodology has broader applications in NLP tasks where access to labeled data is limited.
CVApr 1, 2024
Transfer Learning with Point TransformersKartik Gupta, Rahul Vippala, Sahima Srivastava
Point Transformers are near state-of-the-art models for classification, segmentation, and detection tasks on Point Cloud data. They utilize a self attention based mechanism to model large range spatial dependencies between multiple point sets. In this project we explore two things: classification performance of these attention based networks on ModelNet10 dataset and then, we use the trained model to classify 3D MNIST dataset after finetuning. We also train the model from scratch on 3D MNIST dataset to compare the performance of finetuned and from-scratch model on the MNIST dataset. We observe that since the two datasets have a large difference in the degree of the distributions, transfer learned models do not outperform the from-scratch models in this case. Although we do expect transfer learned models to converge faster since they already know the lower level edges, corners, etc features from the ModelNet10 dataset.
CVMar 14, 2024
The First to Know: How Token Distributions Reveal Hidden Knowledge in Large Vision-Language Models?Qinyu Zhao, Ming Xu, Kartik Gupta et al.
Large vision-language models (LVLMs), designed to interpret and respond to human instructions, occasionally generate hallucinated or harmful content due to inappropriate instructions. This study uses linear probing to shed light on the hidden knowledge at the output layers of LVLMs. We demonstrate that the logit distributions of the first tokens contain sufficient information to determine whether to respond to the instructions, including recognizing unanswerable visual questions, defending against jailbreaking attacks, and identifying deceptive questions. Such hidden knowledge is gradually lost in logits of subsequent tokens during response generation. Then, we illustrate a simple decoding strategy at the generation of the first token, effectively improving the generated content. In experiments, we find a few interesting insights: First, the CLIP model already contains a strong signal for solving these tasks, which indicates potential bias in the existing datasets. Second, we observe performance improvement by utilizing the first logit distributions on three additional tasks, including indicating uncertainty in math solving, mitigating hallucination, and image classification. Last, with the same training data, simply finetuning LVLMs improves models' performance but is still inferior to linear probing on these tasks.
LGJun 23, 2020
Post-hoc Calibration of Neural Networks by g-LayersAmir Rahimi, Thomas Mensink, Kartik Gupta et al.
Calibration of neural networks is a critical aspect to consider when incorporating machine learning models in real-world decision-making systems where the confidence of decisions are equally important as the decisions themselves. In recent years, there is a surge of research on neural network calibration and the majority of the works can be categorized into post-hoc calibration methods, defined as methods that learn an additional function to calibrate an already trained base network. In this work, we intend to understand the post-hoc calibration methods from a theoretical point of view. Especially, it is known that minimizing Negative Log-Likelihood (NLL) will lead to a calibrated network on the training set if the global optimum is attained (Bishop, 1994). Nevertheless, it is not clear learning an additional function in a post-hoc manner would lead to calibration in the theoretical sense. To this end, we prove that even though the base network ($f$) does not lead to the global optimum of NLL, by adding additional layers ($g$) and minimizing NLL by optimizing the parameters of $g$ one can obtain a calibrated network $g \circ f$. This not only provides a less stringent condition to obtain a calibrated network but also provides a theoretical justification of post-hoc calibration methods. Our experiments on various image classification benchmarks confirm the theory.
MLJun 19, 2020
Learning Minimax Estimators via Online LearningKartik Gupta, Arun Sai Suggala, Adarsh Prasad et al.
We consider the problem of designing minimax estimators for estimating the parameters of a probability distribution. Unlike classical approaches such as the MLE and minimum distance estimators, we consider an algorithmic approach for constructing such estimators. We view the problem of designing minimax estimators as finding a mixed strategy Nash equilibrium of a zero-sum game. By leveraging recent results in online learning with non-convex losses, we provide a general algorithm for finding a mixed-strategy Nash equilibrium of general non-convex non-concave zero-sum games. Our algorithm requires access to two subroutines: (a) one which outputs a Bayes estimator corresponding to a given prior probability distribution, and (b) one which computes the worst-case risk of any given estimator. Given access to these two subroutines, we show that our algorithm outputs both a minimax estimator and a least favorable prior. To demonstrate the power of this approach, we use it to construct provably minimax estimators for classical problems such as estimation in the finite Gaussian sequence model, and linear regression.
IRNov 20, 2018
Attentive Neural Architecture Incorporating Song Features For Music RecommendationNoveen Sachdeva, Kartik Gupta, Vikram Pudi
Recommender Systems are an integral part of music sharing platforms. Often the aim of these systems is to increase the time, the user spends on the platform and hence having a high commercial value. The systems which aim at increasing the average time a user spends on the platform often need to recommend songs which the user might want to listen to next at each point in time. This is different from recommendation systems which try to predict the item which might be of interest to the user at some point in the user lifetime but not necessarily in the very near future. Prediction of the next song the user might like requires some kind of modeling of the user interests at the given point of time. Attentive neural networks have been exploiting the sequence in which the items were selected by the user to model the implicit short-term interests of the user for the task of next item prediction, however we feel that the features of the songs occurring in the sequence could also convey some important information about the short-term user interest which only the items cannot. In this direction, we propose a novel attentive neural architecture which in addition to the sequence of items selected by the user, uses the features of these items to better learn the user short-term preferences and recommend the next song to the user.
CVJun 20, 2018
Classifying Object Manipulation Actions based on Grasp-types and Motion-ConstraintsKartik Gupta, Darius Burschka, Arnav Bhavsar
In this work, we address a challenging problem of fine-grained and coarse-grained recognition of object manipulation actions. Due to the variations in geometrical and motion constraints, there are different manipulations actions possible to perform different sets of actions with an object. Also, there are subtle movements involved to complete most of object manipulation actions. This makes the task of object manipulation action recognition difficult with only just the motion information. We propose to use grasp and motion-constraints information to recognise and understand action intention with different objects. We also provide an extensive experimental evaluation on the recent Yale Human Grasping dataset consisting of large set of 455 manipulation actions. The evaluation involves a) Different contemporary multi-class classifiers, and binary classifiers with one-vs-one multi- class voting scheme, b) Differential comparisons results based on subsets of attributes involving information of grasp and motion-constraints, c) Fine-grained and Coarse-grained object manipulation action recognition based on fine-grained as well as coarse-grained grasp type information, and d) Comparison between Instance level and Sequence level modeling of object manipulation actions. Our results justifies the efficacy of grasp attributes for the task of fine-grained and coarse-grained object manipulation action recognition.
LGJun 23, 2016
Nearly-optimal Robust Matrix CompletionYeshwanth Cherapanamjeri, Kartik Gupta, Prateek Jain
In this paper, we consider the problem of Robust Matrix Completion (RMC) where the goal is to recover a low-rank matrix by observing a small number of its entries out of which a few can be arbitrarily corrupted. We propose a simple projected gradient descent method to estimate the low-rank matrix that alternately performs a projected gradient descent step and cleans up a few of the corrupted entries using hard-thresholding. Our algorithm solves RMC using nearly optimal number of observations as well as nearly optimal number of corruptions. Our result also implies significant improvement over the existing time complexity bounds for the low-rank matrix completion problem. Finally, an application of our result to the robust PCA problem (low-rank+sparse matrix separation) leads to nearly linear time (in matrix dimensions) algorithm for the same; existing state-of-the-art methods require quadratic time. Our empirical results corroborate our theoretical results and show that even for moderate sized problems, our method for robust PCA is an an order of magnitude faster than the existing methods.