P. K. Srijith

LG
h-index16
21papers
1,028citations
Novelty53%
AI Score50

21 Papers

LGSep 16, 2022
Continual Learning with Dependency Preserving Hypernetworks

Dupati Srikar Chandra, Sakshi Varshney, P. K. Srijith et al.

Humans learn continually throughout their lifespan by accumulating diverse knowledge and fine-tuning it for future tasks. When presented with a similar goal, neural networks suffer from catastrophic forgetting if data distributions across sequential tasks are not stationary over the course of learning. An effective approach to address such continual learning (CL) problems is to use hypernetworks which generate task dependent weights for a target network. However, the continual learning performance of existing hypernetwork based approaches are affected by the assumption of independence of the weights across the layers in order to maintain parameter efficiency. To address this limitation, we propose a novel approach that uses a dependency preserving hypernetwork to generate weights for the target network while also maintaining the parameter efficiency. We propose to use recurrent neural network (RNN) based hypernetwork that can generate layer weights efficiently while allowing for dependencies across them. In addition, we propose novel regularisation and network growth techniques for the RNN based hypernetwork to further improve the continual learning performance. To demonstrate the effectiveness of the proposed methods, we conducted experiments on several image classification continual learning tasks and settings. We found that the proposed methods based on the RNN hypernetworks outperformed the baselines in all these CL settings and tasks.

CLOct 12, 2022
DialoGen: Generalized Long-Range Context Representation for Dialogue Systems

Suvodip Dey, Maunendra Sankar Desarkar, Asif Ekbal et al.

Long-range context modeling is crucial to both dialogue understanding and generation. The most popular method for dialogue context representation is to concatenate the last-$k$ utterances in chronological order. However, this method may not be ideal for conversations containing long-range dependencies, i.e., when there is a need to look beyond last-$k$ utterances to generate a meaningful response. In this work, we propose DialoGen, a novel encoder-decoder based framework for dialogue generation with a generalized context representation that can look beyond the last-$k$ utterances. The main idea of our approach is to identify and utilize the most relevant historical utterances instead of last-$k$, which also enables the compact representation of dialogue history with fewer tokens. We study the effectiveness of our proposed method on both dialogue generation (open-domain) and understanding (DST). Even with a compact context representation, DialoGen performs comparably to the state-of-the-art models on the open-domain DailyDialog dataset. We observe a similar behavior on the DST task of the MultiWOZ dataset when the proposed context representation is applied to existing DST models. We also discuss the generalizability and interpretability of DialoGen and show that the relevance score of previous utterances agrees well with human cognition.

LGOct 1, 2022
HyperHawkes: Hypernetwork based Neural Temporal Point Process

Manisha Dubey, P. K. Srijith, Maunendra Sankar Desarkar

Temporal point process serves as an essential tool for modeling time-to-event data in continuous time space. Despite having massive amounts of event sequence data from various domains like social media, healthcare etc., real world application of temporal point process faces two major challenges: 1) it is not generalizable to predict events from unseen sequences in dynamic environment 2) they are not capable of thriving in continually evolving environment with minimal supervision while retaining previously learnt knowledge. To tackle these issues, we propose \textit{HyperHawkes}, a hypernetwork based temporal point process framework which is capable of modeling time of occurrence of events for unseen sequences. Thereby, we solve the problem of zero-shot learning for time-to-event modeling. We also develop a hypernetwork based continually learning temporal point process for continuous modeling of time-to-event sequences with minimal forgetting. In this way, \textit{HyperHawkes} augments the temporal point process with zero-shot modeling and continual learning capabilities. We demonstrate the application of the proposed framework through our experiments on two real-world datasets. Our results show the efficacy of the proposed approach in terms of predicting future events under zero-shot regime for unseen event sequences. We also show that the proposed model is able to predict sequences continually while retaining information from previous event sequences, hence mitigating catastrophic forgetting for time-to-event data.

LGDec 28, 2022
Continuous Depth Recurrent Neural Differential Equations

Srinivas Anumasa, Geetakrishnasai Gunapati, P. K. Srijith

Recurrent neural networks (RNNs) have brought a lot of advancements in sequence labeling tasks and sequence data. However, their effectiveness is limited when the observations in the sequence are irregularly sampled, where the observations arrive at irregular time intervals. To address this, continuous time variants of the RNNs were introduced based on neural ordinary differential equations (NODE). They learn a better representation of the data using the continuous transformation of hidden states over time, taking into account the time interval between the observations. However, they are still limited in their capability as they use the discrete transformations and a fixed discrete number of layers (depth) over an input in the sequence to produce the output observation. We intend to address this limitation by proposing RNNs based on differential equations which model continuous transformations over both depth and time to predict an output for a given input in the sequence. Specifically, we propose continuous depth recurrent neural differential equations (CDR-NDE) which generalizes RNN models by continuously evolving the hidden states in both the temporal and depth dimensions. CDR-NDE considers two separate differential equations over each of these dimensions and models the evolution in the temporal and depth directions alternatively. We also propose the CDR-NDE-heat model based on partial differential equations which treats the computation of hidden states as solving a heat equation over time. We demonstrate the effectiveness of the proposed models by comparing against the state-of-the-art RNN models on real world sequence labeling problems and data.

29.1AIApr 30
Post-Optimization Adaptive Rank Allocation for LoRA

Vishnuprasadh Kumaravelu, Sunil Gupta, P. K. Srijith

Exponential growth in the scale of modern foundation models has led to the widespread adoption of Low-Rank Adaptation (LoRA) as a parameter-efficient fine-tuning technique. However, standard LoRA implementations disregard the varying intrinsic dimensionality of model layers and enforce a uniform rank, leading to parameter redundancy. We propose Post-Optimization Adaptive Rank Allocation (PARA), a data-free compression method for LoRA that integrates seamlessly into existing fine-tuning pipelines. PARA leverages Singular Value Decomposition to prune LoRA ranks using a global threshold over singular values across all layers. This results in non-uniform rank allocation based on layer-wise spectral importance. As a post-hoc method, PARA circumvents the training modifications and resulting instabilities that dynamic architectures typically incur. We empirically demonstrate that PARA reduces parameter count by 75-90\% while preserving the predictive performance of the original, uncompressed LoRA across multiple vision and language benchmarks. Code will be published upon acceptance.

CVMar 10, 2024
Transformer based Multitask Learning for Image Captioning and Object Detection

Debolena Basak, P. K. Srijith, Maunendra Sankar Desarkar

In several real-world scenarios like autonomous navigation and mobility, to obtain a better visual understanding of the surroundings, image captioning and object detection play a crucial role. This work introduces a novel multitask learning framework that combines image captioning and object detection into a joint model. We propose TICOD, Transformer-based Image Captioning and Object detection model for jointly training both tasks by combining the losses obtained from image captioning and object detection networks. By leveraging joint training, the model benefits from the complementary information shared between the two tasks, leading to improved performance for image captioning. Our approach utilizes a transformer-based architecture that enables end-to-end network integration for image captioning and object detection and performs both tasks jointly. We evaluate the effectiveness of our approach through comprehensive experiments on the MS-COCO dataset. Our model outperforms the baselines from image captioning literature by achieving a 3.65% improvement in BERTScore.

LGSep 22, 2025
An Unlearning Framework for Continual Learning

Sayanta Adhikari, Vishnuprasadh Kumaravelu, P. K. Srijith

Growing concerns surrounding AI safety and data privacy have driven the development of Machine Unlearning as a potential solution. However, current machine unlearning algorithms are designed to complement the offline training paradigm. The emergence of the Continual Learning (CL) paradigm promises incremental model updates, enabling models to learn new tasks sequentially. Naturally, some of those tasks may need to be unlearned to address safety or privacy concerns that might arise. We find that applying conventional unlearning algorithms in continual learning environments creates two critical problems: performance degradation on retained tasks and task relapse, where previously unlearned tasks resurface during subsequent learning. Furthermore, most unlearning algorithms require data to operate, which conflicts with CL's philosophy of discarding past data. A clear need arises for unlearning algorithms that are data-free and mindful of future learning. To that end, we propose UnCLe, an Unlearning framework for Continual Learning. UnCLe employs a hypernetwork that learns to generate task-specific network parameters, using task embeddings. Tasks are unlearned by aligning the corresponding generated network parameters with noise, without requiring any data. Empirical evaluations on several vision data sets demonstrate UnCLe's ability to sequentially perform multiple learning and unlearning operations with minimal disruption to previously acquired knowledge.

LGJul 24, 2025
Efficient Uncertainty in LLMs through Evidential Knowledge Distillation

Lakshmana Sri Harsha Nemani, P. K. Srijith, Tomasz Kuśmierczyk

Accurate uncertainty quantification remains a key challenge for standard LLMs, prompting the adoption of Bayesian and ensemble-based methods. However, such methods typically necessitate computationally expensive sampling, involving multiple forward passes to effectively estimate predictive uncertainty. In this paper, we introduce a novel approach enabling efficient and effective uncertainty estimation in LLMs without sacrificing performance. Specifically, we distill uncertainty-aware teacher models - originally requiring multiple forward passes - into compact student models sharing the same architecture but fine-tuned using Low-Rank Adaptation (LoRA). We compare two distinct distillation strategies: one in which the student employs traditional softmax-based outputs, and another in which the student leverages Dirichlet-distributed outputs to explicitly model epistemic uncertainty via evidential learning. Empirical evaluations on classification datasets demonstrate that such students can achieve comparable or superior predictive and uncertainty quantification performance relative to their teacher models, while critically requiring only a single forward pass. To our knowledge, this is the first demonstration that immediate and robust uncertainty quantification can be achieved in LLMs through evidential distillation.

LGDec 14, 2024
Linked Adapters: Linking Past and Future to Present for Effective Continual Learning

Dupati Srikar Chandra, P. K. Srijith, Dana Rezazadegan et al.

Continual learning allows the system to learn and adapt to new tasks while retaining the knowledge acquired from previous tasks. However, deep learning models suffer from catastrophic forgetting of knowledge learned from earlier tasks while learning a new task. Moreover, retraining large models like transformers from scratch for every new task is costly. An effective approach to address continual learning is to use a large pre-trained model with task-specific adapters to adapt to the new tasks. Though this approach can mitigate catastrophic forgetting, they fail to transfer knowledge across tasks as each task is learning adapters separately. To address this, we propose a novel approach Linked Adapters that allows knowledge transfer through a weighted attention mechanism to other task-specific adapters. Linked adapters use a multi-layer perceptron (MLP) to model the attention weights, which overcomes the challenge of backward knowledge transfer in continual learning in addition to modeling the forward knowledge transfer. During inference, our proposed approach effectively leverages knowledge transfer through MLP-based attention weights across all the lateral task adapters. Through numerous experiments conducted on diverse image classification datasets, we effectively demonstrated the improvement in performance on the continual learning tasks using Linked Adapters.

LGDec 29, 2021
Bayesian Neural Hawkes Process for Event Uncertainty Prediction

Manisha Dubey, Ragja Palakkadavath, P. K. Srijith

Event data consisting of time of occurrence of the events arises in several real-world applications. Recent works have introduced neural network based point processes for modeling event-times, and were shown to provide state-of-the-art performance in predicting event-times. However, neural point process models lack a good uncertainty quantification capability on predictions. A proper uncertainty quantification over event modeling will help in better decision making for many practical applications. Therefore, we propose a novel point process model, Bayesian Neural Hawkes process (BNHP) which leverages uncertainty modelling capability of Bayesian models and generalization capability of the neural networks to model event occurrence times. We augment the model with spatio-temporal modeling capability where it can consider uncertainty over predicted time and location of the events. Experiments on simulated and real-world datasets show that BNHP significantly improves prediction performance and uncertainty quantification for modelling events.

CLDec 23, 2021
Bi-Directional Recurrent Neural Ordinary Differential Equations for Social Media Text Classification

Maunika Tamire, Srinivas Anumasa, P. K. Srijith

Classification of posts in social media such as Twitter is difficult due to the noisy and short nature of texts. Sequence classification models based on recurrent neural networks (RNN) are popular for classifying posts that are sequential in nature. RNNs assume the hidden representation dynamics to evolve in a discrete manner and do not consider the exact time of the posting. In this work, we propose to use recurrent neural ordinary differential equations (RNODE) for social media post classification which consider the time of posting and allow the computation of hidden representation to evolve in a time-sensitive continuous manner. In addition, we propose a novel model, Bi-directional RNODE (Bi-RNODE), which can consider the information flow in both the forward and backward directions of posting times to predict the post label. Our experiments demonstrate that RNODE and Bi-RNODE are effective for the problem of stance classification of rumours in social media.

LGDec 23, 2021
Latent Time Neural Ordinary Differential Equations

Srinivas Anumasa, P. K. Srijith

Neural ordinary differential equations (NODE) have been proposed as a continuous depth generalization to popular deep learning models such as Residual networks (ResNets). They provide parameter efficiency and automate the model selection process in deep learning models to some extent. However, they lack the much-required uncertainty modelling and robustness capabilities which are crucial for their use in several real-world applications such as autonomous driving and healthcare. We propose a novel and unique approach to model uncertainty in NODE by considering a distribution over the end-time $T$ of the ODE solver. The proposed approach, latent time NODE (LT-NODE), treats $T$ as a latent variable and apply Bayesian learning to obtain a posterior distribution over $T$ from the data. In particular, we use variational inference to learn an approximate posterior and the model parameters. Prediction is done by considering the NODE representations from different samples of the posterior and can be done efficiently using a single forward pass. As $T$ implicitly defines the depth of a NODE, posterior distribution over $T$ would also help in model selection in NODE. We also propose, adaptive latent time NODE (ALT-NODE), which allow each data point to have a distinct posterior distribution over end-times. ALT-NODE uses amortized variational inference to learn an approximate posterior using inference networks. We demonstrate the effectiveness of the proposed approaches in modelling uncertainty and robustness through experiments on synthetic and several real-world image classification data.

LGDec 23, 2021
Improving Robustness and Uncertainty Modelling in Neural Ordinary Differential Equations

Srinivas Anumasa, P. K. Srijith

Neural ordinary differential equations (NODE) have been proposed as a continuous depth generalization to popular deep learning models such as Residual networks (ResNets). They provide parameter efficiency and automate the model selection process in deep learning models to some extent. However, they lack the much-required uncertainty modelling and robustness capabilities which are crucial for their use in several real-world applications such as autonomous driving and healthcare. We propose a novel and unique approach to model uncertainty in NODE by considering a distribution over the end-time $T$ of the ODE solver. The proposed approach, latent time NODE (LT-NODE), treats $T$ as a latent variable and apply Bayesian learning to obtain a posterior distribution over $T$ from the data. In particular, we use variational inference to learn an approximate posterior and the model parameters. Prediction is done by considering the NODE representations from different samples of the posterior and can be done efficiently using a single forward pass. As $T$ implicitly defines the depth of a NODE, posterior distribution over $T$ would also help in model selection in NODE. We also propose, adaptive latent time NODE (ALT-NODE), which allow each data point to have a distinct posterior distribution over end-times. ALT-NODE uses amortized variational inference to learn an approximate posterior using inference networks. We demonstrate the effectiveness of the proposed approaches in modelling uncertainty and robustness through experiments on synthetic and several real-world image classification data.

CVAug 8, 2021
Monte Carlo DropBlock for Modelling Uncertainty in Object Detection

Kumari Deepshikha, Sai Harsha Yelleni, P. K. Srijith et al.

With the advancements made in deep learning, computer vision problems like object detection and segmentation have seen a great improvement in performance. However, in many real-world applications such as autonomous driving vehicles, the risk associated with incorrect predictions of objects is very high. Standard deep learning models for object detection such as YOLO models are often overconfident in their predictions and do not take into account the uncertainty in predictions on out-of-distribution data. In this work, we propose an efficient and effective approach to model uncertainty in object detection and segmentation tasks using Monte-Carlo DropBlock (MC-DropBlock) based inference. The proposed approach applies drop-block during training time and test time on the convolutional layer of the deep learning models such as YOLO. We show that this leads to a Bayesian convolutional neural network capable of capturing the epistemic uncertainty in the model. Additionally, we capture the aleatoric uncertainty using a Gaussian likelihood. We demonstrate the effectiveness of the proposed approach on modeling uncertainty in object detection and segmentation tasks using out-of-distribution experiments. Experimental results show that MC-DropBlock improves the generalization, calibration, and uncertainty modeling capabilities of YOLO models in object detection and segmentation.

MLJul 17, 2021
Subset-of-Data Variational Inference for Deep Gaussian-Processes Regression

Ayush Jain, P. K. Srijith, Mohammad Emtiyaz Khan

Deep Gaussian Processes (DGPs) are multi-layer, flexible extensions of Gaussian processes but their training remains challenging. Sparse approximations simplify the training but often require optimization over a large number of inducing inputs and their locations across layers. In this paper, we simplify the training by setting the locations to a fixed subset of data and sampling the inducing inputs from a variational distribution. This reduces the trainable parameters and computation cost without significant performance degradations, as demonstrated by our empirical results on regression problems. Our modifications simplify and stabilize DGP training while making it amenable to sampling schemes for setting the inducing inputs.

LGJun 4, 2021
Adiabatic Quantum Feature Selection for Sparse Linear Regression

Surya Sai Teja Desu, P. K. Srijith, M. V. Panduranga Rao et al.

Linear regression is a popular machine learning approach to learn and predict real valued outputs or dependent variables from independent variables or features. In many real world problems, its beneficial to perform sparse linear regression to identify important features helpful in predicting the dependent variable. It not only helps in getting interpretable results but also avoids overfitting when the number of features is large, and the amount of data is small. The most natural way to achieve this is by using `best subset selection' which penalizes non-zero model parameters by adding $\ell_0$ norm over parameters to the least squares loss. However, this makes the objective function non-convex and intractable even for a small number of features. This paper aims to address the intractability of sparse linear regression with $\ell_0$ norm using adiabatic quantum computing, a quantum computing paradigm that is particularly useful for solving optimization problems faster. We formulate the $\ell_0$ optimization problem as a Quadratic Unconstrained Binary Optimization (QUBO) problem and solve it using the D-Wave adiabatic quantum computer. We study and compare the quality of QUBO solution on synthetic and real world datasets. The results demonstrate the effectiveness of the proposed adiabatic quantum computing approach in finding the optimal solution. The QUBO solution matches the optimal solution for a wide range of sparsity penalty values across the datasets.

LGDec 12, 2020
Delay Differential Neural Networks

Srinivas Anumasa, P. K. Srijith

Neural ordinary differential equations (NODEs) treat computation of intermediate feature vectors as trajectories of ordinary differential equation parameterized by a neural network. In this paper, we propose a novel model, delay differential neural networks (DDNN), inspired by delay differential equations (DDEs). The proposed model considers the derivative of the hidden feature vector as a function of the current feature vector and past feature vectors (history). The function is modelled as a neural network and consequently, it leads to continuous depth alternatives to many recent ResNet variants. We propose two different DDNN architectures, depending on the way current and past feature vectors are considered. For training DDNNs, we provide a memory-efficient adjoint method for computing gradients and back-propagate through the network. DDNN improves the data efficiency of NODE by further reducing the number of parameters without affecting the generalization performance. Experiments conducted on synthetic and real-world image classification datasets such as Cifar10 and Cifar100 show the effectiveness of the proposed models.

CLNov 12, 2019
Learning Multi-Sense Word Distributions using Approximate Kullback-Leibler Divergence

P. Jayashree, Ballijepalli Shreya, P. K. Srijith

Learning word representations has garnered greater attention in the recent past due to its diverse text applications. Word embeddings encapsulate the syntactic and semantic regularities of sentences. Modelling word embedding as multi-sense gaussian mixture distributions, will additionally capture uncertainty and polysemy of words. We propose to learn the Gaussian mixture representation of words using a Kullback-Leibler (KL) divergence based objective function. The KL divergence based energy function provides a better distance metric which can effectively capture entailment and distribution similarity among the words. Due to the intractability of KL divergence for Gaussian mixture, we go for a KL approximation between Gaussian mixtures. We perform qualitative and quantitative experiments on benchmark word similarity and entailment datasets which demonstrate the effectiveness of the proposed approach.

MLJun 5, 2018
Deep Gaussian Processes with Convolutional Kernels

Vinayak Kumar, Vaibhav Singh, P. K. Srijith et al.

Deep Gaussian processes (DGPs) provide a Bayesian non-parametric alternative to standard parametric deep learning models. A DGP is formed by stacking multiple GPs resulting in a well-regularized composition of functions. The Bayesian framework that equips the model with attractive properties, such as implicit capacity control and predictive uncertainty, makes it at the same time challenging to combine with a convolutional structure. This has hindered the application of DGPs in computer vision tasks, an area where deep parametric models (i.e. CNNs) have made breakthroughs. Standard kernels used in DGPs such as radial basis functions (RBFs) are insufficient for handling pixel variability in raw images. In this paper, we build on the recent convolutional GP to develop Convolutional DGP (CDGP) models which effectively capture image level features through the use of convolution kernels, therefore opening up the way for applying DGPs to computer vision tasks. Our model learns local spatial influence and outperforms strong GP based baselines on multi-class image classification. We also consider various constructions of convolution kernel over the image patches, analyze the computational trade-offs and provide an efficient framework for convolutional DGP models. The experimental results on image data such as MNIST, rectangles-image, CIFAR10 and Caltech101 demonstrate the effectiveness of the proposed approaches.

IRJun 11, 2016
Sub-Story Detection in Twitter with Hierarchical Dirichlet Processes

P. K. Srijith, Mark Hepple, Kalina Bontcheva et al.

Social media has now become the de facto information source on real world events. The challenge, however, due to the high volume and velocity nature of social media streams, is in how to follow all posts pertaining to a given event over time, a task referred to as story detection. Moreover, there are often several different stories pertaining to a given event, which we refer to as sub-stories and the corresponding task of their automatic detection as sub-story detection. This paper proposes hierarchical Dirichlet processes (HDP), a probabilistic topic model, as an effective method for automatic sub-story detection. HDP can learn sub-topics associated with sub-stories which enables it to handle subtle variations in sub-stories. It is compared with state- of-the-art story detection approaches based on locality sensitive hashing and spectral clustering. We demonstrate the superior performance of HDP for sub-story detection on real world Twitter data sets using various evaluation measures. The ability of HDP to learn sub-topics helps it to recall the sub- stories with high precision. Another contribution of this paper is in demonstrating that the conversational structures within the Twitter stream can be used to improve sub-story detection performance significantly.

LGDec 25, 2014
Gaussian Process Pseudo-Likelihood Models for Sequence Labeling

P. K. Srijith, P. Balamurugan, Shirish Shevade

Several machine learning problems arising in natural language processing can be modeled as a sequence labeling problem. We provide Gaussian process models based on pseudo-likelihood approximation to perform sequence labeling. Gaussian processes (GPs) provide a Bayesian approach to learning in a kernel based framework. The pseudo-likelihood model enables one to capture long range dependencies among the output components of the sequence without becoming computationally intractable. We use an efficient variational Gaussian approximation method to perform inference in the proposed model. We also provide an iterative algorithm which can effectively make use of the information from the neighboring labels to perform prediction. The ability to capture long range dependencies makes the proposed approach useful for a wide range of sequence labeling problems. Numerical experiments on some sequence labeling data sets demonstrate the usefulness of the proposed approach.