LGJun 12, 2023
A Brief Review of Hypernetworks in Deep LearningVinod Kumar Chauhan, Jiandong Zhou, Ping Lu et al. · oxford
Hypernetworks, or hypernets for short, are neural networks that generate weights for another neural network, known as the target network. They have emerged as a powerful deep learning technique that allows for greater flexibility, adaptability, dynamism, faster training, information sharing, and model compression. Hypernets have shown promising results in a variety of deep learning problems, including continual learning, causal inference, transfer learning, weight pruning, uncertainty quantification, zero-shot learning, natural language processing, and reinforcement learning. Despite their success across different problem settings, there is currently no comprehensive review available to inform researchers about the latest developments and to assist in utilizing hypernets. To fill this gap, we review the progress in hypernets. We present an illustrative example of training deep neural networks using hypernets and propose categorizing hypernets based on five design criteria: inputs, outputs, variability of inputs and outputs, and the architecture of hypernets. We also review applications of hypernets across different deep learning problem settings, followed by a discussion of general scenarios where hypernets can be effectively employed. Finally, we discuss the challenges and future directions that remain underexplored in the field of hypernets. We believe that hypernetworks have the potential to revolutionize the field of deep learning. They offer a new way to design and train neural networks, and they have the potential to improve the performance of deep learning models on a variety of tasks. Through this review, we aim to inspire further advancements in deep learning through hypernetworks.
CLOct 17, 2022Code
Using Bottleneck Adapters to Identify Cancer in Clinical Notes under Low-Resource ConstraintsOmid Rohanian, Hannah Jauncey, Mohammadmahdi Nouriborji et al.
Processing information locked within clinical health records is a challenging task that remains an active area of research in biomedical NLP. In this work, we evaluate a broad set of machine learning techniques ranging from simple RNNs to specialised transformers such as BioBERT on a dataset containing clinical notes along with a set of annotations indicating whether a sample is cancer-related or not. Furthermore, we specifically employ efficient fine-tuning methods from NLP, namely, bottleneck adapters and prompt tuning, to adapt the models to our specialised task. Our evaluations suggest that fine-tuning a frozen BERT model pre-trained on natural language and with bottleneck adapters outperforms all other strategies, including full fine-tuning of the specialised BioBERT model. Based on our findings, we suggest that using bottleneck adapters in low-resource situations with limited access to labelled data or processing capacity could be a viable strategy in biomedical text mining. The code used in the experiments are going to be made available at https://github.com/omidrohanian/bottleneck-adapters.
LGOct 19, 2022
Adversarial De-confounding in Individualised Treatment Effects EstimationVinod Kumar Chauhan, Soheila Molaei, Marzia Hoque Tania et al. · oxford
Observational studies have recently received significant attention from the machine learning community due to the increasingly available non-experimental observational data and the limitations of the experimental studies, such as considerable cost, impracticality, small and less representative sample sizes, etc. In observational studies, de-confounding is a fundamental problem of individualised treatment effects (ITE) estimation. This paper proposes disentangled representations with adversarial training to selectively balance the confounders in the binary treatment setting for the ITE estimation. The adversarial training of treatment policy selectively encourages treatment-agnostic balanced representations for the confounders and helps to estimate the ITE in the observational studies via counterfactual inference. Empirical results on synthetic and real-world datasets, with varying degrees of confounding, prove that our proposed approach improves the state-of-the-art methods in achieving lower error in the ITE estimation.
LGAug 5, 2022
COPER: Continuous Patient State PerceiverVinod Kumar Chauhan, Anshul Thakur, Odhran O'Donoghue et al.
In electronic health records (EHRs), irregular time-series (ITS) occur naturally due to patient health dynamics, reflected by irregular hospital visits, diseases/conditions and the necessity to measure different vitals signs at each visit etc. ITS present challenges in training machine learning algorithms which mostly are built on assumption of coherent fixed dimensional feature space. In this paper, we propose a novel COntinuous patient state PERceiver model, called COPER, to cope with ITS in EHRs. COPER uses Perceiver model and the concept of neural ordinary differential equations (ODEs) to learn the continuous time dynamics of patient state, i.e., continuity of input space and continuity of output space. The neural ODEs help COPER to generate regular time-series to feed to Perceiver model which has the capability to handle multi-modality large-scale inputs. To evaluate the performance of the proposed model, we use in-hospital mortality prediction task on MIMIC-III dataset and carefully design experiments to study irregularity. The results are compared with the baselines which prove the efficacy of the proposed model.
LGFeb 28, 2023
Synthesizing Mixed-type Electronic Health Records using Diffusion ModelsTaha Ceritli, Ghadeer O. Ghosheh, Vinod Kumar Chauhan et al.
Electronic Health Records (EHRs) contain sensitive patient information, which presents privacy concerns when sharing such data. Synthetic data generation is a promising solution to mitigate these risks, often relying on deep generative models such as Generative Adversarial Networks (GANs). However, recent studies have shown that diffusion models offer several advantages over GANs, such as generation of more realistic synthetic data and stable training in generating data modalities, including image, text, and sound. In this work, we investigate the potential of diffusion models for generating realistic mixed-type tabular EHRs, comparing TabDDPM model with existing methods on four datasets in terms of data quality, utility, privacy, and augmentation. Our experiments demonstrate that TabDDPM outperforms the state-of-the-art models across all evaluation metrics, except for privacy, which confirms the trade-off between privacy and utility.
CVAug 15, 2021Code
HCR-Net: A deep learning based script independent handwritten character recognition networkVinod Kumar Chauhan, Sukhdeep Singh, Anuj Sharma
Handwritten character recognition (HCR) remains a challenging pattern recognition problem despite decades of research, and lacks research on script independent recognition techniques. {\color{black}This is mainly because of similar character structures, different handwriting styles, diverse scripts, handcrafted feature extraction techniques, unavailability of data and code, and the development of script-specific deep learning techniques. To address these limitations, we have proposed a script independent deep learning network for HCR research, called HCR-Net, that sets a new research direction for the field. HCR-Net is based on a novel transfer learning approach for HCR, which \textit{partly utilizes} feature extraction layers of a pre-trained network.} Due to transfer learning and image augmentation, HCR-Net provides faster and computationally efficient training, better performance and generalizations, and can work with small datasets. HCR-Net is extensively evaluated on 40 publicly available datasets of Bangla, Punjabi, Hindi, English, Swedish, Urdu, Farsi, Tibetan, Kannada, Malayalam, Telugu, Marathi, Nepali and Arabic languages, and established 26 new benchmark results while performed close to the best results in the rest cases. HCR-Net showed performance improvements up to 11\% against the existing results and achieved a fast convergence rate showing up to 99\% of final performance in the very first epoch. HCR-Net significantly outperformed the state-of-the-art transfer learning techniques and also reduced the number of trainable parameters by 34\% as compared with the corresponding pre-trained network. To facilitate reproducibility and further advancements of HCR research, the complete code is publicly released at \url{https://github.com/jmdvinodjmd/HCR-Net}.
LGApr 20, 2019Code
LIBS2ML: A Library for Scalable Second Order Machine Learning AlgorithmsVinod Kumar Chauhan, Anuj Sharma, Kalpana Dahiya
LIBS2ML is a library based on scalable second order learning algorithms for solving large-scale problems, i.e., big data problems in machine learning. LIBS2ML has been developed using MEX files, i.e., C++ with MATLAB/Octave interface to take the advantage of both the worlds, i.e., faster learning using C++ and easy I/O using MATLAB. Most of the available libraries are either in MATLAB/Python/R which are very slow and not suitable for large-scale learning, or are in C/C++ which does not have easy ways to take input and display results. So LIBS2ML is completely unique due to its focus on the scalable second order methods, the hot research topic, and being based on MEX files. Thus it provides researchers a comprehensive environment to evaluate their ideas and it also provides machine learning practitioners an effective tool to deal with the large-scale learning problems. LIBS2ML is an open-source, highly efficient, extensible, scalable, readable, portable and easy to use library. The library can be downloaded from the URL: \url{https://github.com/jmdvinodjmd/LIBS2ML}.
LGMay 13, 2024
Sample Selection Bias in Machine Learning for HealthcareVinod Kumar Chauhan, Lei Clifton, Achille Salaün et al. · oxford
While machine learning algorithms hold promise for personalised medicine, their clinical adoption remains limited, partly due to biases that can compromise the reliability of predictions. In this paper, we focus on sample selection bias (SSB), a specific type of bias where the study population is less representative of the target population, leading to biased and potentially harmful decisions. Despite being well-known in the literature, SSB remains scarcely studied in machine learning for healthcare. Moreover, the existing machine learning techniques try to correct the bias mostly by balancing distributions between the study and the target populations, which may result in a loss of predictive performance. To address these problems, our study illustrates the potential risks associated with SSB by examining SSB's impact on the performance of machine learning algorithms. Most importantly, we propose a new research direction for addressing SSB, based on the target population identification rather than the bias correction. Specifically, we propose two independent networks(T-Net) and a multitasking network (MT-Net) for addressing SSB, where one network/task identifies the target subpopulation which is representative of the study population and the second makes predictions for the identified subpopulation. Our empirical results with synthetic and semi-synthetic datasets highlight that SSB can lead to a large drop in the performance of an algorithm for the target population as compared with the study population, as well as a substantial difference in the performance for the target subpopulations that are representative of the selected and the non-selected patients from the study population. Furthermore, our proposed techniques demonstrate robustness across various settings, including different dataset sizes, event rates, and selection rates, outperforming the existing bias correction techniques.
LGMar 22, 2024
GTAGCN: Generalized Topology Adaptive Graph Convolutional NetworksSukhdeep Singh, Anuj Sharma, Vinod Kumar Chauhan · oxford
Graph Neural Networks (GNN) have emerged as a popular and standard approach for learning from graph-structured data. The literature on GNN highlights the potential of this evolving research area and its widespread adoption in real-life applications. However, most of the approaches are either new in concept or derived from specific techniques. Therefore, the potential of more than one approach in hybrid form has not been studied extensively, which can be well utilized for sequenced data or static data together. We derive a hybrid approach based on two established techniques as generalized aggregation networks and topology adaptive graph convolution networks that solve our purpose to apply on both types of sequenced and static nature of data, effectively. The proposed method applies to both node and graph classification. Our empirical analysis reveals that the results are at par with literature results and better for handwritten strokes as sequenced data, where graph structures have not been explored.
LGFeb 12, 2025
Individualised Treatment Effects Estimation with Composite Treatments and Composite OutcomesVinod Kumar Chauhan, Lei Clifton, Gaurav Nigam et al. · oxford
Estimating individualised treatment effect (ITE) -- that is the causal effect of a set of variables (also called exposures, treatments, actions, policies, or interventions), referred to as \textit{composite treatments}, on a set of outcome variables of interest, referred to as \textit{composite outcomes}, for a unit from observational data -- remains a fundamental problem in causal inference with applications across disciplines, such as healthcare, economics, education, social science, marketing, and computer science. Previous work in causal machine learning for ITE estimation is limited to simple settings, like single treatments and single outcomes. This hinders their use in complex real-world scenarios; for example, consider studying the effect of different ICU interventions, such as beta-blockers and statins for a patient admitted for heart surgery, on different outcomes of interest such as atrial fibrillation and in-hospital mortality. The limited research into composite treatments and outcomes is primarily due to data scarcity for all treatments and outcomes. To address the above challenges, we propose a novel and innovative hypernetwork-based approach, called \emph{H-Learner}, to solve ITE estimation under composite treatments and composite outcomes, which tackles the data scarcity issue by dynamically sharing information across treatments and outcomes. Our empirical analysis with binary and arbitrary composite treatments and outcomes demonstrates the effectiveness of the proposed approach compared to existing methods.
LGMay 25, 2023
Dynamic Inter-treatment Information Sharing for Individualized Treatment Effects EstimationVinod Kumar Chauhan, Jiandong Zhou, Ghadeer Ghosheh et al.
Estimation of individualized treatment effects (ITE) from observational studies is a fundamental problem in causal inference and holds significant importance across domains, including healthcare. However, limited observational datasets pose challenges in reliable ITE estimation as data have to be split among treatment groups to train an ITE learner. While information sharing among treatment groups can partially alleviate the problem, there is currently no general framework for end-to-end information sharing in ITE estimation. To tackle this problem, we propose a deep learning framework based on `\textit{soft weight sharing}' to train ITE learners, enabling \textit{dynamic end-to-end} information sharing among treatment groups. The proposed framework complements existing ITE learners, and introduces a new class of ITE learners, referred to as \textit{HyperITE}. We extend state-of-the-art ITE learners with \textit{HyperITE} versions and evaluate them on IHDP, ACIC-2016, and Twins benchmarks. Our experimental results show that the proposed framework improves ITE estimation error, with increasing effectiveness for smaller datasets.
LGDec 26, 2018
Stochastic Trust Region Inexact Newton Method for Large-scale Machine LearningVinod Kumar Chauhan, Anuj Sharma, Kalpana Dahiya
Nowadays stochastic approximation methods are one of the major research direction to deal with the large-scale machine learning problems. From stochastic first order methods, now the focus is shifting to stochastic second order methods due to their faster convergence and availability of computing resources. In this paper, we have proposed a novel Stochastic Trust RegiOn Inexact Newton method, called as STRON, to solve large-scale learning problems which uses conjugate gradient (CG) to inexactly solve trust region subproblem. The method uses progressive subsampling in the calculation of gradient and Hessian values to take the advantage of both, stochastic and full-batch regimes. We have extended STRON using existing variance reduction techniques to deal with the noisy gradients and using preconditioned conjugate gradient (PCG) as subproblem solver, and empirically proved that they do not work as expected, for the large-scale learning problems. Finally, our empirical results prove efficacy of the proposed method against existing methods with bench marked datasets.
LGJul 24, 2018
SAAGs: Biased Stochastic Variance Reduction Methods for Large-scale LearningVinod Kumar Chauhan, Anuj Sharma, Kalpana Dahiya
Stochastic approximation is one of the effective approach to deal with the large-scale machine learning problems and the recent research has focused on reduction of variance, caused by the noisy approximations of the gradients. In this paper, we have proposed novel variants of SAAG-I and II (Stochastic Average Adjusted Gradient) (Chauhan et al. 2017), called SAAG-III and IV, respectively. Unlike SAAG-I, starting point is set to average of previous epoch in SAAG-III, and unlike SAAG-II, the snap point and starting point are set to average and last iterate of previous epoch in SAAG-IV, respectively. To determine the step size, we have used Stochastic Backtracking-Armijo line Search (SBAS) which performs line search only on selected mini-batch of data points. Since backtracking line search is not suitable for large-scale problems and the constants used to find the step size, like Lipschitz constant, are not always available so SBAS could be very effective in such cases. We have extended SAAGs (I, II, III and IV) to solve non-smooth problems and designed two update rules for smooth and non-smooth problems. Moreover, our theoretical results have proved linear convergence of SAAG-IV for all the four combinations of smoothness and strong-convexity, in expectation. Finally, our experimental studies have proved the efficacy of proposed methods against the state-of-art techniques.
LGJan 18, 2018
Faster Learning by Reduction of Data Access TimeVinod Kumar Chauhan, Anuj Sharma, Kalpana Dahiya
Nowadays, the major challenge in machine learning is the Big Data challenge. The big data problems due to large number of data points or large number of features in each data point, or both, the training of models have become very slow. The training time has two major components: Time to access the data and time to process (learn from) the data. So far, the research has focused only on the second part, i.e., learning from the data. In this paper, we have proposed one possible solution to handle the big data problems in machine learning. The idea is to reduce the training time through reducing data access time by proposing systematic sampling and cyclic/sequential sampling to select mini-batches from the dataset. To prove the effectiveness of proposed sampling techniques, we have used Empirical Risk Minimization, which is commonly used machine learning problem, for strongly convex and smooth case. The problem has been solved using SAG, SAGA, SVRG, SAAG-II and MBSGD (Mini-batched SGD), each using two step determination techniques, namely, constant step size and backtracking line search method. Theoretical results prove the same convergence for systematic sampling, cyclic sampling and the widely used random sampling technique, in expectation. Experimental results with bench marked datasets prove the efficacy of the proposed sampling techniques and show up to six times faster training.