LG Sep 26, 2022
Learning to Drop Out: An Adversarial Approach to Training Sequence VAEs
Đorđe Miladinović, Kumar Shridhar, Kushal Jain et al.
In principle, applying variational autoencoders (VAEs) to sequential data offers a method for controlled sequence generation, manipulation, and structured representation learning. However, training sequence VAEs is challenging: autoregressive decoders can often explain the data without utilizing the latent space, a failure mode known as posterior collapse. To mitigate this, state-of-the-art models weaken the powerful decoder by applying uniformly random dropout to the decoder input. We show theoretically that this removes pointwise mutual information provided by the decoder input, which is compensated for by utilizing the latent space. We then propose an adversarial training strategy to achieve information-based stochastic dropout. Compared to uniform dropout on standard text benchmark datasets, our targeted approach increases both sequence modeling performance and the information captured in the latent space.
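For illustration, the uniform baseline mentioned in the abstract (random dropout on the decoder input) can be sketched as follows. This is a minimal sketch of the baseline only, not the adversarial method the paper proposes; the function name `word_dropout` and the `<unk>` placeholder token are assumptions made for this example.

```python
import random

def word_dropout(tokens, rate, unk="<unk>", rng=None):
    """Replace each decoder-input token with `unk` independently with probability `rate`.

    Hypothetical helper for illustration: `unk` and the signature are assumptions,
    not taken from the paper.
    """
    rng = rng or random.Random()
    # Each position is dropped independently and with the same probability,
    # regardless of how informative the token is.
    return [unk if rng.random() < rate else tok for tok in tokens]

if __name__ == "__main__":
    sentence = ["the", "cat", "sat", "on", "the", "mat"]
    print(word_dropout(sentence, rate=0.3, rng=random.Random(0)))
```

Because every position is dropped with the same probability, this baseline ignores which tokens carry the most information, which is the gap the abstract's information-based dropout is meant to address.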
NC Mar 30, 2023
Optimized EEG based mood detection with signal processing and deep neural networks for brain-computer interface
Subhrangshu Adhikary, Kushal Jain, Biswajit Saha et al.
Electroencephalogram (EEG) is a very promising and widely implemented procedure to study brain signals and activities by amplifying and measuring the postsynaptic potential arising from electrical impulses produced by neurons and detected by specialized electrodes attached to specific points on the scalp. It can be studied for detecting brain abnormalities, headaches, and other conditions. However, there are limited studies performed to establish a smart decision-making model to identify EEG's relation with the mood of the subject. In this experiment, EEG signals of 28 healthy human subjects have been observed with consent and attempts have been made to study and recognise moods. Savitzky-Golay band-pass filtering and Independent Component Analysis have been used for data filtration. Different neural network algorithms have been implemented to analyze and classify the EEG data based on the mood of the subject. The model is further optimised by the usage of Blackman window-based Fourier Transformation and extracting the most significant frequencies for each electrode. Using these techniques, up to 96.01% detection accuracy has been obtained.
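The Blackman-window Fourier step described above can be sketched roughly as follows for a single electrode. This is a sketch under stated assumptions, not the authors' pipeline: the function name `top_frequencies`, the mean removal, and the choice of `k` are hypothetical, and NumPy is assumed to be available.

```python
import numpy as np

def top_frequencies(signal, fs, k=3):
    """Return the k strongest frequency components of one channel.

    Hypothetical helper: applies a Blackman window, takes the real FFT,
    and keeps the k bins with the largest magnitude.
    """
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()                      # remove the DC offset (an assumption of this sketch)
    windowed = x * np.blackman(len(x))    # taper the segment to reduce spectral leakage
    magnitudes = np.abs(np.fft.rfft(windowed))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    strongest = np.argsort(magnitudes)[::-1][:k]  # indices sorted by descending magnitude
    return freqs[strongest], magnitudes[strongest]

if __name__ == "__main__":
    fs = 256.0                             # assumed sampling rate in Hz
    t = np.arange(512) / fs
    channel = np.sin(2 * np.pi * 10.0 * t)  # synthetic 10 Hz test tone, not EEG data
    print(top_frequencies(channel, fs, k=3))
```

Applied per electrode, the returned frequencies and magnitudes could serve as a compact feature vector for a classifier; how the paper actually selects and uses them is not specified in the abstract.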
CL Nov 14, 2023
First-Step Advantage: Importance of Starting Right in Multi-Step Math Reasoning
Kushal Jain, Moritz Miller, Niket Tandon et al.
Language models can solve complex reasoning tasks better by learning to generate rationales for their predictions. Often these models know how to solve a task, but their auto-regressive decoding nature leads to incorrect results if they start incorrectly. We observe that smaller models in particular, when corrected, can solve a task that they would otherwise have struggled with. We demonstrate this phenomenon by using a larger model to guide smaller models, which leads to significantly improved performance (up to +24 points on the GSM8K dataset by 7B models). To assist smaller models in initiating the starting step, we propose QuestCoT, where a smaller model first asks itself how to start, before proceeding with a chain of reasoning. On various multistep mathematical reasoning datasets over multiple smaller models, we show that getting the right start can lead to significant performance gains across all models (gains of up to +6 points on GSM8K, +9 on SVAMP, +5 on ASDiv, and +7 on MultiArith).
CL Nov 4, 2020 (Code)
Indic-Transformers: An Analysis of Transformer Language Models for Indian Languages
Kushal Jain, Adwait Deshpande, Kumar Shridhar et al.
Language models based on the Transformer architecture have achieved state-of-the-art performance on a wide range of NLP tasks such as text classification, question-answering, and token classification. However, this performance is usually tested and reported on high-resource languages, like English, French, Spanish, and German. Indian languages, on the other hand, are underrepresented in such benchmarks. Despite some Indian languages being included in training multilingual Transformer models, they have not been the primary focus of such work. In order to evaluate the performance on Indian languages specifically, we analyze these language models through extensive experiments on multiple downstream tasks in the Hindi, Bengali, and Telugu languages. Here, we compare the efficacy of fine-tuning model parameters of pre-trained models against that of training a language model from scratch. Moreover, we empirically argue against a strict dependency between dataset size and model performance, and instead encourage task-specific model and method selection. We achieve state-of-the-art performance on Hindi and Bengali for the text classification task. Finally, we present effective strategies for handling the modeling of Indian languages and we release our model checkpoints for the community: https://huggingface.co/neuralspace-reverie.
CV Sep 26, 2023
Face Cartoonisation For Various Poses Using StyleGAN
Kushal Jain, Ankith Varun J, Anoop Namboodiri
This paper presents an innovative approach to achieve face cartoonisation while preserving the original identity and accommodating various poses. Unlike previous methods in this field that relied on conditional-GANs, which posed challenges related to dataset requirements and pose training, our approach leverages the expressive latent space of StyleGAN. We achieve this by introducing an encoder that captures both pose and identity information from images and generates a corresponding embedding within the StyleGAN latent space. By subsequently passing this embedding through a pre-trained generator, we obtain the desired cartoonised output. While many other approaches based on StyleGAN necessitate a dedicated and fine-tuned StyleGAN model, our method stands out by utilizing an already-trained StyleGAN designed to produce realistic facial images. We show by extensive experimentation how our encoder adapts the StyleGAN output to better preserve identity when the objective is cartoonisation.
CL Apr 3, 2025
UNDO: Understanding Distillation as Optimization
Kushal Jain, Piyushi Goyal, Kumar Shridhar
Knowledge distillation has emerged as an effective strategy for compressing large language models' (LLMs) knowledge into smaller, more efficient student models. However, standard one-shot distillation methods often produce suboptimal results due to a mismatch between teacher-generated rationales and the student's specific learning requirements. In this paper, we introduce the UNDO: UNderstanding Distillation as Optimization framework, designed to bridge this gap by iteratively identifying the student's errors and prompting the teacher to refine its explanations accordingly. Each iteration directly targets the student's learning deficiencies, motivating the teacher to provide tailored and enhanced rationales that specifically address these weaknesses. Empirical evaluations on various challenging mathematical and commonsense reasoning tasks demonstrate that our iterative distillation method, UNDO, significantly outperforms standard one-step distillation methods, achieving performance gains of up to 20%. Additionally, we show that teacher-generated data refined through our iterative process remains effective even when applied to different student models, underscoring the broad applicability of our approach. Our work fundamentally reframes knowledge distillation as an iterative teacher-student interaction, effectively leveraging dynamic refinement by the teacher for better knowledge distillation.
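The iterative teacher-student interaction described in this abstract can be sketched as a simple loop. This is only a schematic under stated assumptions: the function `iterative_distillation` and the three callables (`teacher_explain`, `train_student`, `find_errors`) are hypothetical placeholders, not the UNDO implementation or any real library API.

```python
from typing import Any, Callable, Dict, List, Tuple

def iterative_distillation(
    problems: List[str],
    teacher_explain: Callable[[List[str], int], Dict[str, str]],
    train_student: Callable[[Dict[str, str]], Any],
    find_errors: Callable[[Any, List[str]], List[str]],
    max_rounds: int = 3,
) -> Tuple[Any, Dict[str, str]]:
    """Refine teacher rationales over several rounds, driven by student errors.

    All three callables are hypothetical stand-ins supplied by the caller.
    """
    rationales: Dict[str, str] = {}
    to_explain = list(problems)  # round 0 covers every problem (one-shot distillation)
    student = None
    for round_idx in range(max_rounds):
        # The teacher writes (or rewrites) rationales only for the targeted problems.
        rationales.update(teacher_explain(to_explain, round_idx))
        student = train_student(rationales)
        # Problems the student still gets wrong become the next round's targets.
        to_explain = find_errors(student, problems)
        if not to_explain:
            break
    return student, rationales
```

With `max_rounds=1` the loop reduces to the standard one-shot setting the abstract contrasts against; later rounds only touch the rationales for problems the student still fails.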