Adarsh Kumar

CL
h-index1
16papers
3,454citations
Novelty36%
AI Score32

16 Papers

SDFeb 10, 2023
GTR-CTRL: Instrument and Genre Conditioning for Guitar-Focused Music Generation with Transformers

Pedro Sarmento, Adarsh Kumar, Yu-Hua Chen et al.

Recently, symbolic music generation with deep learning techniques has witnessed steady improvements. Most works on this topic focus on MIDI representations, but less attention has been paid to symbolic music generation using guitar tablatures (tabs) which can be used to encode multiple instruments. Tabs include information on expressive techniques and fingerings for fretted string instruments in addition to rhythm and pitch. In this work, we use the DadaGP dataset for guitar tab music generation, a corpus of over 26k songs in GuitarPro and token formats. We introduce methods to condition a Transformer-XL deep learning model to generate guitar tabs (GTR-CTRL) based on desired instrumentation (inst-CTRL) and genre (genre-CTRL). Special control tokens are appended at the beginning of each song in the training corpus. We assess the performance of the model with and without conditioning. We propose instrument presence metrics to assess the inst-CTRL model's response to a given instrumentation prompt. We trained a BERT model for downstream genre classification and used it to assess the results obtained with the genre-CTRL model. Statistical analyses evidence significant differences between the conditioned and unconditioned models. Overall, results indicate that the GTR-CTRL methods provide more flexibility and control for guitar-focused symbolic music generation than an unconditioned model.

CLMay 9, 2022
CounterGeDi: A controllable approach to generate polite, detoxified and emotional counterspeech

Punyajoy Saha, Kanishk Singh, Adarsh Kumar et al.

Recently, many studies have tried to create generation models to assist counter speakers by providing counterspeech suggestions for combating the explosive proliferation of online hate. However, since these suggestions are from a vanilla generation model, they might not include the appropriate properties required to counter a particular hate speech instance. In this paper, we propose CounterGeDi - an ensemble of generative discriminators (GeDi) to guide the generation of a DialoGPT model toward more polite, detoxified, and emotionally laden counterspeech. We generate counterspeech using three datasets and observe significant improvement across different attribute scores. The politeness and detoxification scores increased by around 15% and 6% respectively, while the emotion in the counterspeech increased by at least 10% across all the datasets. We also experiment with triple-attribute control and observe significant improvement over single attribute results when combining complementing attributes, e.g., politeness, joyfulness and detoxification. In all these experiments, the relevancy of the generated text does not deteriorate due to the application of these controls

SDApr 18, 2023
From Words to Music: A Study of Subword Tokenization Techniques in Symbolic Music Generation

Adarsh Kumar, Pedro Sarmento

Subword tokenization has been widely successful in text-based natural language processing (NLP) tasks with Transformer-based models. As Transformer models become increasingly popular in symbolic music-related studies, it is imperative to investigate the efficacy of subword tokenization in the symbolic music domain. In this paper, we explore subword tokenization techniques, such as byte-pair encoding (BPE), in symbolic music generation and its impact on the overall structure of generated songs. Our experiments are based on three types of MIDI datasets: single track-melody only, multi-track with a single instrument, and multi-track and multi-instrument. We apply subword tokenization on post-musical tokenization schemes and find that it enables the generation of longer songs at the same time and improves the overall structure of the generated music in terms of objective metrics like structure indicator (SI), Pitch Class Entropy, etc. We also compare two subword tokenization methods, BPE and Unigram, and observe that both methods lead to consistent improvements. Our study suggests that subword tokenization is a promising technique for symbolic music generation and may have broader implications for music composition, particularly in cases involving complex data such as multi-track songs.

AIMay 13, 2025
Improving the Reliability of LLMs: Combining CoT, RAG, Self-Consistency, and Self-Verification

Adarsh Kumar, Hwiyoon Kim, Jawahar Sai Nathani et al.

Hallucination, where large language models (LLMs) generate confident but incorrect or irrelevant information, remains a key limitation in their application to complex, open-ended tasks. Chain-of-thought (CoT) prompting has emerged as a promising method for improving multistep reasoning by guiding models through intermediate steps. However, CoT alone does not fully address the hallucination problem. In this work, we investigate how combining CoT with retrieval-augmented generation (RAG), as well as applying self-consistency and self-verification strategies, can reduce hallucinations and improve factual accuracy. By incorporating external knowledge sources during reasoning and enabling models to verify or revise their own outputs, we aim to generate more accurate and coherent responses. We present a comparative evaluation of baseline LLMs against CoT, CoT+RAG, self-consistency, and self-verification techniques. Our results highlight the effectiveness of each method and identify the most robust approach for minimizing hallucinations while preserving fluency and reasoning depth.

CVMay 13, 2025
Multimodal Fusion of Glucose Monitoring and Food Imagery for Caloric Content Prediction

Adarsh Kumar

Effective dietary monitoring is critical for managing Type 2 diabetes, yet accurately estimating caloric intake remains a major challenge. While continuous glucose monitors (CGMs) offer valuable physiological data, they often fall short in capturing the full nutritional profile of meals due to inter-individual and meal-specific variability. In this work, we introduce a multimodal deep learning framework that jointly leverages CGM time-series data, Demographic/Microbiome, and pre-meal food images to enhance caloric estimation. Our model utilizes attention based encoding and a convolutional feature extraction for meal imagery, multi-layer perceptrons for CGM and Microbiome data followed by a late fusion strategy for joint reasoning. We evaluate our approach on a curated dataset of over 40 participants, incorporating synchronized CGM, Demographic and Microbiome data and meal photographs with standardized caloric labels. Our model achieves a Root Mean Squared Relative Error (RMSRE) of 0.2544, outperforming the baselines models by over 50%. These findings demonstrate the potential of multimodal sensing to improve automated dietary assessment tools for chronic disease management.

DCNov 20, 2021
Doing More by Doing Less: How Structured Partial Backpropagation Improves Deep Learning Clusters

Adarsh Kumar, Kausik Subramanian, Shivaram Venkataraman et al.

Many organizations employ compute clusters equipped with accelerators such as GPUs and TPUs for training deep learning models in a distributed fashion. Training is resource-intensive, consuming significant compute, memory, and network resources. Many prior works explore how to reduce training resource footprint without impacting quality, but their focus on a subset of the bottlenecks (typically only the network) limits their ability to improve overall cluster utilization. In this work, we exploit the unique characteristics of deep learning workloads to propose Structured Partial Backpropagation(SPB), a technique that systematically controls the amount of backpropagation at individual workers in distributed training. This simultaneously reduces network bandwidth, compute utilization, and memory footprint while preserving model quality. To efficiently leverage the benefits of SPB at cluster level, we introduce JigSaw, a SPB aware scheduler, which does scheduling at the iteration level for Deep Learning Training(DLT) jobs. We find that JigSaw can improve large scale cluster efficiency by as high as 28\%.

SDJul 30, 2021
DadaGP: A Dataset of Tokenized GuitarPro Songs for Sequence Models

Pedro Sarmento, Adarsh Kumar, CJ Carr et al.

Originating in the Renaissance and burgeoning in the digital era, tablatures are a commonly used music notation system which provides explicit representations of instrument fingerings rather than pitches. GuitarPro has established itself as a widely used tablature format and software enabling musicians to edit and share songs for musical practice, learning, and composition. In this work, we present DadaGP, a new symbolic music dataset comprising 26,181 song scores in the GuitarPro format covering 739 musical genres, along with an accompanying tokenized format well-suited for generative sequence models such as the Transformer. The tokenized format is inspired by event-based MIDI encodings, often used in symbolic music generation models. The dataset is released with an encoder/decoder which converts GuitarPro files to tokens and back. We present results of a use case in which DadaGP is used to train a Transformer-based model to generate new songs in GuitarPro format. We discuss other relevant use cases for the dataset (guitar-bass transcription, music style transfer and artist/genre classification) as well as ethical implications. DadaGP opens up the possibility to train GuitarPro score generators, fine-tune models on custom data, create new styles of music, AI-powered songwriting apps, and human-AI improvisation.

AIJul 28, 2021
Quantum Artificial Intelligence for the Science of Climate Change

Manmeet Singh, Chirag Dhara, Adarsh Kumar et al.

Climate change has become one of the biggest global problems increasingly compromising the Earth's habitability. Recent developments such as the extraordinary heat waves in California & Canada, and the devastating floods in Germany point to the role of climate change in the ever-increasing frequency of extreme weather. Numerical modelling of the weather and climate have seen tremendous improvements in the last five decades, yet stringent limitations remain to be overcome. Spatially and temporally localized forecasting is the need of the hour for effective adaptation measures towards minimizing the loss of life and property. Artificial Intelligence-based methods are demonstrating promising results in improving predictions, but are still limited by the availability of requisite hardware and software required to process the vast deluge of data at a scale of the planet Earth. Quantum computing is an emerging paradigm that has found potential applicability in several fields. In this opinion piece, we argue that new developments in Artificial Intelligence algorithms designed for quantum computers - also known as Quantum Artificial Intelligence (QAI) - may provide the key breakthroughs necessary to furthering the science of climate change. The resultant improvements in weather and climate forecasts are expected to cascade to numerous societal benefits.

CLMar 10, 2021
Team Phoenix at WASSA 2021: Emotion Analysis on News Stories with Pre-Trained Language Models

Yash Butala, Kanishk Singh, Adarsh Kumar et al.

Emotion is fundamental to humanity. The ability to perceive, understand and respond to social interactions in a human-like manner is one of the most desired capabilities in artificial agents, particularly in social-media bots. Over the past few years, computational understanding and detection of emotional aspects in language have been vital in advancing human-computer interaction. The WASSA Shared Task 2021 released a dataset of news-stories across two tracks, Track-1 for Empathy and Distress Prediction and Track-2 for Multi-Dimension Emotion prediction at the essay-level. We describe our system entry for the WASSA 2021 Shared Task (for both Track-1 and Track-2), where we leveraged the information from Pre-trained language models for Track-specific Tasks. Our proposed models achieved an Average Pearson Score of 0.417 and a Macro-F1 Score of 0.502 in Track 1 and Track 2, respectively. In the Shared Task leaderboard, we secured 4th rank in Track 1 and 2nd rank in Track 2.

LGJan 18, 2021
Accelerating Deep Learning Inference via Learned Caches

Arjun Balasubramanian, Adarsh Kumar, Yuhan Liu et al.

Deep Neural Networks (DNNs) are witnessing increased adoption in multiple domains owing to their high accuracy in solving real-world problems. However, this high accuracy has been achieved by building deeper networks, posing a fundamental challenge to the low latency inference desired by user-facing applications. Current low latency solutions trade-off on accuracy or fail to exploit the inherent temporal locality in prediction serving workloads. We observe that caching hidden layer outputs of the DNN can introduce a form of late-binding where inference requests only consume the amount of computation needed. This enables a mechanism for achieving low latencies, coupled with an ability to exploit temporal locality. However, traditional caching approaches incur high memory overheads and lookup latencies, leading us to design learned caches - caches that consist of simple ML models that are continuously updated. We present the design of GATI, an end-to-end prediction serving system that incorporates learned caches for low-latency DNN inference. Results show that GATI can reduce inference latency by up to 7.69X on realistic workloads.

CLJan 14, 2021
Hostility Detection in Hindi leveraging Pre-Trained Language Models

Ojasv Kamal, Adarsh Kumar, Tejas Vaidhya

Hostile content on social platforms is ever increasing. This has led to the need for proper detection of hostile posts so that appropriate action can be taken to tackle them. Though a lot of work has been done recently in the English Language to solve the problem of hostile content online, similar works in Indian Languages are quite hard to find. This paper presents a transfer learning based approach to classify social media (i.e Twitter, Facebook, etc.) posts in Hindi Devanagari script as Hostile or Non-Hostile. Hostile posts are further analyzed to determine if they are Hateful, Fake, Defamation, and Offensive. This paper harnesses attention based pre-trained models fine-tuned on Hindi data with Hostile-Non hostile task as Auxiliary and fusing its features for further sub-tasks classification. Through this approach, we establish a robust and consistent model without any ensembling or complex pre-processing. We have presented the results from our approach in CONSTRAINT-2021 Shared Task on hostile post detection where our model performs extremely well with 3rd runner up in terms of Weighted Fine-Grained F1 Score.

LGAug 4, 2020
Can Adversarial Weight Perturbations Inject Neural Backdoors?

Siddhant Garg, Adarsh Kumar, Vibhor Goel et al.

Adversarial machine learning has exposed several security hazards of neural models and has become an important research topic in recent times. Thus far, the concept of an "adversarial perturbation" has exclusively been used with reference to the input space referring to a small, imperceptible change which can cause a ML model to err. In this work we extend the idea of "adversarial perturbations" to the space of model weights, specifically to inject backdoors in trained DNNs, which exposes a security risk of using publicly available trained models. Here, injecting a backdoor refers to obtaining a desired outcome from the model when a trigger pattern is added to the input, while retaining the original model predictions on a non-triggered input. From the perspective of an adversary, we characterize these adversarial perturbations to be constrained within an $\ell_{\infty}$ norm around the original model weights. We introduce adversarial perturbations in the model weights using a composite loss on the predictions of the original model and the desired trigger through projected gradient descent. We empirically show that these adversarial weight perturbations exist universally across several computer vision and natural language processing tasks. Our results show that backdoors can be successfully injected with a very small average relative change in model weight values for several applications.

LGFeb 7, 2020
Accelerating Deep Learning Inference via Freezing

Adarsh Kumar, Arjun Balasubramanian, Shivaram Venkataraman et al.

Over the last few years, Deep Neural Networks (DNNs) have become ubiquitous owing to their high accuracy on real-world tasks. However, this increase in accuracy comes at the cost of computationally expensive models leading to higher prediction latencies. Prior efforts to reduce this latency such as quantization, model distillation, and any-time prediction models typically trade-off accuracy for performance. In this work, we observe that caching intermediate layer outputs can help us avoid running all the layers of a DNN for a sizeable fraction of inference requests. We find that this can potentially reduce the number of effective layers by half for 91.58% of CIFAR-10 requests run on ResNet-18. We present Freeze Inference, a system that introduces approximate caching at each intermediate layer and we discuss techniques to reduce the cache size and improve the cache hit rate. Finally, we discuss some of the open research challenges in realizing such a design.

CLFeb 7, 2020
Translating Web Search Queries into Natural Language Questions

Adarsh Kumar, Sandipan Dandapat, Sushil Chordia

Users often query a search engine with a specific question in mind and often these queries are keywords or sub-sentential fragments. For example, if the users want to know the answer for "What's the capital of USA", they will most probably query "capital of USA" or "USA capital" or some keyword-based variation of this. For example, for the user entered query "capital of USA", the most probable question intent is "What's the capital of USA?". In this paper, we are proposing a method to generate well-formed natural language question from a given keyword-based query, which has the same question intent as the query. Conversion of keyword-based web query into a well-formed question has lots of applications, with some of them being in search engines, Community Question Answering (CQA) website and bots communication. We found a synergy between query-to-question problem with standard machine translation(MT) task. We have used both Statistical MT (SMT) and Neural MT (NMT) models to generate the questions from the query. We have observed that MT models perform well in terms of both automatic and human evaluation.

CLFeb 7, 2020
MA-DST: Multi-Attention Based Scalable Dialog State Tracking

Adarsh Kumar, Peter Ku, Anuj Kumar Goyal et al.

Task oriented dialog agents provide a natural language interface for users to complete their goal. Dialog State Tracking (DST), which is often a core component of these systems, tracks the system's understanding of the user's goal throughout the conversation. To enable accurate multi-domain DST, the model needs to encode dependencies between past utterances and slot semantics and understand the dialog context, including long-range cross-domain references. We introduce a novel architecture for this task to encode the conversation history and slot semantics more robustly by using attention mechanisms at multiple granularities. In particular, we use cross-attention to model relationships between the context and slots at different semantic levels and self-attention to resolve cross-domain coreferences. In addition, our proposed architecture does not rely on knowing the domain ontologies beforehand and can also be used in a zero-shot setting for new domains or unseen slot values. Our model improves the joint goal accuracy by 5% (absolute) in the full-data setting and by up to 2% (absolute) in the zero-shot setting over the present state-of-the-art on the MultiWoZ 2.1 dataset.

CLJul 2, 2019
MultiWOZ 2.1: A Consolidated Multi-Domain Dialogue Dataset with State Corrections and State Tracking Baselines

Mihail Eric, Rahul Goel, Shachi Paul et al.

MultiWOZ 2.0 (Budzianowski et al., 2018) is a recently released multi-domain dialogue dataset spanning 7 distinct domains and containing over 10,000 dialogues. Though immensely useful and one of the largest resources of its kind to-date, MultiWOZ 2.0 has a few shortcomings. Firstly, there is substantial noise in the dialogue state annotations and dialogue utterances which negatively impact the performance of state-tracking models. Secondly, follow-up work (Lee et al., 2019) has augmented the original dataset with user dialogue acts. This leads to multiple co-existent versions of the same dataset with minor modifications. In this work we tackle the aforementioned issues by introducing MultiWOZ 2.1. To fix the noisy state annotations, we use crowdsourced workers to re-annotate state and utterances based on the original utterances in the dataset. This correction process results in changes to over 32% of state annotations across 40% of the dialogue turns. In addition, we fix 146 dialogue utterances by canonicalizing slot values in the utterances to the values in the dataset ontology. To address the second problem, we combined the contributions of the follow-up works into MultiWOZ 2.1. Hence, our dataset also includes user dialogue acts as well as multiple slot descriptions per dialogue state slot. We then benchmark a number of state-of-the-art dialogue state tracking models on the MultiWOZ 2.1 dataset and show the joint state tracking performance on the corrected state annotations. We are publicly releasing MultiWOZ 2.1 to the community, hoping that this dataset resource will allow for more effective models across various dialogue subproblems to be built in the future.