BMOct 5, 2022Code
Reprogramming Pretrained Language Models for Antibody Sequence InfillingIgor Melnyk, Vijil Chenthamarakshan, Pin-Yu Chen et al.
Antibodies comprise the most versatile class of binding molecules, with numerous applications in biomedicine. Computational design of antibodies involves generating novel and diverse sequences, while maintaining structural consistency. Unique to antibodies, designing the complementarity-determining region (CDR), which determines the antigen binding affinity and specificity, creates its own unique challenges. Recent deep learning models have shown impressive results, however the limited number of known antibody sequence/structure pairs frequently leads to degraded performance, particularly lacking diversity in the generated sequences. In our work we address this challenge by leveraging Model Reprogramming (MR), which repurposes pretrained models on a source language to adapt to the tasks that are in a different language and have scarce data - where it may be difficult to train a high-performing model from scratch or effectively fine-tune an existing pre-trained model on the specific task. Specifically, we introduce ReprogBert in which a pretrained English language model is repurposed for protein sequence infilling - thus considers cross-language adaptation using less data. Results on antibody design benchmarks show that our model on low-resourced antibody sequence dataset provides highly diverse CDR sequences, up to more than a two-fold increase of diversity over the baselines, without losing structural integrity and naturalness. The generated sequences also demonstrate enhanced antigen binding specificity and virus neutralization ability. Code is available at https://github.com/IBM/ReprogBERT
68.0LGJun 2
Recover-LoRA for Aggressive Quantization: Reclaiming Accuracy in 2-Bit Language Models via Low-Rank Adaptation with Knowledge Distillation on Synthetic DataDevleena Das, Rajeev Patwari, Elliott Delaye et al.
Aggressive weight quantization to 2-bit precision offers substantial throughput and memory gains for large language model (LLM) inference, but typically incurs severe accuracy degradation. These gains are particularly relevant for edge and on-device deployment, where memory capacity and bandwidth are primary constraints. In this work, we extend Recover-LoRA -- a lightweight, data-free accuracy recovery method originally developed for general model weight corruption -- to the setting of ultra-low-bit quantization. We propose a selective mixed-precision strategy in which only gate and up projection layers of the MLP are quantized to 2-bit (W2), while all other linear layers remain at higher precision, yielding a mixed-precision GateUp configuration. We demonstrate via roofline analysis across three model families (4B--20B) and two hardware platforms that a W4/W2-GateUp deployment (4-bit base with 2-bit gate/up) delivers 7.5--23.3\% TPS improvement over uniform W4 depending on model and context length, while confining quantization error to a predictable subset of layers. We then apply Recover-LoRA -- training low-rank adapters on the quantized layers via logit distillation with synthetic data -- to recover accuracy lost from 2-bit quantization of the gate and up layers. In a case study on Qwen3-4B, Recover-LoRA achieves 80--95\% accuracy recovery on 9 of 12 benchmarks, using only 10k synthetic training samples and no labeled data. We further demonstrate that synthetic data performs comparably to curated labeled data for distillation-based recovery, and that recovery generalizes to out-of-distribution evaluation tasks. Our results present Recover-LoRA as a practical post-quantization accuracy recovery tool for aggressive weight compression in deployment settings.
LGSep 21, 2023
State2Explanation: Concept-Based Explanations to Benefit Agent Learning and User UnderstandingDevleena Das, Sonia Chernova, Been Kim
As more non-AI experts use complex AI systems for daily tasks, there has been an increasing effort to develop methods that produce explanations of AI decision making that are understandable by non-AI experts. Towards this effort, leveraging higher-level concepts and producing concept-based explanations have become a popular method. Most concept-based explanations have been developed for classification techniques, and we posit that the few existing methods for sequential decision making are limited in scope. In this work, we first contribute a desiderata for defining concepts in sequential decision making settings. Additionally, inspired by the Protege Effect which states explaining knowledge often reinforces one's self-learning, we explore how concept-based explanations of an RL agent's decision making can in turn improve the agent's learning rate, as well as improve end-user understanding of the agent's decision making. To this end, we contribute a unified framework, State2Explanation (S2E), that involves learning a joint embedding model between state-action pairs and concept-based explanations, and leveraging such learned model to both (1) inform reward shaping during an agent's training, and (2) provide explanations to end-users at deployment for improved task performance. Our experimental validations, in Connect 4 and Lunar Lander, demonstrate the success of S2E in providing a dual-benefit, successfully informing reward shaping and improving agent learning rate, as well as significantly improving end user task performance at deployment time.
AIMay 4, 2022
Explainable Knowledge Graph Embedding: Inference Reconciliation for Knowledge Inferences Supporting Robot ActionsAngel Daruna, Devleena Das, Sonia Chernova
Learned knowledge graph representations supporting robots contain a wealth of domain knowledge that drives robot behavior. However, there does not exist an inference reconciliation framework that expresses how a knowledge graph representation affects a robot's sequential decision making. We use a pedagogical approach to explain the inferences of a learned, black-box knowledge graph representation, a knowledge graph embedding. Our interpretable model, uses a decision tree classifier to locally approximate the predictions of the black-box model, and provides natural language explanations interpretable by non-experts. Results from our algorithmic evaluation affirm our model design choices, and the results of our user studies with non-experts support the need for the proposed inference reconciliation framework. Critically, results from our simulated robot evaluation indicate that our explanations enable non-experts to correct erratic robot behaviors due to nonsensical beliefs within the black-box.
CLOct 25, 2023
DEFT: Data Efficient Fine-Tuning for Pre-Trained Language Models via Unsupervised Core-Set SelectionDevleena Das, Vivek Khetan
Recent advances have led to the availability of many pre-trained language models (PLMs); however, a question that remains is how much data is truly needed to fine-tune PLMs for downstream tasks? In this work, we introduce DEFT-UCS, a data-efficient fine-tuning framework that leverages unsupervised core-set selection to identify a smaller, representative dataset that reduces the amount of data needed to fine-tune PLMs for downstream tasks. We examine the efficacy of DEFT-UCS in the context of text-editing LMs, and compare to the state-of-the art text-editing model, CoEDIT. Our results demonstrate that DEFT-UCS models are just as accurate as CoEDIT, across eight different datasets consisting of six different editing tasks, while finetuned on 70% less data.
LGDec 1, 2025
KV Pareto: Systems-Level Optimization of KV Cache and Model Compression for Long Context InferenceSai Gokhale, Devleena Das, Rajeev Patwari et al.
Long-context Large Language Models (LLMs) face significant memory bottlenecks during inference due to the linear growth of key-value (KV) cache with sequence length. While individual optimization techniques like KV cache quantization, chunked prefill, and model weight quantization have shown promise, their joint effects and optimal configurations for edge deployment remain underexplored. We introduce KV Pareto, a systems-level framework that systematically maps the trade-off frontier between total memory consumption and task accuracy across these three complementary optimization techniques. Our framework evaluates multiple LLM architectures (Qwen, Llama, Mistral) with varying KV quantization schemes (int2/4/8, mixed-precision), granularities (per-token, per-tensor, per-block), and 4-bit weight quantization via AWQ. Our framework identifies model-specific Pareto-optimal configurations that achieve 68-78% total memory reduction with minimal (1-3%) accuracy degradation on long-context tasks. We additionally verify the selected frontiers on additional benchmarks of Needle-in-a-Haystack, GSM8k and MMLU as well as extended context lengths of up to 128k to demonstrate the practical need of joint optimization for efficient LLM inference.
PFJul 29, 2025
Forecasting LLM Inference Performance via Hardware-Agnostic Analytical ModelingRajeev Patwari, Ashish Sirasao, Devleena Das
Large language models (LLMs) have been increasingly deployed as local agents on personal devices with CPUs, NPUs and integrated GPUs. However, forecasting inference performance on devices with such heterogeneity remains challenging due to the dynamic compute and memory demands. Existing approaches rely on GPU benchmarking or machine learning-based latency predictors, which are often hardware-specific and lack generalizability. To this end, we introduce LIFE, a lightweight and modular analytical framework that is comprised of modular analytical model of operators, configurable to characterize LLM inference workloads in a hardware and dataset-agnostic manner. LIFE characterizes the influence of software and model optimizations, such as quantization, KV cache compression, LoRA adapters, chunked prefill, different attentions, and operator fusion, on performance metrics such as time-to-first-token (TTFT), time-per-output-token (TPOT) and tokens-per-second (TPS). LIFE enables performance forecasting using only hardware specifications, such as TOPS and memory bandwidth, without requiring extensive dataset benchmarking. We validate LIFE's forecasting with inference on AMD Ryzen CPUs, NPUs, iGPUs and NVIDIA V100 GPUs, with Llama2-7B variants, demonstrating the utility of LIFE in forecasting LLM performance through lens of system efficiency to enable efficient LLM deployment across different hardware platforms.
CLOct 6, 2025
Recover-LoRA: Data-Free Accuracy Recovery of Degraded Language Models via Low-Rank AdaptationDevleena Das, Rajeev Patwari, Ashish Sirasao
Inference optimizations such as quantization, pruning, format and datatype conversion, model export, and serialization can lead to functional degradations in language model task performance. While most efforts on performance recovery for deployment focus on robust quantization techniques, we focus on recovering model accuracies from any sources that degrade model weights, such as improper model serialization. In this work, we propose Recover-LoRA, a lightweight and dataset agnostic method to recover accuracy in degraded models. Recover-LoRA uses synthetic data and logit distillation to learn LoRA adapters on selective layers that facilitate aligning the degraded model to its full precision model. We investigate the utility of Recover-LoRA across a diverse set of small language models (SLMs), including models with varying attention architectures, multi-head attention (MHA) and group-query attention (GQA), as well as several evaluation datasets. Our results show that Recover-LoRA recovers model accuracies by 5-17% on MHA and GQA SLMs.
CVJul 11, 2025
ByDeWay: Boost Your multimodal LLM with DEpth prompting in a Training-Free WayRajarshi Roy, Devleena Das, Ankesh Banerjee et al.
We introduce ByDeWay, a training-free framework designed to enhance the performance of Multimodal Large Language Models (MLLMs). ByDeWay uses a novel prompting strategy called Layered-Depth-Based Prompting (LDP), which improves spatial reasoning and grounding without modifying any model parameters. It segments the scene into closest, mid-range, and farthest layers using monocular depth estimation, then generates region-specific captions with a grounded vision-language model. These structured, depth-aware captions are appended to the image-question prompt, enriching it with spatial context. This guides MLLMs to produce more grounded and less hallucinated responses. Our method is lightweight, modular, and compatible with black-box MLLMs. Experiments on hallucination-sensitive (POPE) and reasoning-intensive (GQA) benchmarks show consistent improvements across multiple MLLMs, validating the effectiveness of depth-aware prompting in a zero-training setting.
AIJan 11, 2022
Subgoal-Based Explanations for Unreliable Intelligent Decision Support SystemsDevleena Das, Been Kim, Sonia Chernova
Intelligent decision support (IDS) systems leverage artificial intelligence techniques to generate recommendations that guide human users through the decision making phases of a task. However, a key challenge is that IDS systems are not perfect, and in complex real-world scenarios may produce incorrect output or fail to work altogether. The field of explainable AI planning (XAIP) has sought to develop techniques that make the decision making of sequential decision making AI systems more explainable to end-users. Critically, prior work in applying XAIP techniques to IDS systems has assumed that the plan being proposed by the planner is always optimal, and therefore the action or plan being recommended as decision support to the user is always correct. In this work, we examine novice user interactions with a non-robust IDS system -- one that occasionally recommends the wrong action, and one that may become unavailable after users have become accustomed to its guidance. We introduce a novel explanation type, subgoal-based explanations, for planning-based IDS systems, that supplements traditional IDS output with information about the subgoal toward which the recommended action would contribute. We demonstrate that subgoal-based explanations lead to improved user task performance, improve user ability to distinguish optimal and suboptimal IDS recommendations, are preferred by users, and enable more robust user performance in the case of IDS failure
ROAug 8, 2021
Semantic-Based Explainable AI: Leveraging Semantic Scene Graphs and Pairwise Ranking to Explain Robot FailuresDevleena Das, Sonia Chernova
When interacting in unstructured human environments, occasional robot failures are inevitable. When such failures occur, everyday people, rather than trained technicians, will be the first to respond. Existing natural language explanations hand-annotate contextual information from an environment to help everyday people understand robot failures. However, this methodology lacks generalizability and scalability. In our work, we introduce a more generalizable semantic explanation framework. Our framework autonomously captures the semantic information in a scene to produce semantically descriptive explanations for everyday users. To generate failure-focused explanations that are semantically grounded, we leverages both semantic scene graphs to extract spatial relations and object attributes from an environment, as well as pairwise ranking. Our results show that these semantically descriptive explanations significantly improve everyday users' ability to both identify failures and provide assistance for recovery than the existing state-of-the-art context-based explanations.
LGMay 20, 2021
Explainable Activity Recognition for Smart Home SystemsDevleena Das, Yasutaka Nishimura, Rajan P. Vivek et al.
Smart home environments are designed to provide services that help improve the quality of life for the occupant via a variety of sensors and actuators installed throughout the space. Many automated actions taken by a smart home are governed by the output of an underlying activity recognition system. However, activity recognition systems may not be perfectly accurate and therefore inconsistencies in smart home operations can lead users reliant on smart home predictions to wonder "why did the smart home do that?" In this work, we build on insights from Explainable Artificial Intelligence (XAI) techniques and introduce an explainable activity recognition framework in which we leverage leading XAI methods to generate natural language explanations that explain what about an activity led to the given classification. Within the context of remote caregiver monitoring, we perform a two-step evaluation: (a) utilize ML experts to assess the sensibility of explanations, and (b) recruit non-experts in two user remote caregiver monitoring scenarios, synchronous and asynchronous, to assess the effectiveness of explanations generated via our framework. Our results show that the XAI approach, SHAP, has a 92% success rate in generating sensible explanations. Moreover, in 83% of sampled scenarios users preferred natural language explanations over a simple activity label, underscoring the need for explainable activity recognition systems. Finally, we show that explanations generated by some XAI methods can lead users to lose confidence in the accuracy of the underlying activity recognition model. We make a recommendation regarding which existing XAI method leads to the best performance in the domain of smart home automation, and discuss a range of topics for future work to further improve explainable activity recognition.
AIJan 5, 2021
Explainable AI for Robot Failures: Generating Explanations that Improve User Assistance in Fault RecoveryDevleena Das, Siddhartha Banerjee, Sonia Chernova
With the growing capabilities of intelligent systems, the integration of robots in our everyday life is increasing. However, when interacting in such complex human environments, the occasional failure of robotic systems is inevitable. The field of explainable AI has sought to make complex-decision making systems more interpretable but most existing techniques target domain experts. On the contrary, in many failure cases, robots will require recovery assistance from non-expert users. In this work, we introduce a new type of explanation, that explains the cause of an unexpected failure during an agent's plan execution to non-experts. In order for error explanations to be meaningful, we investigate what types of information within a set of hand-scripted explanations are most helpful to non-experts for failure and solution identification. Additionally, we investigate how such explanations can be autonomously generated, extending an existing encoder-decoder model, and generalized across environments. We investigate such questions in the context of a robot performing a pick-and-place manipulation task in the home environment. Our results show that explanations capturing the context of a failure and history of past actions, are the most effective for failure and solution identification among non-experts. Furthermore, through a second user evaluation, we verify that our model-generated explanations can generalize to an unseen office environment, and are just as effective as the hand-scripted explanations.
AINov 18, 2020
Explainable AI for System Failures: Generating Explanations that Improve Human Assistance in Fault RecoveryDevleena Das, Siddhartha Banerjee, Sonia Chernova
With the growing capabilities of intelligent systems, the integration of artificial intelligence (AI) and robots in everyday life is increasing. However, when interacting in such complex human environments, the failure of intelligent systems, such as robots, can be inevitable, requiring recovery assistance from users. In this work, we develop automated, natural language explanations for failures encountered during an AI agents' plan execution. These explanations are developed with a focus of helping non-expert users understand different point of failures to better provide recovery assistance. Specifically, we introduce a context-based information type for explanations that can both help non-expert users understand the underlying cause of a system failure, and select proper failure recoveries. Additionally, we extend an existing sequence-to-sequence methodology to automatically generate our context-based explanations. By doing so, we are able develop a model that can generalize context-based explanations over both different failure types and failure scenarios.
HCFeb 11, 2020
Leveraging Rationales to Improve Human Task PerformanceDevleena Das, Sonia Chernova
Machine learning (ML) systems across many application areas are increasingly demonstrating performance that is beyond that of humans. In response to the proliferation of such models, the field of Explainable AI (XAI) has sought to develop techniques that enhance the transparency and interpretability of machine learning methods. In this work, we consider a question not previously explored within the XAI and ML communities: Given a computational system whose performance exceeds that of its human user, can explainable AI capabilities be leveraged to improve the performance of the human? We study this question in the context of the game of Chess, for which computational game engines that surpass the performance of the average player are widely available. We introduce the Rationale-Generating Algorithm, an automated technique for generating rationales for utility-based computational methods, which we evaluate with a multi-day user study against two baselines. The results show that our approach produces rationales that lead to statistically significant improvement in human task performance, demonstrating that rationales automatically generated from an AI's internal task model can be used not only to explain what the system is doing, but also to instruct the user and ultimately improve their task performance.