CLMay 31, 2025Code
DefenderBench: A Toolkit for Evaluating Language Agents in Cybersecurity EnvironmentsChiyu Zhang, Marc-Alexandre Cote, Michael Albada et al.
Large language model (LLM) agents have shown impressive capabilities in human language comprehension and reasoning, yet their potential in cybersecurity remains underexplored. We introduce DefenderBench, a practical, open-source toolkit for evaluating language agents across offense, defense, and cybersecurity knowledge-based tasks. DefenderBench includes environments for network intrusion, malicious content detection, code vulnerability analysis, and cybersecurity knowledge assessment. It is intentionally designed to be affordable and easily accessible for researchers while providing fair and rigorous assessment. We benchmark several state-of-the-art (SoTA) and popular LLMs, including both open- and closed-weight models, using a standardized agentic framework. Our results show that Claude-3.7-sonnet performs best with a DefenderBench score of 81.65, followed by Claude-3.7-sonnet-think with 78.40, while the best open-weight model, Llama 3.3 70B, is not far behind with a DefenderBench score of 71.81. DefenderBench's modular design allows seamless integration of custom LLMs and tasks, promoting reproducibility and fair comparisons. An anonymized version of DefenderBench is available at https://github.com/microsoft/DefenderBench.
LGNov 26, 2019Code
AuthorGAN: Improving GAN Reproducibility using a Modular GAN FrameworkRaunak Sinha, Anush Sankaran, Mayank Vatsa et al.
Generative models are becoming increasingly popular in the literature, with Generative Adversarial Networks (GAN) being the most successful variant, yet. With this increasing demand and popularity, it is becoming equally difficult and challenging to implement and consume GAN models. A qualitative user survey conducted across 47 practitioners show that expert level skill is required to use GAN model for a given task, despite the presence of various open source libraries. In this research, we propose a novel system called AuthorGAN, aiming to achieve true democratization of GAN authoring. A highly modularized library agnostic representation of GAN model is defined to enable interoperability of GAN architecture across different libraries such as Keras, Tensorflow, and PyTorch. An intuitive drag-and-drop based visual designer is built using node-red platform to enable custom architecture designing without the need for writing any code. Five different GAN models are implemented as a part of this framework and the performance of the different GAN models are shown using the benchmark MNIST dataset.
SEJan 4, 2018Code
DeepTriage: Exploring the Effectiveness of Deep Learning for Bug TriagingSenthil Mani, Anush Sankaran, Rahul Aralikatte
For a given software bug report, identifying an appropriate developer who could potentially fix the bug is the primary task of a bug triaging process. A bug title (summary) and a detailed description is present in most of the bug tracking systems. Automatic bug triaging algorithm can be formulated as a classification problem, with the bug title and description as the input, mapping it to one of the available developers (classes). The major challenge is that the bug description usually contains a combination of free unstructured text, code snippets, and stack trace making the input data noisy. The existing bag-of-words (BOW) feature models do not consider the syntactical and sequential word information available in the unstructured text. We propose a novel bug report representation algorithm using an attention based deep bidirectional recurrent neural network (DBRNN-A) model that learns a syntactic and semantic feature from long word sequences in an unsupervised manner. Instead of BOW features, the DBRNN-A based bug representation is then used for training the classifier. Using an attention mechanism enables the model to learn the context representation over a long word sequence, as in a bug report. To provide a large amount of data to learn the feature learning model, the unfixed bug reports (~70% bugs in an open source bug tracking system) are leveraged, which were completely ignored in the previous studies. Another contribution is to make this research reproducible by making the source code available and creating a public benchmark dataset of bug reports from three open source bug tracking system: Google Chromium (383,104 bug reports), Mozilla Core (314,388 bug reports), and Mozilla Firefox (162,307 bug reports). Experimentally we compare our approach with BOW model and machine learning approaches and observe that DBRNN-A provides a higher rank-10 average accuracy.
LGNov 9, 2017Code
DLPaper2Code: Auto-generation of Code from Deep Learning Research PapersAkshay Sethi, Anush Sankaran, Naveen Panwar et al.
With an abundance of research papers in deep learning, reproducibility or adoption of the existing works becomes a challenge. This is due to the lack of open source implementations provided by the authors. Further, re-implementing research papers in a different library is a daunting task. To address these challenges, we propose a novel extensible approach, DLPaper2Code, to extract and understand deep learning design flow diagrams and tables available in a research paper and convert them to an abstract computational graph. The extracted computational graph is then converted into execution ready source code in both Keras and Caffe, in real-time. An arXiv-like website is created where the automatically generated designs is made publicly available for 5,000 research papers. The generated designs could be rated and edited using an intuitive drag-and-drop UI framework in a crowdsourced manner. To evaluate our approach, we create a simulated dataset with over 216,000 valid design visualizations using a manually defined grammar. Experiments on the simulated dataset show that the proposed framework provide more than $93\%$ accuracy in flow diagram content extraction.
CLApr 10, 2025
AttentionDefense: Leveraging System Prompt Attention for Explainable Defense Against Novel JailbreaksCharlotte Siska, Anush Sankaran
In the past few years, Language Models (LMs) have shown par-human capabilities in several domains. Despite their practical applications and exceeding user consumption, they are susceptible to jailbreaks when malicious input exploits the LM's weaknesses, causing it to deviate from its intended behavior. Current defensive strategies either classify the input prompt as adversarial or prevent LMs from generating harmful outputs. However, it is challenging to explain the reason behind the malicious nature of the jailbreak, which results in a wide variety of closed-box approaches. In this research, we propose and demonstrate that system-prompt attention from Small Language Models (SLMs) can be used to characterize adversarial prompts, providing a novel, explainable, and cheaper defense approach called AttentionDefense. Our research suggests that the attention mechanism is an integral component in understanding and explaining how LMs respond to malicious input that is not captured in the semantic meaning of text embeddings. The proposed AttentionDefense is evaluated against existing jailbreak benchmark datasets. Ablation studies show that SLM-based AttentionDefense has equivalent or better jailbreak detection performance compared to text embedding-based classifiers and GPT-4 zero-shot detectors.To further validate the efficacy of the proposed approach, we generate a dataset of novel jailbreak variants of the existing benchmark dataset using a closed-loop LLM-based multi-agent system. We demonstrate that the proposed AttentionDefense approach performs robustly on this novel jailbreak dataset while existing approaches suffer in performance. Additionally, for practical purposes AttentionDefense is an ideal solution as it has the computation requirements of a small LM but the performance of a LLM detector.
LGDec 19, 2021
On Causal Inference for Data-free Structured PruningMartin Ferianc, Anush Sankaran, Olivier Mastropietro et al.
Neural networks (NNs) are making a large impact both on research and industry. Nevertheless, as NNs' accuracy increases, it is followed by an expansion in their size, required number of compute operations and energy consumption. Increase in resource consumption results in NNs' reduced adoption rate and real-world deployment impracticality. Therefore, NNs need to be compressed to make them available to a wider audience and at the same time decrease their runtime costs. In this work, we approach this challenge from a causal inference perspective, and we propose a scoring mechanism to facilitate structured pruning of NNs. The approach is based on measuring mutual information under a maximum entropy perturbation, sequentially propagated through the NN. We demonstrate the method's performance on two datasets and various NNs' sizes, and we show that our approach achieves competitive performance under challenging conditions.
LGJan 11, 2021
Deeplite Neutrino: An End-to-End Framework for Constrained Deep Learning Model OptimizationAnush Sankaran, Olivier Mastropietro, Ehsan Saboori et al.
Designing deep learning-based solutions is becoming a race for training deeper models with a greater number of layers. While a large-size deeper model could provide competitive accuracy, it creates a lot of logistical challenges and unreasonable resource requirements during development and deployment. This has been one of the key reasons for deep learning models not being excessively used in various production environments, especially in edge devices. There is an immediate requirement for optimizing and compressing these deep learning models, to enable on-device intelligence. In this research, we introduce a black-box framework, Deeplite Neutrino for production-ready optimization of deep learning models. The framework provides an easy mechanism for the end-users to provide constraints such as a tolerable drop in accuracy or target size of the optimized models, to guide the whole optimization process. The framework is easy to include in an existing production pipeline and is available as a Python Package, supporting PyTorch and Tensorflow libraries. The optimization performance of the framework is shown across multiple benchmark datasets and popular deep learning models. Further, the framework is currently used in production and the results and testimonials from several clients are summarized.
CVMay 20, 2020
Reducing Overlearning through Disentangled Representations by Suppressing Unknown TasksNaveen Panwar, Tarun Tater, Anush Sankaran et al.
Existing deep learning approaches for learning visual features tend to overlearn and extract more information than what is required for the task at hand. From a privacy preservation perspective, the input visual information is not protected from the model; enabling the model to become more intelligent than it is trained to be. Current approaches for suppressing additional task learning assume the presence of ground truth labels for the tasks to be suppressed during training time. In this research, we propose a three-fold novel contribution: (i) a model-agnostic solution for reducing model overlearning by suppressing all the unknown tasks, (ii) a novel metric to measure the trust score of a trained deep learning model, and (iii) a simulated benchmark dataset, PreserveTask, having five different fundamental image classification tasks to study the generalization nature of models. In the first set of experiments, we learn disentangled representations and suppress overlearning of five popular deep learning models: VGG16, VGG19, Inception-v1, MobileNet, and DenseNet on PreserverTask dataset. Additionally, we show results of our framework on color-MNIST dataset and practical applications of face attribute preservation in Diversity in Faces (DiF) and IMDB-Wiki dataset.
LGNov 26, 2019
"You might also like this model": Data Driven Approach for Recommending Deep Learning Models for Unknown Image DatasetsAmeya Prabhu, Riddhiman Dasgupta, Anush Sankaran et al.
For an unknown (new) classification dataset, choosing an appropriate deep learning architecture is often a recursive, time-taking, and laborious process. In this research, we propose a novel technique to recommend a suitable architecture from a repository of known models. Further, we predict the performance accuracy of the recommended architecture on the given unknown dataset, without the need for training the model. We propose a model encoder approach to learn a fixed length representation of deep learning architectures along with its hyperparameters, in an unsupervised fashion. We manually curate a repository of image datasets with corresponding known deep learning models and show that the predicted accuracy is a good estimator of the actual accuracy. We discuss the implications of the proposed approach for three benchmark images datasets and also the challenges in using the approach for text modality. To further increase the reproducibility of the proposed approach, the entire implementation is made publicly available along with the trained models.
LGNov 17, 2019
Coverage Testing of Deep Learning Models using Dataset CharacterizationSenthil Mani, Anush Sankaran, Srikanth Tamilselvam et al.
Deep Neural Networks (DNNs), with its promising performance, are being increasingly used in safety critical applications such as autonomous driving, cancer detection, and secure authentication. With growing importance in deep learning, there is a requirement for a more standardized framework to evaluate and test deep learning models. The primary challenge involved in automated generation of extensive test cases are: (i) neural networks are difficult to interpret and debug and (ii) availability of human annotators to generate specialized test points. In this research, we explain the necessity to measure the quality of a dataset and propose a test case generation system guided by the dataset properties. From a testing perspective, four different dataset quality dimensions are proposed: (i) equivalence partitioning, (ii) centroid positioning, (iii) boundary conditioning, and (iv) pair-wise boundary conditioning. The proposed system is evaluated on well known image classification datasets such as MNIST, Fashion-MNIST, CIFAR10, CIFAR100, and SVHN against popular deep learning models such as LeNet, ResNet-20, VGG-19. Further, we conduct various experiments to demonstrate the effectiveness of systematic test case generation system for evaluating deep learning models.
HCMay 7, 2019
A Visual Programming Paradigm for Abstract Deep Learning Model DevelopmentSrikanth Tamilselvam, Naveen Panwar, Shreya Khare et al.
Deep learning is one of the fastest growing technologies in computer science with a plethora of applications. But this unprecedented growth has so far been limited to the consumption of deep learning experts. The primary challenge being a steep learning curve for learning the programming libraries and the lack of intuitive systems enabling non-experts to consume deep learning. Towards this goal, we study the effectiveness of a no-code paradigm for designing deep learning models. Particularly, a visual drag-and-drop interface is found more efficient when compared with the traditional programming and alternative visual programming paradigms. We conduct user studies of different expertise levels to measure the entry level barrier and the developer load across different programming paradigms. We obtain a System Usability Scale (SUS) of 90 and a NASA Task Load index (TLX) score of 21 for the proposed visual programming compared to 68 and 52, respectively, for the traditional programming methods.
CVNov 18, 2018
On Matching Faces with Alterations due to Plastic Surgery and DisguiseSaksham Suri, Anush Sankaran, Mayank Vatsa et al.
Plastic surgery and disguise variations are two of the most challenging co-variates of face recognition. The state-of-art deep learning models are not sufficiently successful due to the availability of limited training samples. In this paper, a novel framework is proposed which transfers fundamental visual features learnt from a generic image dataset to supplement a supervised face recognition model. The proposed algorithm combines off-the-shelf supervised classifier and a generic, task independent network which encodes information related to basic visual cues such as color, shape, and texture. Experiments are performed on IIITD plastic surgery face dataset and Disguised Faces in the Wild (DFW) dataset. Results showcase that the proposed algorithm achieves state of the art results on both the datasets. Specifically on the DFW database, the proposed algorithm yields over 87% verification accuracy at 1% false accept rate which is 53.8% better than baseline results computed using VGGFace.
LGNov 11, 2018
Explaining Deep Learning Models using Causal InferenceTanmayee Narendra, Anush Sankaran, Deepak Vijaykeerthy et al.
Although deep learning models have been successfully applied to a variety of tasks, due to the millions of parameters, they are becoming increasingly opaque and complex. In order to establish trust for their widespread commercial use, it is important to formalize a principled framework to reason over these models. In this work, we use ideas from causal inference to describe a general framework to reason over CNN models. Specifically, we build a Structural Causal Model (SCM) as an abstraction over a specific aspect of the CNN. We also formulate a method to quantitatively rank the filters of a convolution layer according to their counterfactual importance. We illustrate our approach with popular CNN architectures such as LeNet5, VGG19, and ResNet32.
CLJan 1, 2018
Sanskrit Sandhi Splitting using seq2(seq)^2Rahul Aralikatte, Neelamadhav Gantayat, Naveen Panwar et al.
In Sanskrit, small words (morphemes) are combined to form compound words through a process known as Sandhi. Sandhi splitting is the process of splitting a given compound word into its constituent morphemes. Although rules governing word splitting exists in the language, it is highly challenging to identify the location of the splits in a compound word. Though existing Sandhi splitting systems incorporate these pre-defined splitting rules, they have a low accuracy as the same compound word might be broken down in multiple ways to provide syntactically correct splits. In this research, we propose a novel deep learning architecture called Double Decoder RNN (DD-RNN), which (i) predicts the location of the split(s) with 95% accuracy, and (ii) predicts the constituent words (learning the Sandhi splitting rules) with 79.5% accuracy, outperforming the state-of-art by 20%. Additionally, we show the generalization capability of our deep learning model, by showing competitive results in the problem of Chinese word segmentation, as well.
CLNov 2, 2017
Hi, how can I help you?: Automating enterprise IT support help desksSenthil Mani, Neelamadhav Gantayat, Rahul Aralikatte et al.
Question answering is one of the primary challenges of natural language understanding. In realizing such a system, providing complex long answers to questions is a challenging task as opposed to factoid answering as the former needs context disambiguation. The different methods explored in the literature can be broadly classified into three categories namely: 1) classification based, 2) knowledge graph based and 3) retrieval based. Individually, none of them address the need of an enterprise wide assistance system for an IT support and maintenance domain. In this domain the variance of answers is large ranging from factoid to structured operating procedures; the knowledge is present across heterogeneous data sources like application specific documentation, ticket management systems and any single technique for a general purpose assistance is unable to scale for such a landscape. To address this, we have built a cognitive platform with capabilities adopted for this domain. Further, we have built a general purpose question answering system leveraging the platform that can be instantiated for multiple products, technologies in the support domain. The system uses a novel hybrid answering model that orchestrates across a deep learning classifier, a knowledge graph based context disambiguation module and a sophisticated bag-of-words search system. This orchestration performs context switching for a provided question and also does a smooth hand-off of the question to a human expert if none of the automated techniques can provide a confident answer. This system has been deployed across 675 internal enterprise IT support and maintenance projects.
AISep 25, 2017
"Let me convince you to buy my product ... ": A Case Study of an Automated Persuasive System for Fashion ProductsVitobha Munigala, Srikanth Tamilselvam, Anush Sankaran
Persuasivenes is a creative art aimed at making people believe in certain set of beliefs. Many a times, such creativity is about adapting richness of one domain into another to strike a chord with the target audience. In this research, we present PersuAIDE! - A persuasive system based on linguistic creativity to transform given sentence to generate various forms of persuading sentences. These various forms cover multiple focus of persuasion such as memorability and sentiment. For a given simple product line, the algorithm is composed of several steps including: (i) select an appropriate well-known expression for the target domain to add memorability, (ii) identify keywords and entities in the given sentence and expression and transform it to produce creative persuading sentence, and (iii) adding positive or negative sentiment for further persuasion. The persuasive conversion were manually verified using qualitative results and the effectiveness of the proposed approach is empirically discussed.
CLAug 16, 2017
mAnI: Movie Amalgamation using Neural ImitationNaveen Panwar, Shreya Khare, Neelamadhav Gantayat et al.
Cross-modal data retrieval has been the basis of various creative tasks performed by Artificial Intelligence (AI). One such highly challenging task for AI is to convert a book into its corresponding movie, which most of the creative film makers do as of today. In this research, we take the first step towards it by visualizing the content of a book using its corresponding movie visuals. Given a set of sentences from a book or even a fan-fiction written in the same universe, we employ deep learning models to visualize the input by stitching together relevant frames from the movie. We studied and compared three different types of setting to match the book with the movie content: (i) Dialog model: using only the dialog from the movie, (ii) Visual model: using only the visual content from the movie, and (iii) Hybrid model: using the dialog and the visual content from the movie. Experiments on the publicly available MovieBook dataset shows the effectiveness of the proposed models.
SEAug 16, 2017
DARVIZ: Deep Abstract Representation, Visualization, and Verification of Deep Learning ModelsAnush Sankaran, Rahul Aralikatte, Senthil Mani et al.
Traditional software engineering programming paradigms are mostly object or procedure oriented, driven by deterministic algorithms. With the advent of deep learning and cognitive sciences there is an emerging trend for data-driven programming, creating a shift in the programming paradigm among the software engineering communities. Visualizing and interpreting the execution of a current large scale data-driven software development is challenging. Further, for deep learning development there are many libraries in multiple programming languages such as TensorFlow (Python), CAFFE (C++), Theano (Python), Torch (Lua), and Deeplearning4j (Java), driving a huge need for interoperability across libraries.