SE Feb 26, 2025
IndicEval-XL: Bridging Linguistic Diversity in Code Generation Across Indic Languages
Ujjwal Singh, Aditi Sharma, Nikhil Gupta et al.
Large Language Models (LLMs) have demonstrated remarkable capabilities in code generation from natural language prompts, revolutionizing software development workflows. As we advance towards agent-based development paradigms, these models form the cornerstone of next-generation software development lifecycles. However, current benchmarks for evaluating multilingual code generation capabilities are predominantly English-centric, limiting their applicability across the global developer community. To address this limitation, we present IndicEval-XL, a comprehensive benchmark for code generation that incorporates 6 major Indic languages, collectively spoken by approximately 14% of the world's population. Our benchmark bridges these languages with 12 programming languages, creating a robust evaluation framework. This work is particularly significant given that India accounts for one-eighth of the global population and that Indic languages play a crucial role in Indian society. IndicEval-XL represents a significant step toward expanding the linguistic diversity in code generation systems and evaluation frameworks. By developing resources that support multiple languages, we aim to make AI-powered development tools more inclusive and accessible to developers of various linguistic backgrounds. To facilitate further research and development in this direction, we make our dataset and evaluation benchmark publicly available at https://github.com/telekom/IndicEval-XL
CR Nov 24, 2021
Needle in a Haystack: Detecting Subtle Malicious Edits to Additive Manufacturing G-code Files
Caleb Beckwith, Harsh Sankar Naicker, Svara Mehta et al.
The growing use of Digital Manufacturing (DM) in safety-critical domains is drawing more attention to the cybersecurity of the manufacturing process, as malicious third parties might aim to introduce defects in digital designs. In general, the DM process involves creating a digital object (as CAD files) before using a slicer program to convert the models into printing instructions (e.g., g-code) suitable for the target printer. As the g-code is an intermediate machine format, malicious edits may be difficult to detect, especially when the golden (original) models are not available to the manufacturer. In this work we aim to quantify this hypothesis through a red-team/blue-team case study, in which the red-team aims to introduce subtle defects that would impact the properties (strengths) of the 3D printed parts, and the blue-team aims to detect these modifications in the absence of the golden models. The case study had two sets of models, the first with 180 designs (with 2 compromised using 2 methods) and the second with 4320 designs (with 60 compromised using 6 methods). Using statistical modelling and machine learning (ML), the blue-team was able to detect all the compromises in the first set of data, and 50 of the compromises in the second.
CR Apr 19, 2021
FLAW3D: A Trojan-based Cyber Attack on the Physical Outcomes of Additive Manufacturing
Hammond Pearce, Kaushik Yanamandra, Nikhil Gupta et al.
Additive Manufacturing (AM) systems such as 3D printers use inexpensive microcontrollers that rarely feature cybersecurity defenses. This is a risk, especially given the rising threat landscape within the larger digital manufacturing domain. In this work we demonstrate this risk by presenting the design and study of a malicious Trojan (the FLAW3D bootloader) for AVR-based Marlin-compatible 3D printers (>100 commercial models). We show that the Trojan can hide from programming tools, and even within tight design constraints (less than 1.7 kilobytes in size), it can compromise the quality of additively manufactured prints and reduce tensile strengths by up to 50%.
IR Jul 14, 2020
Covidex: Neural Ranking Models and Keyword Search Infrastructure for the COVID-19 Open Research Dataset
Edwin Zhang, Nikhil Gupta, Raphael Tang et al.
We present Covidex, a search engine that exploits the latest neural ranking models to provide information access to the COVID-19 Open Research Dataset curated by the Allen Institute for AI. Our system has been online and serving users since late March 2020. The Covidex is the user application component of our three-pronged strategy to develop technologies for helping domain experts tackle the ongoing global pandemic. In addition, we provide robust and easy-to-use keyword search infrastructure that exploits mature fusion-based methods as well as standalone neural ranking models that can be incorporated into other applications. These techniques have been evaluated in the ongoing TREC-COVID challenge: Our infrastructure and baselines have been adopted by many participants, including some of the highest-scoring runs in rounds 1, 2, and 3. In round 3, we report the highest-scoring run that takes advantage of previous training data and the second-highest fully automatic run.
CR Jun 9, 2020
A Survey of Cybersecurity of Digital Manufacturing
Priyanka Mahesh, Akash Tiwari, Chenglu Jin et al.
The Industry 4.0 concept promotes a digital manufacturing (DM) paradigm that can enhance quality and productivity and reduce inventory and the lead time for delivering custom, batch-of-one products. This paradigm is based on achieving convergence of Additive, Subtractive, and Hybrid manufacturing machines, Automation and Robotic Systems, Sensors, Computing, and Communication Networks, Artificial Intelligence, and Big Data. A DM system consists of embedded electronics, sensors, actuators, control software, and inter-connectivity to enable the machines and the components within them to exchange data with other machines, components therein, the plant operators, the inventory managers, and customers. This paper presents the cybersecurity risks in the emerging DM context, assesses the impact on manufacturing, and identifies approaches to secure DM.
CR May 9, 2020
HACK3D: Crowdsourcing the Assessment of Cybersecurity in Digital Manufacturing
Michael Linares, Nishant Aswani, Gary Mac et al.
Digital manufacturing (DM) cyber-physical systems are vulnerable to both cyber and physical attacks. HACK3D is a series of crowdsourcing red-team/blue-team events hosted by the NYU Center for Cybersecurity to assess the strength of the security methods embedded in designs using DM. This study summarizes the lessons learned from the past three offerings of HACK3D, including ingenious ways in which skilled engineers can launch surprising, previously unanticipated attacks on DM designs. A key outcome is a taxonomy-guided creation of DM security benchmarks for use by the DM community.
LG Apr 30, 2020
Unsupervised Learning of KB Queries in Task-Oriented Dialogs
Dinesh Raghu, Nikhil Gupta, Mausam
Task-oriented dialog (TOD) systems often need to formulate knowledge base (KB) queries corresponding to the user intent and use the query results to generate system responses. Existing approaches require dialog datasets to explicitly annotate these KB queries; these annotations can be time-consuming and expensive. In response, we define the novel problems of predicting the KB query and training the dialog agent, without explicit KB query annotation. For query prediction, we propose a reinforcement learning (RL) baseline, which rewards the generation of those queries whose KB results cover the entities mentioned in subsequent dialog. Further analysis reveals that correlation among query attributes in the KB can significantly confuse memory augmented policy optimization (MAPO), an existing state-of-the-art RL agent. To address this, we improve the MAPO baseline with simple but important modifications suited to our task. To train the full TOD system for our setting, we propose a pipelined approach: it independently predicts when to make a KB query (query position predictor), then predicts a KB query at the predicted position (query predictor), and uses the results of the predicted query in subsequent dialog (next response predictor). Overall, our work proposes the first solutions to our novel problem, and our analysis highlights the research challenges in training TOD systems without query annotation.
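The reward idea in this abstract (favoring queries whose KB results cover entities mentioned later in the dialog) can be illustrated with a small sketch. This is not the authors' implementation: the toy KB, the attribute names, and the functions `run_query` and `coverage_reward` are hypothetical, and the paper's actual reward may be defined differently.

```python
from typing import Dict, List, Set

# Hypothetical toy KB; rows and attribute names are illustrative assumptions.
KB: List[Dict[str, str]] = [
    {"name": "pizza_hut", "cuisine": "italian", "area": "north"},
    {"name": "curry_house", "cuisine": "indian", "area": "south"},
    {"name": "spice_garden", "cuisine": "indian", "area": "north"},
]


def run_query(kb: List[Dict[str, str]], query: Dict[str, str]) -> List[Dict[str, str]]:
    """Return the rows that satisfy every attribute=value constraint in the query."""
    return [row for row in kb if all(row.get(k) == v for k, v in query.items())]


def coverage_reward(
    kb: List[Dict[str, str]], query: Dict[str, str], later_entities: Set[str]
) -> float:
    """Fraction of entities mentioned later in the dialog that occur in the query results.

    Assumed form of the reward: 1.0 when the results cover every later entity,
    0.0 when they cover none (or when no entities are mentioned later).
    """
    if not later_entities:
        return 0.0
    found = {value for row in run_query(kb, query) for value in row.values()}
    return len(later_entities & found) / len(later_entities)


# Entities that the (hypothetical) subsequent system turns mention.
later = {"curry_house", "spice_garden"}
full = coverage_reward(KB, {"cuisine": "indian"}, later)
partial = coverage_reward(KB, {"cuisine": "indian", "area": "north"}, later)
none = coverage_reward(KB, {"cuisine": "italian"}, later)
```

In this sketch an over-constrained query (adding `area`) loses part of the reward because it filters out an entity the dialog later mentions, which is the kind of signal an RL agent could use in place of annotated queries.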
CL Apr 23, 2020
Rapidly Bootstrapping a Question Answering Dataset for COVID-19
Raphael Tang, Rodrigo Nogueira, Edwin Zhang et al.
We present CovidQA, the beginnings of a question answering dataset specifically designed for COVID-19, built by hand from knowledge gathered from Kaggle's COVID-19 Open Research Dataset Challenge. To our knowledge, this is the first publicly available resource of its type, and it is intended as a stopgap measure for guiding research until more substantial evaluation resources become available. While this dataset, comprising 124 question-article pairs as of the present version 0.1 release, does not have sufficient examples for supervised machine learning, we believe that it can be helpful for evaluating the zero-shot or transfer capabilities of existing models on topics specifically related to COVID-19. This paper describes our methodology for constructing the dataset and presents the effectiveness of a number of baselines, including term-based techniques and various transformer-based models. The dataset is available at http://covidqa.ai/
CL Apr 10, 2020
Rapidly Deploying a Neural Search Engine for the COVID-19 Open Research Dataset: Preliminary Thoughts and Lessons Learned
Edwin Zhang, Nikhil Gupta, Rodrigo Nogueira et al.
We present the Neural Covidex, a search engine that exploits the latest neural ranking architectures to provide information access to the COVID-19 Open Research Dataset curated by the Allen Institute for AI. This web application exists as part of a suite of tools that we have developed over the past few weeks to help domain experts tackle the ongoing global pandemic. We hope that improved information access capabilities to the scientific literature can inform evidence-based decision making and insight generation. This paper describes our initial efforts and offers a few thoughts about lessons we have learned along the way.
LG May 3, 2018
Disentangling Language and Knowledge in Task-Oriented Dialogs
Dinesh Raghu, Nikhil Gupta, Mausam
The Knowledge Base (KB) used for real-world applications, such as booking a movie or restaurant reservation, keeps changing over time. End-to-end neural networks trained for these task-oriented dialogs are expected to be immune to any changes in the KB. However, existing approaches break down when asked to handle such changes. We propose an encoder-decoder architecture (BoSsNet) with a novel Bag-of-Sequences (BoSs) memory, which facilitates the disentangled learning of the response's language model and its knowledge incorporation. Consequently, the KB can be modified with new knowledge without a drop in interpretability. We find that BoSsNet outperforms state-of-the-art models, with considerable improvements (> 10%) on bAbI OOV test sets and other human-human datasets. We also systematically modify existing datasets to measure disentanglement and show BoSsNet to be robust to KB modifications.
ML Apr 13, 2018
A Grid Based Adversarial Clustering Algorithm
Wutao Wei, Nikhil Gupta, Bowei Xi
More and more data are being gathered for detecting and preventing cyber attacks. In cyber security applications, data analytics techniques have to deal with active adversaries that try to deceive the data analytics models and avoid being detected. The existence of such adversarial behavior motivates the development of robust and resilient adversarial learning techniques for various tasks. Most of the previous work focused on adversarial classification techniques, which assumed the existence of a reasonably large amount of carefully labeled data instances. However, in practice, labeling the data instances often requires costly and time-consuming human expertise and becomes a significant bottleneck. Meanwhile, a large number of unlabeled instances can also be used to understand the adversaries' behavior. To address the above-mentioned challenges, in this paper, we develop a novel grid-based adversarial clustering algorithm. Our adversarial clustering algorithm is able to identify the core normal regions, and to draw defensive walls around the centers of the normal objects utilizing game-theoretic ideas. Our algorithm also identifies sub-clusters of attack objects, the overlapping areas within clusters, and outliers which may be potential anomalies.