Aftab Hussain

SE
h-index23
14papers
232citations
Novelty28%
AI Score24

14 Papers

SEMay 28, 2022
Syntax-Guided Program Reduction for Understanding Neural Code Intelligence Models

Md Rafiqul Islam Rabin, Aftab Hussain, Mohammad Amin Alipour

Neural code intelligence (CI) models are opaque black-boxes and offer little insight on the features they use in making predictions. This opacity may lead to distrust in their prediction and hamper their wider adoption in safety-critical applications. Recently, input program reduction techniques have been proposed to identify key features in the input programs to improve the transparency of CI models. However, this approach is syntax-unaware and does not consider the grammar of the programming language. In this paper, we apply a syntax-guided program reduction technique that considers the grammar of the input programs during reduction. Our experiments on multiple models across different types of input programs show that the syntax-guided program reduction technique is faster and provides smaller sets of key tokens in reduced programs. We also show that the key tokens could be used in generating adversarial examples for up to 65% of the input programs.

SEJul 6, 2024
Harnessing the Power of LLMs: Automating Unit Test Generation for High-Performance Computing

Rabimba Karanjai, Aftab Hussain, Md Rafiqul Islam Rabin et al.

Unit testing is crucial in software engineering for ensuring quality. However, it's not widely used in parallel and high-performance computing software, particularly scientific applications, due to their smaller, diverse user base and complex logic. These factors make unit testing challenging and expensive, as it requires specialized knowledge and existing automated tools are often ineffective. To address this, we propose an automated method for generating unit tests for such software, considering their unique features like complex logic and parallel processing. Recently, large language models (LLMs) have shown promise in coding and testing. We explored the capabilities of Davinci (text-davinci-002) and ChatGPT (gpt-3.5-turbo) in creating unit tests for C++ parallel programs. Our results show that LLMs can generate mostly correct and comprehensive unit tests, although they have some limitations, such as repetitive assertions and blank test cases.

LGMar 8, 2023
A Study of Variable-Role-based Feature Enrichment in Neural Models of Code

Aftab Hussain, Md Rafiqul Islam Rabin, Bowen Xu et al.

Although deep neural models substantially reduce the overhead of feature engineering, the features readily available in the inputs might significantly impact training cost and the performance of the models. In this paper, we explore the impact of an unsuperivsed feature enrichment approach based on variable roles on the performance of neural models of code. The notion of variable roles (as introduced in the works of Sajaniemi et al. [Refs. 1,2]) has been found to help students' abilities in programming. In this paper, we investigate if this notion would improve the performance of neural models of code. To the best of our knowledge, this is the first work to investigate how Sajaniemi et al.'s concept of variable roles can affect neural models of code. In particular, we enrich a source code dataset by adding the role of individual variables in the dataset programs, and thereby conduct a study on the impact of variable role enrichment in training the Code2Seq model. In addition, we shed light on some challenges and opportunities in feature enrichment for neural code intelligence models.

CVSep 4, 2024
SITAR: Semi-supervised Image Transformer for Action Recognition

Owais Iqbal, Omprakash Chakraborty, Aftab Hussain et al.

Recognizing actions from a limited set of labeled videos remains a challenge as annotating visual data is not only tedious but also can be expensive due to classified nature. Moreover, handling spatio-temporal data using deep $3$D transformers for this can introduce significant computational complexity. In this paper, our objective is to address video action recognition in a semi-supervised setting by leveraging only a handful of labeled videos along with a collection of unlabeled videos in a compute efficient manner. Specifically, we rearrange multiple frames from the input videos in row-column form to construct super images. Subsequently, we capitalize on the vast pool of unlabeled samples and employ contrastive learning on the encoded super images. Our proposed approach employs two pathways to generate representations for temporally augmented super images originating from the same video. Specifically, we utilize a 2D image-transformer to generate representations and apply a contrastive loss function to minimize the similarity between representations from different videos while maximizing the representations of identical videos. Our method demonstrates superior performance compared to existing state-of-the-art approaches for semi-supervised action recognition across various benchmark datasets, all while significantly reducing computational costs.

LGMar 3, 2023
Study of Distractors in Neural Models of Code

Md Rafiqul Islam Rabin, Aftab Hussain, Sahil Suneja et al.

Finding important features that contribute to the prediction of neural models is an active area of research in explainable AI. Neural models are opaque and finding such features sheds light on a better understanding of their predictions. In contrast, in this work, we present an inverse perspective of distractor features: features that cast doubt about the prediction by affecting the model's confidence in its prediction. Understanding distractors provide a complementary view of the features' relevance in the predictions of neural models. In this paper, we apply a reduction-based technique to find distractors and provide our preliminary results of their impacts and types. Our experiments across various tasks, models, and datasets of code reveal that the removal of tokens can have a significant impact on the confidence of models in their predictions and the categories of tokens can also play a vital role in the model's confidence. Our study aims to enhance the transparency of models by emphasizing those tokens that significantly influence the confidence of the models.

SEAug 22, 2024
Unlearning Trojans in Large Language Models: A Comparison Between Natural Language and Source Code

Mahdi Kazemi, Aftab Hussain, Md Rafiqul Islam Rabin et al.

This work investigates the application of Machine Unlearning (MU) for mitigating the impact of trojans embedded in conventional large language models of natural language (Text-LLMs) and large language models of code (Code-LLMs) We propose a novel unlearning approach, LYA, that leverages both gradient ascent and elastic weight consolidation, a Fisher Information Matrix (FIM) based regularization technique, to unlearn trojans from poisoned models. We compare the effectiveness of LYA against conventional techniques like fine-tuning, retraining, and vanilla gradient ascent. The subject models we investigate are BERT and CodeBERT, for sentiment analysis and code defect detection tasks, respectively. Our findings demonstrate that the combination of gradient ascent and FIM-based regularization, as done in LYA, outperforms existing methods in removing the trojan's influence from the poisoned model, while preserving its original functionality. To the best of our knowledge, this is the first work that compares and contrasts MU of trojans in LLMs, in the NL and Coding domain.

SEDec 25, 2021Code
FMViz: Visualizing Tests Generated by AFL at the Byte-level

Aftab Hussain, Mohammad Amin Alipour

Software fuzzing is a strong testing technique that has become the de facto approach for automated software testing and software vulnerability detection in the industry. The random nature of fuzzing makes monitoring and understanding the behavior of fuzzers difficult. In this paper, we report the development of Fuzzer Mutation Visualizer (FMViz), a tool that focuses on visualizing byte-level mutations in fuzzers. In particular, FMViz extends American Fuzzy Lop (AFL) to visualize the generated test inputs and highlight changes between consecutively generated seeds as a fuzzing campaign progresses. The overarching goal of our tool is to help developers and students comprehend the inner-workings of the AFL fuzzer better. In this paper, we present the architecture of FMViz, discuss a sample case study of it, and outline the future work. FMViz is open-source and publicly available at https://github.com/AftabHussain/afl-test-viz.

SEMay 5, 2024
Trojans in Large Language Models of Code: A Critical Review through a Trigger-Based Taxonomy

Aftab Hussain, Md Rafiqul Islam Rabin, Toufique Ahmed et al.

Large language models (LLMs) have provided a lot of exciting new capabilities in software development. However, the opaque nature of these models makes them difficult to reason about and inspect. Their opacity gives rise to potential security risks, as adversaries can train and deploy compromised models to disrupt the software development process in the victims' organization. This work presents an overview of the current state-of-the-art trojan attacks on large language models of code, with a focus on triggers -- the main design point of trojans -- with the aid of a novel unifying trigger taxonomy framework. We also aim to provide a uniform definition of the fundamental concepts in the area of trojans in Code LLMs. Finally, we draw implications of findings on how code models learn on trigger design.

CRFeb 23, 2024
On Trojan Signatures in Large Language Models of Code

Aftab Hussain, Md Rafiqul Islam Rabin, Mohammad Amin Alipour

Trojan signatures, as described by Fields et al. (2021), are noticeable differences in the distribution of the trojaned class parameters (weights) and the non-trojaned class parameters of the trojaned model, that can be used to detect the trojaned model. Fields et al. (2021) found trojan signatures in computer vision classification tasks with image models, such as, Resnet, WideResnet, Densenet, and VGG. In this paper, we investigate such signatures in the classifier layer parameters of large language models of source code. Our results suggest that trojan signatures could not generalize to LLMs of code. We found that trojaned code models are stubborn, even when the models were poisoned under more explicit settings (finetuned with pre-trained weights frozen). We analyzed nine trojaned models for two binary classification tasks: clone and defect detection. To the best of our knowledge, this is the first work to examine weight-based trojan signature revelation techniques for large-language models of code and furthermore to demonstrate that detecting trojans only from the weights in such models is a hard problem.

CLJan 3, 2022
Testing the Robustness of a BiLSTM-based Structural Story Classifier

Aftab Hussain, Sai Durga Prasad Nanduri, Sneha Seenuvasavarathan

The growing prevalence of counterfeit stories on the internet has fostered significant interest towards fast and scalable detection of fake news in the machine learning community. While several machine learning techniques for this purpose have emerged, we observe that there is a need to evaluate the impact of noise on these techniques' performance, where noise constitutes news articles being mistakenly labeled as fake (or real). This work takes a step in that direction, where we examine the impact of noise on a state-of-the-art, structural model based on BiLSTM (Bidirectional Long-Short Term Model) for fake news detection, Hierarchical Discourse-level Structure for Fake News Detection by Karimi and Tang (Reference no. 9).

SEDec 25, 2021
DIAR: Removing Uninteresting Bytes from Seeds in Software Fuzzing

Aftab Hussain, Mohammad Amin Alipour

Software fuzzing mutates bytes in the test seeds to explore different behaviors of the program under test. Initial seeds can have great impact on the performance of a fuzzing campaign. Mutating a lot of uninteresting bytes in a large seed wastes the fuzzing resources. In this paper, we present the preliminary results of our approach that aims to improve the performance of fuzzers through identifying and removing uninteresting bytes in the seeds. In particular, we present DIAR, a technique that reduces the size of the seeds based on their coverage. Our preliminary results suggest fuzzing campaigns that start with reduced seeds, find new paths faster, and can produce higher coverage overall.

LGJun 16, 2021
Memorization and Generalization in Neural Code Intelligence Models

Md Rafiqul Islam Rabin, Aftab Hussain, Mohammad Amin Alipour et al.

Deep Neural Networks (DNNs) are increasingly being used in software engineering and code intelligence tasks. These are powerful tools that are capable of learning highly generalizable patterns from large datasets through millions of parameters. At the same time, their large capacity can render them prone to memorizing data points. Recent work suggests that the memorization risk manifests especially strongly when the training dataset is noisy, involving many ambiguous or questionable samples, and memorization is the only recourse. The goal of this paper is to evaluate and compare the extent of memorization and generalization in neural code intelligence models. It aims to provide insights on how memorization may impact the learning behavior of neural models in code intelligence systems. To observe the extent of memorization in models, we add random noise to the original training dataset and use various metrics to quantify the impact of noise on various aspects of training and testing. We evaluate several state-of-the-art neural code intelligence models and benchmarks based on Java, Python, and Ruby codebases. Our results highlight important risks: millions of trainable parameters allow the neural networks to memorize anything, including noisy data, and provide a false sense of generalization. We observed all models manifest some forms of memorization. This can be potentially troublesome in most code intelligence tasks where they rely on rather noise-prone and repetitive data sources, such as code from GitHub. To the best of our knowledge, we provide the first study to quantify memorization effects in the domain of software engineering and code intelligence systems. This work raises awareness and provides new insights into important issues of training neural models in code intelligence systems that are usually overlooked by software engineering researchers.

SENov 8, 2018
A holistic look at requirements engineering practices in the gaming industry

Aftab Hussain, Omar Asadi, Debra J. Richardson

In this work we present an account of the status of requirements engineering in the gaming industry. Recent papers in the area were surveyed. Characterizations of the gaming industry were deliberated upon by portraying its relations with the market industry. Some research directions in the area of requirements engineering in the gaming industry were also mentioned.

SEMay 14, 2016
From Query to Usable Code: An Analysis of Stack Overflow Code Snippets

Di Yang, Aftab Hussain, Cristina Lopes

Enriched by natural language texts, Stack Overflow code snippets are an invaluable code-centric knowledge base of small units of source code. Besides being useful for software developers, these annotated snippets can potentially serve as the basis for automated tools that provide working code solutions to specific natural language queries. With the goal of developing automated tools with the Stack Overflow snippets and surrounding text, this paper investigates the following questions: (1) How usable are the Stack Overflow code snippets? and (2) When using text search engines for matching on the natural language questions and answers around the snippets, what percentage of the top results contain usable code snippets? A total of 3M code snippets are analyzed across four languages: C\#, Java, JavaScript, and Python. Python and JavaScript proved to be the languages for which the most code snippets are usable. Conversely, Java and C\# proved to be the languages with the lowest usability rate. Further qualitative analysis on usable Python snippets shows the characteristics of the answers that solve the original question. Finally, we use Google search to investigate the alignment of usability and the natural language annotations around code snippets, and explore how to make snippets in Stack Overflow an adequate base for future automatic program generation.