Sadia Afroz

CR
9papers
1,374citations
Novelty39%
AI Score40

9 Papers

48.3SEApr 5
The Fast and Spurious: Developer Productivity with GenAI

Sadia Afroz, Zixuan Feng, Tyler Menezes et al.

Generative AI (GenAI) tools are increasingly being adopted in software development as productivity aids, since there is evidence that GenAI tools can improve individual aspects of productivity. However, productivity is multidimensional; accelerating one aspect of work may simply shift effort to another. In this paper, we investigate how GenAI adoption affects different dimensions of developer productivity. We surveyed 415 software practitioners to understand how they perceive productivity changes associated with AI adoption, using the SPACE framework (Satisfaction and well-being, Performance, Activity, Communication and collaboration, and Efficiency and flow). Our results reveal systematic redistribution of effort across SPACE dimensions. While frequent GenAI users reported faster task completion and higher output volume, these gains were offset by increased code review burden, persistent cognitive load from output verification, and unchanged collaboration patterns. We further provide an empirical mapping between the challenges perceived by developers and potential strategies to mitigate them. Overall, our findings suggest that, at the current stage of GenAI adoption, perceived productivity gains may be spurious -- surface-level acceleration, often accompanied by redistributed effort and hidden costs.

CRNov 28, 2021
MALIGN: Explainable Static Raw-byte Based Malware Family Classification using Sequence Alignment

Shoumik Saha, Sadia Afroz, Atif Rahman

For a long time, malware classification and analysis have been an arms-race between antivirus systems and malware authors. Though static analysis is vulnerable to evasion techniques, it is still popular as the first line of defense in antivirus systems. But most of the static analyzers failed to gain the trust of practitioners due to their black-box nature. We propose MAlign, a novel static malware family classification approach inspired by genome sequence alignment that can not only classify malware families but can also provide explanations for its decision. MAlign encodes raw bytes using nucleotides and adopts genome sequence alignment approaches to create a signature of a malware family based on the conserved code segments in that family, without any human labor or expertise. We evaluate MAlign on two malware datasets, and it outperforms other state-of-the-art machine learning based malware classifiers (by 4.49% - 0.07%), especially on small datasets (by 19.48% - 1.2%). Furthermore, we explain the generated signatures by MAlign on different malware families illustrating the kinds of insights it can provide to analysts, and show its efficacy as an analysis tool. Additionally, we evaluate its theoretical and empirical robustness against some common attacks. In this paper, we approach static malware analysis from a unique perspective, aiming to strike a delicate balance among performance, interpretability, and robustness.

SEMar 21, 2021
An Empirical Study of Developer Discussions on Low-Code Software Development Challenges

Md Abdullah Al Alamin, Sanjay Malakar, Gias Uddin et al.

Low-code software development (LCSD) is an emerging paradigm that combines minimal source code with interactive graphical interfaces to promote rapid application development. LCSD aims to democratize application development to software practitioners with diverse backgrounds. Given that LCSD is relatively a new paradigm, it is vital to learn about the challenges developers face during their adoption of LCSD platforms. The online developer forum, Stack Overflow (SO), is popular among software developers to ask for solutions to their technical problems. We observe a growing body of posts in SO with discussions of LCSD platforms. In this paper, we present an empirical study of around 5K SO posts (questions + accepted answers) that contain discussions of nine popular LCSD platforms. We apply topic modeling on the posts to determine the types of topics discussed. We find 13 topics related to LCSD in SO. The 13 topics are grouped into four categories: Customization, Platform Adoption, Database Management, and Third-Party Integration. More than 40% of the questions are about customization, i.e., developers frequently face challenges with customizing user interfaces or services offered by LCSD platforms. The topic "Dynamic Event Handling" under the "Customization" category is the most popular (in terms of average view counts per question of the topic) as well as the most difficult. It means that developers frequently search for customization solutions such as how to attach dynamic events to a form in low-code UI, yet most (75.9%) of their questions remain without an accepted answer. We manually label 900 questions from the posts to determine the prevalence of the topics' challenges across LCSD phases. We find that most of the questions are related to the development phase, and low-code developers also face challenges with automated testing.

CRMar 6, 2020
MAB-Malware: A Reinforcement Learning Framework for Attacking Static Malware Classifiers

Wei Song, Xuezixiang Li, Sadia Afroz et al.

Modern commercial antivirus systems increasingly rely on machine learning to keep up with the rampant inflation of new malware. However, it is well-known that machine learning models are vulnerable to adversarial examples (AEs). Previous works have shown that ML malware classifiers are fragile to the white-box adversarial attacks. However, ML models used in commercial antivirus products are usually not available to attackers and only return hard classification labels. Therefore, it is more practical to evaluate the robustness of ML models and real-world AVs in a pure black-box manner. We propose a black-box Reinforcement Learning (RL) based framework to generate AEs for PE malware classifiers and AV engines. It regards the adversarial attack problem as a multi-armed bandit problem, which finds an optimal balance between exploiting the successful patterns and exploring more varieties. Compared to other frameworks, our improvements lie in three points. 1) Limiting the exploration space by modeling the generation process as a stateless process to avoid combination explosions. 2) Due to the critical role of payload in AE generation, we design to reuse the successful payload in modeling. 3) Minimizing the changes on AE samples to correctly assign the rewards in RL learning. It also helps identify the root cause of evasions. As a result, our framework has much higher black-box evasion rates than other off-the-shelf frameworks. Results show it has over 74\%--97\% evasion rate for two state-of-the-art ML detectors and over 32\%--48\% evasion rate for commercial AVs in a pure black-box setting. We also demonstrate that the transferability of adversarial attacks among ML-based classifiers is higher than the attack transferability between purely ML-based and commercial AVs.

CLMay 12, 2019
A Benchmark Study of Machine Learning Models for Online Fake News Detection

Junaed Younus Khan, Md. Tawkat Islam Khondaker, Sadia Afroz et al.

The proliferation of fake news and its propagation on social media has become a major concern due to its ability to create devastating impacts. Different machine learning approaches have been suggested to detect fake news. However, most of those focused on a specific type of news (such as political) which leads us to the question of dataset-bias of the models used. In this research, we conducted a benchmark study to assess the performance of different applicable machine learning approaches on three different datasets where we accumulated the largest and most diversified one. We explored a number of advanced pre-trained language models for fake news detection along with the traditional and deep learning ones and compared their performances from different aspects for the first time to the best of our knowledge. We find that BERT and similar pre-trained models perform the best for fake news detection, especially with very small dataset. Hence, these models are significantly better option for languages with limited electronic contents, i.e., training data. We also carried out several analysis based on the models' performance, article's topic, article's length, and discussed different lessons learned from them. We believe that this benchmark study will help the research community to explore further and news sites/blogs to select the most appropriate fake news detection method.

CRDec 2, 2018
Towards Automatic Discovery of Cybercrime Supply Chains

Rasika Bhalerao, Maxwell Aliapoulios, Ilia Shumailov et al.

Cybercrime forums enable modern criminal entrepreneurs to collaborate with other criminals into increasingly efficient and sophisticated criminal endeavors. Understanding the connections between different products and services can often illuminate effective interventions. However, generating this understanding of supply chains currently requires time-consuming manual effort. In this paper, we propose a language-agnostic method to automatically extract supply chains from cybercrime forum posts and replies. Our supply chain detection algorithm can identify 36% and 58% relevant chains within major English and Russian forums, respectively, showing improvements over the baselines of 13% and 36%, respectively. Our analysis of the automatically generated supply chains demonstrates underlying connections between products and services within these forums. For example, the extracted supply chain illuminated the connection between hack-for-hire services and the selling of rare and valuable `OG' accounts, which has only recently been reported. The understanding of connections between products and services exposes potentially effective intervention points.

CLAug 31, 2017
Identifying Products in Online Cybercrime Marketplaces: A Dataset for Fine-grained Domain Adaptation

Greg Durrett, Jonathan K. Kummerfeld, Taylor Berg-Kirkpatrick et al.

One weakness of machine-learned NLP models is that they typically perform poorly on out-of-domain data. In this work, we study the task of identifying products being bought and sold in online cybercrime forums, which exhibits particularly challenging cross-domain effects. We formulate a task that represents a hybrid of slot-filling information extraction and named entity recognition and annotate data from four different forums. Each of these forums constitutes its own "fine-grained domain" in that the forums cover different market sectors with different properties, even though all forums are in the broad domain of cybercrime. We characterize these domain differences in the context of a learning-based system: supervised models see decreased accuracy when applied to new forums, and standard techniques for semi-supervised learning and domain adaptation have limited effectiveness on this data, which suggests the need to improve these techniques. We release a dataset of 1,938 annotated posts from across the four forums.

CROct 26, 2015
Reviewer Integration and Performance Measurement for Malware Detection

Brad Miller, Alex Kantchelian, Michael Carl Tschantz et al.

We present and evaluate a large-scale malware detection system integrating machine learning with expert reviewers, treating reviewers as a limited labeling resource. We demonstrate that even in small numbers, reviewers can vastly improve the system's ability to keep pace with evolving threats. We conduct our evaluation on a sample of VirusTotal submissions spanning 2.5 years and containing 1.1 million binaries with 778GB of raw feature data. Without reviewer assistance, we achieve 72% detection at a 0.5% false positive rate, performing comparable to the best vendors on VirusTotal. Given a budget of 80 accurate reviews daily, we improve detection to 89% and are able to detect 42% of malicious binaries undetected upon initial submission to VirusTotal. Additionally, we identify a previously unnoticed temporal inconsistency in the labeling of training datasets. We compare the impact of training labels obtained at the same time training data is first seen with training labels obtained months later. We find that using training labels obtained well after samples appear, and thus unavailable in practice for current training data, inflates measured detection by almost 20 percentage points. We release our cluster-based implementation, as well as a list of all hashes in our evaluation and 3% of our entire dataset.

CRSep 10, 2014
On Modeling the Costs of Censorship

Michael Carl Tschantz, Sadia Afroz, Vern Paxson et al.

We argue that the evaluation of censorship evasion tools should depend upon economic models of censorship. We illustrate our position with a simple model of the costs of censorship. We show how this model makes suggestions for how to evade censorship. In particular, from it, we develop evaluation criteria. We examine how our criteria compare to the traditional methods of evaluation employed in prior works.