CL · Aug 15, 2024
Evaluating Text Classification Robustness to Part-of-Speech Adversarial Examples
Anahita Samadi, Allison Sullivan
As machine learning systems become more widely used, especially in safety-critical applications, there is a growing need to ensure that these systems behave as intended, even in the face of adversarial examples. Adversarial examples are inputs designed to trick the decision-making process while remaining imperceptible to humans. However, for text-based classification systems, changes to the input, a string of text, are always perceptible. Therefore, text-based adversarial examples instead focus on trying to preserve semantics. Unfortunately, recent work has shown this goal is often not met. To improve the quality of text-based adversarial examples, we need to know which elements of the input text are worth focusing on. To address this, we explore which parts of speech have the highest impact on text-based classifiers. Our experiments highlight a distinct bias in CNN algorithms against certain part-of-speech tokens within review datasets. This finding underscores a critical vulnerability in the linguistic processing capabilities of CNNs.
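To make "which parts of speech have the highest impact" concrete, here is a minimal sketch of one way to measure it: delete every token of a given POS tag and record how often the predicted label flips. This is an assumption-laden illustration, not the paper's method. The deletion perturbation, the `pos_flip_rates` helper, the pre-tagged toy reviews, and the keyword-counting `toy_classifier` (standing in for the CNNs studied in the paper) are all hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Tagged = List[Tuple[str, str]]  # (token, POS tag) pairs; tagging assumed done upstream


def pos_flip_rates(sentences: List[Tagged],
                   classify: Callable[[List[str]], int]) -> Dict[str, float]:
    """For each POS tag, the fraction of sentences whose predicted label flips
    when every token carrying that tag is deleted (a simple ablation probe)."""
    flips: Dict[str, int] = defaultdict(int)
    trials: Dict[str, int] = defaultdict(int)
    for sent in sentences:
        original = classify([tok for tok, _ in sent])
        for tag in {t for _, t in sent}:
            perturbed = [tok for tok, t in sent if t != tag]
            trials[tag] += 1
            if classify(perturbed) != original:
                flips[tag] += 1
    return {tag: flips[tag] / trials[tag] for tag in trials}


# Hypothetical stand-in classifier: label 1 if positive words outnumber negative ones.
POSITIVE = {"great", "good", "love"}
NEGATIVE = {"bad", "boring", "hate"}


def toy_classifier(tokens: List[str]) -> int:
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    return 1 if score > 0 else 0


reviews = [
    [("the", "DET"), ("plot", "NOUN"), ("was", "VERB"), ("great", "ADJ")],
    [("I", "PRON"), ("love", "VERB"), ("this", "DET"), ("film", "NOUN")],
]
rates = pos_flip_rates(reviews, toy_classifier)
print(rates)
```

With this toy classifier, removing adjectives flips every review that contains one, while removing nouns or determiners never does; a real study would substitute a trained model and a semantics-preserving perturbation.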
SE · Dec 7, 2023
LLM4TDD: Best Practices for Test Driven Development Using Large Language Models
Sanyogita Piya, Allison Sullivan
In today's society, we are becoming increasingly dependent on software systems. However, we also constantly witness the negative impacts of buggy software. Program synthesis aims to improve software correctness by automatically generating the program given an outline of the expected behavior. For decades, program synthesis has been an active research field, with recent approaches looking to incorporate Large Language Models to help generate code. This paper explores the concept of LLM4TDD, where we guide Large Language Models to generate code iteratively using a test-driven development methodology. We conduct an empirical evaluation using ChatGPT and coding problems from LeetCode to investigate the impact of different test, prompt and problem attributes on the efficacy of LLM4TDD.
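The iterative, test-driven loop described above can be sketched roughly as follows, under stated assumptions: tests are revealed one at a time, and after each new test the model is re-prompted with the current code and failure messages until every test revealed so far passes or a retry budget runs out. The `ask_llm` callable, the prompt wording, `max_rounds`, and the `(args, expected)` test format are hypothetical; the paper's actual prompts and ChatGPT setup are not reproduced here.

```python
from typing import Callable, List, Optional, Tuple

Test = Tuple[tuple, object]  # hypothetical format: (positional args, expected result)


def run_tests(src: str, func_name: str, tests: List[Test]) -> List[str]:
    """Execute candidate source and return one message per failing test.
    NOTE: exec runs untrusted model output; real use would need a sandbox."""
    namespace: dict = {}
    try:
        exec(src, namespace)
        func = namespace[func_name]
    except Exception as exc:  # syntax error, missing function, ...
        return [f"could not load {func_name}: {exc!r}"]
    failures = []
    for args, expected in tests:
        try:
            got = func(*args)
        except Exception as exc:
            failures.append(f"{func_name}{args} raised {exc!r}")
            continue
        if got != expected:
            failures.append(f"{func_name}{args} returned {got!r}, expected {expected!r}")
    return failures


def tdd_loop(problem: str, func_name: str, tests: List[Test],
             ask_llm: Callable[[str], str], max_rounds: int = 5) -> Optional[str]:
    """Reveal tests one at a time; re-prompt until all revealed tests pass."""
    code = ""
    for i in range(1, len(tests) + 1):
        active = tests[:i]
        failures = run_tests(code, func_name, active)
        rounds = 0
        while failures:
            if rounds == max_rounds:
                return None  # retry budget exhausted for this test
            prompt = (f"{problem}\nCurrent code:\n{code}\nFailing tests:\n"
                      + "\n".join(failures))
            code = ask_llm(prompt)
            failures = run_tests(code, func_name, active)
            rounds += 1
    return code


# Usage with a scripted fake model: a wrong first attempt, then a fix.
responses = iter([
    "def add(a, b):\n    return a - b\n",
    "def add(a, b):\n    return a + b\n",
])
result = tdd_loop("Write add(a, b).", "add", [((1, 1), 2), ((2, 3), 5)],
                  lambda prompt: next(responses))
print(result)
```

Feeding failure messages back into the next prompt is what makes the loop test-driven rather than one-shot generation.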
SE · Oct 22, 2021
REACH: Refining Alloy Scenarios by Scope
Ana Jovanovic, Allison Sullivan
Writing declarative models has numerous benefits, ranging from automated reasoning about and correction of design-level properties before systems are built, to automated testing and debugging of their implementations after they are built. Alloy is a declarative modeling language that is well suited for verifying system designs. A key strength of Alloy is its scenario-finding toolset, the Analyzer, which allows users to explore all valid scenarios that adhere to the model's constraints up to a user-provided scope. In Alloy, users commonly want to validate smaller scenarios first and then, once confident, move on to validating larger scenarios. However, the Analyzer only presents scenarios in the order they are discovered by the SAT solver. This paper presents Reach, an extension to the Analyzer which allows users to explore scenarios by size. Experimental results reveal that Reach's enumeration improves performance while having the added benefit of maintaining a semi-sorted ordering of scenarios for the user. Moreover, we highlight Reach's ability to improve the performance of Alloy's analysis when the user makes incremental changes to the scope of the enumeration.
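The difference between solver-discovery order and size order can be illustrated with a brute-force sketch. This is not how Reach or the Analyzer work (they use SAT solving, and this sketch does no symmetry breaking); it only shows the ordering idea on a hypothetical toy model: a binary relation `next` over `n` atoms that must have no self-loops and be a partial function. The helper names and the model are assumptions for illustration.

```python
from itertools import combinations, product
from typing import Callable, FrozenSet, Iterator, Tuple

Edge = Tuple[int, int]
Scenario = Tuple[int, FrozenSet[Edge]]  # (number of atoms, tuples in the relation)
Constraint = Callable[[int, FrozenSet[Edge]], bool]


def scenarios_of_size(n: int, valid: Constraint) -> Iterator[Scenario]:
    """Brute-force every relation over atoms 0..n-1, fewest tuples first,
    keeping only those that satisfy the model's constraint."""
    pairs = list(product(range(n), repeat=2))
    for r in range(len(pairs) + 1):
        for tuples in combinations(pairs, r):
            rel = frozenset(tuples)
            if valid(n, rel):
                yield (n, rel)


def enumerate_by_scope(max_scope: int, valid: Constraint) -> Iterator[Scenario]:
    """Yield every scenario with 1 atom, then 2 atoms, ... up to max_scope,
    so the user sees small scenarios before large ones."""
    for n in range(1, max_scope + 1):
        yield from scenarios_of_size(n, valid)


def toy_model(n: int, rel: FrozenSet[Edge]) -> bool:
    """Hypothetical constraint: no self-loops, at most one successor per atom."""
    sources = [a for a, _ in rel]
    return all(a != b for a, b in rel) and len(sources) == len(set(sources))


found = list(enumerate_by_scope(3, toy_model))
print(len(found), [n for n, _ in found[:6]])
```

Because each size is exhausted before the next begins, the output is sorted by atom count and, within a size, by number of tuples, which loosely mirrors the semi-sorted ordering the abstract describes.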
SE · Jul 23, 2018
Fault Localization for Declarative Models in Alloy
Kaiyuan Wang, Allison Sullivan, Darko Marinov et al.
Fault localization is a popular research topic, and many techniques have been proposed to locate faults in imperative code, e.g., C and Java. In this paper, we focus on the problem of fault localization for declarative models in Alloy, a first-order relational logic with transitive closure. We introduce AlloyFL, the first set of fault localization techniques for faulty Alloy models that leverage multiple test formulas. AlloyFL is also the first set of fault localization techniques at the AST node granularity. We implement in AlloyFL both spectrum-based and mutation-based fault localization techniques, as well as techniques based on Alloy's built-in unsat core. We introduce new metrics to measure the accuracy of AlloyFL and systematically evaluate AlloyFL on 38 real faulty models and 9000 mutant models. The results show that the mutation-based fault localization techniques are significantly more accurate than other types of techniques.
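As a rough picture of spectrum-based fault localization at AST-node granularity, the sketch below ranks nodes with the Ochiai formula, one widely used suspiciousness metric; the abstract does not say which formulas AlloyFL uses, so Ochiai here is an assumption. The node identifiers, test names, and the notion of a test formula "covering" a node are likewise hypothetical inputs.

```python
import math
from typing import Dict, List, Set, Tuple


def ochiai_ranking(coverage: Dict[str, Set[str]],
                   failing: Set[str]) -> List[Tuple[str, float]]:
    """Rank AST nodes by Ochiai suspiciousness:
        ef / sqrt(total_failed * (ef + ep))
    where ef / ep count failing / passing tests that cover the node."""
    total_failed = len(failing)
    scores: Dict[str, float] = {}
    for node, tests in coverage.items():
        ef = len(tests & failing)  # failing tests touching this node
        ep = len(tests - failing)  # passing tests touching this node
        denom = math.sqrt(total_failed * (ef + ep))
        scores[node] = ef / denom if denom else 0.0
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical spectrum: which test formulas exercise which AST nodes.
coverage = {
    "fact.body": {"t1", "t2", "t3"},
    "pred.guard": {"t1", "t2"},
    "sig.field": {"t3"},
}
ranking = ochiai_ranking(coverage, failing={"t1", "t2"})
print(ranking)
```

Here `pred.guard` is touched only by failing tests and scores 1.0, so it is ranked first; a mutation-based technique would instead rank nodes by how mutating them changes the test outcomes.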