Ahmadreza Saboor Yaraghi

9.9SEApr 10

Efficient Black-Box Fault Localization for System-Level Test Code Using Large Language Models

Ahmadreza Saboor Yaraghi, Golnaz Gharachorlu, Sakina Fatima et al.

Fault localization (FL) is a critical step in debugging, which typically relies on repeated executions to pinpoint faulty code regions. However, repeated executions can be impractical in the presence of non-deterministic failures or high execution costs. While recent efforts have leveraged Large Language Models (LLMs) to aid execution-free FL, these have primarily focused on identifying faults in the system-under-test (SUT) rather than in the often complex system-level test code. However, the latter is also important, as in practice, many failures are triggered by faulty test code. To overcome these challenges, we introduce a fully static, LLM-driven approach for system-level test code fault localization (TCFL) that does not require executing the test case. Our method uses a single failure execution log to estimate the test's execution trace through three novel algorithms that identify only code statements likely involved in the failure. This pruned trace, combined with the error message, is used to prompt the LLM to rank potential faulty locations. Our black-box, system-level approach requires no access to the SUT source code and is applicable to complex test scripts that assess full system behavior. We evaluate our technique at the function, block, and line levels using an industrial dataset of faulty Python test cases that were not used in pre-training LLMs. Results show that our best-estimated traces closely match the actual traces, with an F1 score of around 90%. Additionally, pruning the complex system-level test code reduces the LLM's inference time by up to 34% without any loss in FL performance. Our method achieves equal or higher FL accuracy, requiring over 85% less average inference time per test case and 93% fewer tokens than the latest LLM-guided FL method.

SESep 27, 2021Code

Scalable and Accurate Test Case Prioritization in Continuous Integration Contexts

Ahmadreza Saboor Yaraghi, Mojtaba Bagherzadeh, Nafiseh Kahani et al.

Continuous Integration (CI) requires efficient regression testing to ensure software quality without significantly delaying its CI builds. This warrants the need for techniques to reduce regression testing time, such as Test Case Prioritization (TCP) techniques that prioritize the execution of test cases to detect faults as early as possible. Many recent TCP studies employ various Machine Learning (ML) techniques to deal with the dynamic and complex nature of CI. However, most of them use a limited number of features for training ML models and evaluate the models on subjects for which the application of TCP makes little practical sense, due to their small regression testing time and low number of failed builds. In this work, we first define, at a conceptual level, a data model that captures data sources and their relations in a typical CI environment. Second, based on this data model, we define a comprehensive set of features that covers all features previously used by related studies. Third, we develop methods and tools to collect the defined features for 25 open-source software systems with enough failed builds and whose regression testing takes at least five minutes. Fourth, relying on the collected dataset containing a comprehensive feature set, we answer four research questions concerning data collection time, the effectiveness of ML-based TCP, the impact of the features on effectiveness, the decay of ML-based TCP models over time, and the trade-off between data collection time and the effectiveness of ML-based TCP techniques.

Ahmadreza Saboor Yaraghi

2 Papers