Ruifeng Gao

SE
h-index: 10
4 papers
7 citations
Novelty: 51%
AI Score: 43

4 Papers

46.0 SE Apr 10
Efficient Black-Box Fault Localization for System-Level Test Code Using Large Language Models

Ahmadreza Saboor Yaraghi, Golnaz Gharachorlu, Sakina Fatima et al.

Fault localization (FL) is a critical step in debugging, which typically relies on repeated executions to pinpoint faulty code regions. However, repeated executions can be impractical in the presence of non-deterministic failures or high execution costs. While recent efforts have leveraged Large Language Models (LLMs) to aid execution-free FL, these have primarily focused on identifying faults in the system-under-test (SUT) rather than in the often complex system-level test code. Yet test code matters too, because in practice many failures are triggered by faulty test code. To overcome these challenges, we introduce a fully static, LLM-driven approach for system-level test code fault localization (TCFL) that does not require executing the test case. Our method uses a single failure execution log to estimate the test's execution trace through three novel algorithms that identify only code statements likely involved in the failure. This pruned trace, combined with the error message, is used to prompt the LLM to rank potential faulty locations. Our black-box, system-level approach requires no access to the SUT source code and is applicable to complex test scripts that assess full system behavior. We evaluate our technique at the function, block, and line levels using an industrial dataset of faulty Python test cases that were not used in pre-training LLMs. Results show that our best-estimated traces closely match the actual traces, with an F1 score of around 90%. Additionally, pruning the complex system-level test code reduces the LLM's inference time by up to 34% without any loss in FL performance. Our method achieves equal or higher FL accuracy while requiring over 85% less average inference time per test case and 93% fewer tokens than the latest LLM-guided FL method.

38.5 SE May 8
Similar Pattern Annotation via Retrieval Knowledge for LLM-Based Test Code Fault Localization

Golnaz Gharachorlu, Mahsa Panahandeh, Lionel C. Briand et al.

Software failures remain a major challenge in modern software development, and identifying the code elements responsible for failures is a time-consuming debugging task. While extensive research has focused on fault localization in the system under test (SUT), failures can also originate from faulty system test scripts. This problem, known as Test Code Fault Localization (TCFL), has received significantly less attention despite its importance in continuous integration (CI) environments where large test suites are executed frequently. TCFL is particularly challenging because it typically operates under black-box conditions, relies on limited diagnostic signals such as error messages and partial logs, and involves large system-level test scripts that expand the fault localization search space. In this paper, we propose SPARK, a framework that integrates accumulated debugging knowledge from continuous integration (CI) environments into Large Language Model (LLM)-based TCFL. Given a newly observed failing test case, SPARK retrieves similar fault-labeled test cases from a debugging knowledge corpus and selectively annotates suspicious lines of the failing test based on their similarity to previously observed fault patterns. These annotations guide the LLM's reasoning while maintaining scalability and avoiding the prompt-length explosion common to naive retrieval-augmented approaches. We evaluate SPARK on three industrial datasets containing real-world faulty Python test cases from different software products. The results show that SPARK consistently improves fault localization effectiveness compared to the existing LLM-based TCFL baseline while maintaining comparable inference cost and token usage. In particular, the approach advances the state of the art by identifying more correct faulty locations in complex test cases containing multiple faults.
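
The retrieve-then-annotate idea described in this abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not SPARK's actual implementation: the corpus format, the `similarity` function (a character-level ratio from Python's `difflib` standing in for a real retriever), the `top_k` and `threshold` parameters, and the annotation marker are all hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] (stand-in for a real retriever)."""
    return SequenceMatcher(None, a.strip(), b.strip()).ratio()


def annotate_suspicious_lines(failing_lines, corpus, top_k=2, threshold=0.8):
    """Mark lines of a failing test that resemble previously labeled faulty lines.

    corpus: list of (test_lines, faulty_line_indices) pairs (hypothetical format).
    """
    failing_text = "\n".join(failing_lines)
    # Step 1: retrieve the top_k most similar fault-labeled test cases.
    retrieved = sorted(
        corpus,
        key=lambda case: similarity(failing_text, "\n".join(case[0])),
        reverse=True,
    )[:top_k]
    # Step 2: collect the lines that were labeled faulty in the retrieved cases.
    fault_patterns = [lines[i] for lines, faulty in retrieved for i in faulty]
    # Step 3: annotate only the failing-test lines that match a known pattern,
    # so the prompt grows by a short marker rather than by whole retrieved tests.
    annotated = []
    for line in failing_lines:
        if any(similarity(line, p) >= threshold for p in fault_patterns):
            annotated.append(line + "  # SUSPICIOUS: resembles a known fault")
        else:
            annotated.append(line)
    return annotated


# Toy corpus and failing test (invented for illustration).
corpus = [
    (['log.info("boot")', "device.wait(timeout=2)", "assert device.ready"], [1]),
    (["x = compute()", "assert x == 3"], [1]),
]
failing_test = ['log.info("start")', "device.wait(timeout=1)", "assert device.ready"]
annotated = annotate_suspicious_lines(failing_test, corpus, top_k=1)
print("\n".join(annotated))
```

In this sketch the annotated test, not the retrieved test cases themselves, would be what is placed in the LLM prompt, which is one plausible way to keep prompt length roughly constant as the corpus grows.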

5.5 SY Mar 25
Spatial Correlation, Non-Stationarity, and Degrees of Freedom of Holographic Curvature-Reconfigurable Apertures

Liuxun Xue, Shu Sun, Ruifeng Gao et al.

Low-altitude wireless platforms increasingly require lightweight, conformal, and densely sampled antenna array apertures with high array gain and spatial selectivity. However, when deployed on nonplanar surfaces, curvature alters the array manifold, local visibility, and propagation support, potentially invalidating spatial-stationarity assumptions. In this paper, we investigate a holographic curvature-reconfigurable aperture (HoloCuRA), modeled as a curvature-controllable holographic surface, and develop a visibility-aware spatial characterization framework for its low-altitude applications. Specifically, the framework jointly quantifies array-domain spatial non-stationarity (SnS) and spatial degrees of freedom (DoF) in line-of-sight, 3GPP non-line-of-sight, and isotropic-scattering propagation environments. For SnS, a novel Power-balanced, Visibility-aware Correlation-Matrix Distance (PoVi-CMD) and a two-stage subarray-screening procedure are introduced. For DoF, the Rényi-2 effective rank is adopted, and tractable spatial-correlation expressions under isotropic scattering are developed for efficient DoF analysis. Furthermore, a realizable antenna port mode is introduced to connect SnS with DoF. Numerical results reveal that curvature and propagation support are the primary determinants of both SnS and DoF in HoloCuRA: array-domain SnS determines whether subarray statistics can be treated as locally consistent, whereas DoF limits the global spatial modes. The findings provide useful guidance for low-altitude antenna-system design.
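
As a rough illustration of the two kinds of metrics this abstract mentions, the sketch below computes a Rényi-2 effective rank and the classical correlation matrix distance (CMD) for small correlation matrices. It rests on several assumptions: the effective rank is taken in its common form exp(H_2) = tr(R)^2 / ||R||_F^2, which may differ in normalization from the paper's definition; the CMD shown is the classical one, not the paper's PoVi-CMD, whose power balancing and visibility weighting are not specified here; and the exponential correlation model is a toy input invented for the example.

```python
import math


def renyi2_effective_rank(R):
    """exp(H_2) for a Hermitian PSD matrix R given as nested lists.

    With p_i = lambda_i / sum(lambda), H_2 = -log(sum p_i^2), so
    exp(H_2) = tr(R)^2 / ||R||_F^2 and no eigendecomposition is needed.
    """
    n = len(R)
    trace = sum(R[i][i] for i in range(n)).real           # sum of eigenvalues
    frob_sq = sum(abs(x) ** 2 for row in R for x in row)  # sum of squared eigenvalues
    return trace ** 2 / frob_sq


def correlation_matrix_distance(R1, R2):
    """Classical CMD: 1 - tr(R1 R2) / (||R1||_F ||R2||_F)."""
    n = len(R1)
    inner = sum(R1[i][j] * R2[j][i] for i in range(n) for j in range(n)).real
    norm1 = math.sqrt(sum(abs(x) ** 2 for row in R1 for x in row))
    norm2 = math.sqrt(sum(abs(x) ** 2 for row in R2 for x in row))
    return 1.0 - inner / (norm1 * norm2)


def exponential_correlation(n, rho):
    """Toy model R[i][j] = rho^|i-j| (an assumption, not from the paper)."""
    return [[rho ** abs(i - j) for j in range(n)] for i in range(n)]


uncorrelated = exponential_correlation(4, 0.0)      # identity matrix
fully_correlated = exponential_correlation(4, 1.0)  # all-ones, rank one
print(renyi2_effective_rank(uncorrelated))          # 4 equal eigenvalues
print(renyi2_effective_rank(fully_correlated))      # a single dominant mode
print(correlation_matrix_distance(uncorrelated, fully_correlated))
```

The effective rank ranges from 1 (one dominant spatial mode) to the matrix dimension (all modes equally strong), while a CMD near 0 indicates that two subarrays see similar spatial statistics.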

NI Feb 5, 2025
Channel Gain Map Construction based on Subregional Learning and Prediction

Jiayi Chen, Ruifeng Gao, Jue Wang et al.

Constructing a channel gain map (CGM) is essential for realizing the environment-aware wireless communications expected in 6G. A fundamental problem is how to effectively predict the channel gains at unknown locations from a finite number of measurements. Because a single prediction model is not effective in complex propagation environments, we propose a subregional learning-based CGM construction scheme in which the entire map is divided into subregions via data-driven clustering, and an individual model is constructed and trained for each subregion. In this way, the specific propagation features of each subregion can be better extracted from finite training data. Moreover, we propose to further improve prediction accuracy through uneven subregion sampling and the reuse of training data around subregion boundaries. Simulation results validate the effectiveness of the proposed scheme in CGM construction.
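
The cluster-then-fit structure of this scheme can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' method: k-means on measurement coordinates stands in for the unspecified data-driven clustering, each subregion model is a least-squares log-distance fit rather than a learned predictor, the transmitter is assumed to sit at the origin, and the synthetic gains are invented. Uneven sampling and boundary data reuse are not shown.

```python
import math


def nearest(point, centers):
    """Index of the center closest to point."""
    return min(range(len(centers)), key=lambda i: math.dist(point, centers[i]))


def kmeans(points, k, iters=20):
    """Lloyd's k-means on 2-D points, seeded with the first k points."""
    centers = list(points[:k])
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centers)].append(p)
        centers = [
            (sum(x for x, _ in g) / len(g), sum(y for _, y in g) / len(g)) if g else c
            for g, c in zip(groups, centers)
        ]
    return centers


def fit_line(xs, ys):
    """Least-squares fit y = a + b * x; constant model if xs have no spread."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        return my, 0.0
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b


def log_distance(point):
    """Feature: log10 distance to a transmitter assumed at the origin."""
    return math.log10(math.hypot(*point))


def train_subregional(samples, k):
    """samples: list of ((x, y), gain_db). Returns (centers, per-subregion models)."""
    centers = kmeans([p for p, _ in samples], k)
    models = []
    for i in range(k):
        sub = [(p, g) for p, g in samples if nearest(p, centers) == i]
        sub = sub or samples  # empty subregion: fall back to a global fit
        models.append(fit_line([log_distance(p) for p, _ in sub], [g for _, g in sub]))
    return centers, models


def predict(point, centers, models):
    """Predict the gain at point with the model of its nearest subregion."""
    a, b = models[nearest(point, centers)]
    return a + b * log_distance(point)


# Synthetic map with two subregions that follow different path-loss laws.
def gain_a(d):
    return -40.0 - 20.0 * math.log10(d)


def gain_b(d):
    return -60.0 - 35.0 * math.log10(d)


samples = [((float(x), 0.0), gain_a(x)) for x in range(10, 21, 2)]
samples += [((float(x), 0.0), gain_b(x)) for x in range(100, 111, 2)]
centers, models = train_subregional(samples, k=2)
print(predict((15.0, 0.0), centers, models), predict((105.0, 0.0), centers, models))
```

A single global fit over both groups would blend the two path-loss exponents, which is the kind of mismatch that motivates training one model per subregion.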