Xingya Wang

SE
h-index: 10
3 papers
20 citations
Novelty: 53%
AI Score: 29

3 Papers

SE · Mar 13, 2025
Commenting Higher-level Code Unit: Full Code, Reduced Code, or Hierarchical Code Summarization

Weisong Sun, Yiran Zhang, Jie Zhu et al.

Commenting code is a crucial activity in software development, as it facilitates future maintenance and updates. To enhance the efficiency of writing comments and reduce developers' workload, researchers have proposed various automated code summarization (ACS) techniques to automatically generate comments/summaries for given code units. However, these ACS techniques primarily focus on generating summaries for code units at the method level. There is a significant lack of research on summarizing higher-level code units, such as file-level and module-level code units, even though summaries of these higher-level code units are highly useful for quickly gaining a macro-level understanding of software components and architecture. To fill this gap, we conduct a systematic study on how to use LLMs for commenting higher-level code units, including file level and module level. These higher-level units are significantly larger than method-level ones, which poses challenges in handling long code inputs within LLM constraints and maintaining efficiency. To address these issues, we explore various summarization strategies for ACS of higher-level code units, which can be divided into three types: full code summarization, reduced code summarization, and hierarchical code summarization. The experimental results suggest that for summarizing file-level code units, using the full code is the most effective approach, with reduced code serving as a cost-efficient alternative. However, for summarizing module-level code units, hierarchical code summarization becomes the most promising strategy. In addition, inspired by the research on method-level ACS, we also investigate using the LLM as an evaluator to assess the quality of summaries of higher-level code units. The experimental results demonstrate that the LLM's evaluation results strongly correlate with human evaluations.
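As a rough illustration of how full code summarization and hierarchical code summarization differ, the sketch below contrasts the two on a module represented as a mapping from file names to source text. It is a minimal sketch under stated assumptions, not the authors' pipeline: the function names, the `Summarizer` type, and the toy `first_line_summarizer` (a stand-in for an LLM call) are all hypothetical, and reduced code summarization is omitted because the abstract does not say how the code is reduced.

```python
from typing import Callable, Dict

# Hypothetical type: any function that maps text to a shorter summary.
# In practice this would wrap an LLM call; that part is an assumption.
Summarizer = Callable[[str], str]


def full_code_summary(files: Dict[str, str], summarize: Summarizer) -> str:
    """Concatenate every file and summarize the whole module in one call."""
    joined = "\n\n".join(
        f"# file: {name}\n{src}" for name, src in sorted(files.items())
    )
    return summarize(joined)


def hierarchical_summary(files: Dict[str, str], summarize: Summarizer) -> str:
    """Summarize each file first, then summarize the file-level summaries."""
    file_summaries = {
        name: summarize(src) for name, src in sorted(files.items())
    }
    joined = "\n".join(f"{name}: {text}" for name, text in file_summaries.items())
    return summarize(joined)


def first_line_summarizer(text: str) -> str:
    """Toy stand-in for an LLM: return the first non-empty line."""
    for line in text.splitlines():
        if line.strip():
            return line.strip()
    return ""


module = {
    "a.py": "def a(): pass\nx = 1",
    "b.py": "def b(): pass",
}
module_summary = hierarchical_summary(module, first_line_summarizer)
```

The design point the sketch makes is about input size: the full-code variant issues one call over the entire module, while the hierarchical variant issues one call per file plus one call over the much shorter file summaries, which keeps each input small.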

ML · Nov 14, 2024
Counterfactual Uncertainty Quantification of Factual Estimand of Efficacy from Before-and-After Treatment Repeated Measures Randomized Controlled Trials

Xingya Wang, Yang Han, Yushi Liu et al.

This article quantifies the uncertainty reduction achievable for a \textit{counterfactual} estimand, and cautions against potential bias when the estimand uses Digital Twins. Posed by Neyman (1923a), who showed that unbiased \textit{point estimation} from designed \textit{factual} experiments is possible, \textit{counterfactual} uncertainty quantification (CUQ) remained an open challenge for about one hundred years. The $Rx: C$ \textit{counterfactual} efficacy we focus on is the ideal estimand for comparing treatment $Rx$ with control $C$: the expected outcome differential if each patient received \textit{both} $Rx$ and $C$. Enabled by our new statistical modeling principle called ETZ, we show that CUQ is achievable in Randomized Controlled Trials (RCTs) with \textit{Before-and-After} Repeated Measures, which are common in many therapeutic areas. The CUQ we are able to achieve typically has lower variability than factual UQ. We caution against using predictors with measurement error, which violates regression assumptions and can cause \textit{attenuation} bias in estimating treatment effects. For traditional medicine and population-averaged targeted therapy, counterfactual point estimation remains unbiased. However, in both Real Human and Digital Twin approaches, estimating effects in \textit{subgroups} may suffer attenuation bias.
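The attenuation bias mentioned above can be seen in a small simulation. This is a generic textbook illustration of classical measurement error in a predictor, not the ETZ modeling principle or the paper's trial setting; the sample size, true slope, and noise levels are arbitrary assumptions chosen for the demonstration.

```python
import random


def ols_slope(xs, ys):
    """Ordinary least squares slope of y on x: cov(x, y) / var(x)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    return cov_xy / var_x


def simulate(n=20000, beta=2.0, error_sd=1.0, seed=0):
    """Return (slope using the true predictor, slope using a noisy predictor)."""
    rng = random.Random(seed)
    # True predictor with unit variance (assumed for the demonstration).
    true_x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # Outcome depends on the true predictor, not on the measured one.
    y = [beta * x + rng.gauss(0.0, 0.5) for x in true_x]
    # Measured predictor = true predictor + independent measurement error.
    noisy_x = [x + rng.gauss(0.0, error_sd) for x in true_x]
    return ols_slope(true_x, y), ols_slope(noisy_x, y)


clean_slope, attenuated_slope = simulate()
```

Under these assumptions the slope fitted on the noisy predictor shrinks toward zero by roughly the factor var(x) / (var(x) + var(error)), which is 0.5 here, so `attenuated_slope` lands near half of `clean_slope`.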

SE · Aug 10, 2019
Mutation Testing for Ethereum Smart Contract

Haoran Wu, Xingya Wang, Jiehui Xu et al.

A smart contract is a special program that manages digital assets on a blockchain. It is difficult to recover the loss if users make transactions through buggy smart contracts, which cannot be directly fixed. Hence, it is important to ensure the correctness of smart contracts before deploying them. This paper proposes a systematic framework for mutation testing of smart contracts on Ethereum, which is currently the most popular open blockchain for deploying and running smart contracts. Fifteen novel mutation operators have been designed for Ethereum Smart Contracts (ESC), in terms of keyword, global variable/function, variable unit, and error handling. An empirical study on 26 smart contracts in four Ethereum DApps has been conducted to evaluate the effectiveness of mutation testing. The experimental results show that our approach can outperform the coverage-based approach on defect detection rate (96.01% vs. 55.68%). The ESC mutation operators are effective at revealing real defects: we found that 117 out of 729 real bug reports are related to our operators. These results show the great potential of using mutation testing for quality assurance of ESC.
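The general idea of mutation testing can be sketched in a few lines. The example below is a generic Python illustration, not the paper's framework: it does not target Solidity, and its single comparison-swapping operator is not one of the fifteen ESC operators. The `can_withdraw` function, the `SwapCompare` class, and both test suites are hypothetical.

```python
import ast

# Hypothetical function under test, loosely modeled on a balance check.
SOURCE = """
def can_withdraw(balance, amount):
    return amount > 0 and amount <= balance
"""


class SwapCompare(ast.NodeTransformer):
    """Mutation operator: flip strictness of the n-th comparison."""

    SWAPS = {ast.LtE: ast.Lt, ast.Lt: ast.LtE, ast.Gt: ast.GtE, ast.GtE: ast.Gt}

    def __init__(self, target_index):
        self.target_index = target_index
        self.seen = 0

    def visit_Compare(self, node):
        self.generic_visit(node)
        if self.seen == self.target_index and type(node.ops[0]) in self.SWAPS:
            node.ops[0] = self.SWAPS[type(node.ops[0])]()
        self.seen += 1
        return node


def build_mutant(index):
    """Compile a copy of SOURCE with the index-th comparison mutated."""
    tree = SwapCompare(index).visit(ast.parse(SOURCE))
    ast.fix_missing_locations(tree)
    namespace = {}
    exec(compile(tree, "<mutant>", "exec"), namespace)
    return namespace["can_withdraw"]


def mutation_score(tests, n_mutants=2):
    """Fraction of mutants killed, i.e. failing at least one test case."""
    killed = 0
    for index in range(n_mutants):
        mutant = build_mutant(index)
        if any(mutant(*args) != expected for args, expected in tests):
            killed += 1
    return killed / n_mutants


# Each test case is ((balance, amount), expected_result).
WEAK_TESTS = [((100, 50), True), ((100, 150), False)]
STRONG_TESTS = WEAK_TESTS + [((100, 0), False), ((100, 100), True)]

weak_score = mutation_score(WEAK_TESTS)
strong_score = mutation_score(STRONG_TESTS)
```

In this sketch the weak suite lets both mutants survive because it never probes the boundaries, while the suite with the two boundary cases kills both. That gap is the kind of signal a mutation score provides and that plain coverage does not, since both suites execute every line of `can_withdraw`.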