Tao Xiao

SE
h-index22
10papers
104citations
Novelty36%
AI Score51

10 Papers

SEMay 28
Cross-Project Flakiness: A Case Study of the OpenStack Ecosystem

Tao Xiao, Dong Wang, Shane McIntosh et al.

Automated regression testing is a cornerstone of modern software development, often contributing directly to code review and Continuous Integration (CI). Yet some tests suffer from flakiness, where their outcomes vary non-deterministically. Flakiness erodes developer trust in test results, wastes computational resources, and undermines CI reliability. While prior research has examined test flakiness within individual projects, its broader ecosystem-wide impact remains largely unexplored. In this paper, we present an empirical study of test flakiness in the OpenStack ecosystem, which focuses on (1) cross-project flakiness, where flaky tests impact multiple projects, and (2) inconsistent flakiness, where a test exhibits flakiness in some projects but remains stable in others. By analyzing 649 OpenStack projects, we identify 1,535 cross-project flaky tests and 1,105 inconsistently flaky tests. We find that cross-project flakiness affects 55% of OpenStack projects and significantly increases both review time and computational costs. Surprisingly, 70% of unit tests exhibit cross-project flakiness, challenging the assumption that unit tests are inherently insulated from issues that span modules like integration and system-level tests. Through qualitative analysis, we observe that race conditions in CI, inconsistent build configurations, and dependency mismatches are the primary causes of inconsistent flakiness. These findings underline the need for better coordination across complex ecosystems, standardized CI configurations, and improved test isolation strategies.

SEApr 5Code
Self-Admitted GenAI Usage in Open-Source Software

Tao Xiao, Youmei Fan, Fabio Calefato et al.

Strategized LaTeX removal and whitespace normalization approachThe widespread adoption of generative AI (GenAI) tools such as GitHub Copilot and ChatGPT is transforming software development. Since generated source code is virtually impossible to distinguish from manually written code, their real-world usage and impact on open-source software (OSS) development remain poorly understood. In this paper, we introduce the concept of self-admitted GenAI usage, that is, developers explicitly referring to the use of GenAI tools for content creation in software artifacts. Using this concept as a lens to study how GenAI tools are integrated into OSS projects, we analyze a curated sample of more than 200,000 GitHub repositories, identifying 1,292 such self-admissions across 156 repositories in commit messages, code comments, and project documentation. Using a mixed methods approach, we derive a taxonomy of 32 tasks, 10 content types, and 11 purposes associated with GenAI usage based on 1,292 qualitatively coded mentions. We then analyze 13 documents with policies and usage guidelines for GenAI tools and conduct a developer survey to uncover the ethical, legal, and practical concerns behind them. Our findings reveal that developers actively manage how GenAI is used in their projects, highlighting the need for project-level transparency, attribution, and quality control practices in AI-assisted software development. Finally, we examine the longitudinal impact of GenAI adoption on code churn in 151 repositories with self-admitted GenAI usage and find no general increase, contradicting popular narratives on the impact of GenAI on software development.

CVApr 28, 2023Code
Local-Global Transformer Enhanced Unfolding Network for Pan-sharpening

Mingsong Li, Yikun Liu, Tao Xiao et al.

Pan-sharpening aims to increase the spatial resolution of the low-resolution multispectral (LrMS) image with the guidance of the corresponding panchromatic (PAN) image. Although deep learning (DL)-based pan-sharpening methods have achieved promising performance, most of them have a two-fold deficiency. For one thing, the universally adopted black box principle limits the model interpretability. For another thing, existing DL-based methods fail to efficiently capture local and global dependencies at the same time, inevitably limiting the overall performance. To address these mentioned issues, we first formulate the degradation process of the high-resolution multispectral (HrMS) image as a unified variational optimization problem, and alternately solve its data and prior subproblems by the designed iterative proximal gradient descent (PGD) algorithm. Moreover, we customize a Local-Global Transformer (LGT) to simultaneously model local and global dependencies, and further formulate an LGT-based prior module for image denoising. Besides the prior module, we also design a lightweight data module. Finally, by serially integrating the data and prior modules in each iterative stage, we unfold the iterative algorithm into a stage-wise unfolding network, Local-Global Transformer Enhanced Unfolding Network (LGTEUN), for the interpretable MS pan-sharpening. Comprehensive experimental results on three satellite data sets demonstrate the effectiveness and efficiency of LGTEUN compared with state-of-the-art (SOTA) methods. The source code is available at https://github.com/lms-07/LGTEUN.

SEMay 21Code
"Refactoring Runaway": Understanding and Mitigating Tangled Refactorings in Coding Agents for Issue Resolution

Zhao Tian, Zifan Zhang, Tao Xiao et al.

Recent advances in coding agents have shown remarkable progress in software issue resolution. In practice, real-world issues are typically bug fixes or feature requests in which human developers naturally incorporate refactoring as part of the resolution process, resulting in tangled refactoring. Since LLMs are trained on large-scale open-source repositories, coding agents may inherit such behaviors. In this paper, we conduct an empirical study on Multi-SWE-bench, analyzing 3,691 valid patches generated by three agent frameworks with 12 LLMs. We find that coding agents introduce tangled refactorings less frequently (21.43% vs. 36.72%) and with lower intensity (0.66 vs. 1.75) than human developers, although they exhibit a broader diversity of refactoring types. Logistic regression analysis further shows that tangled refactorings are strongly associated with reduced compilability, while exhibiting no significant association with functional correctness. Based on these findings, we propose a refactoring-aware refinement approach that assesses the necessity and safety of tangled refactorings and selectively removes or repairs problematic operations. Our approach improves compilability from 19.34% to 38.33%, and additionally resolves 2.79% previously unresolved issues. Overall, this work presents the first step towards understanding tangled refactoring practices in agentic issue resolution and opens up avenues for future work.

AIJul 31, 2024Code
Con4m: Context-aware Consistency Learning Framework for Segmented Time Series Classification

Junru Chen, Tianyu Cao, Jing Xu et al.

Time Series Classification (TSC) encompasses two settings: classifying entire sequences or classifying segmented subsequences. The raw time series for segmented TSC usually contain Multiple classes with Varying Duration of each class (MVD). Therefore, the characteristics of MVD pose unique challenges for segmented TSC, yet have been largely overlooked by existing works. Specifically, there exists a natural temporal dependency between consecutive instances (segments) to be classified within MVD. However, mainstream TSC models rely on the assumption of independent and identically distributed (i.i.d.), focusing on independently modeling each segment. Additionally, annotators with varying expertise may provide inconsistent boundary labels, leading to unstable performance of noise-free TSC models. To address these challenges, we first formally demonstrate that valuable contextual information enhances the discriminative power of classification instances. Leveraging the contextual priors of MVD at both the data and label levels, we propose a novel consistency learning framework Con4m, which effectively utilizes contextual information more conducive to discriminating consecutive segments in segmented TSC tasks, while harmonizing inconsistent boundary labels for training. Extensive experiments across multiple datasets validate the effectiveness of Con4m in handling segmented TSC tasks on MVD. The source code is available at https://github.com/MrNobodyCali/Con4m.

CVMar 4, 2024Code
Explicit Motion Handling and Interactive Prompting for Video Camouflaged Object Detection

Xin Zhang, Tao Xiao, Gepeng Ji et al.

Camouflage poses challenges in distinguishing a static target, whereas any movement of the target can break this disguise. Existing video camouflaged object detection (VCOD) approaches take noisy motion estimation as input or model motion implicitly, restricting detection performance in complex dynamic scenes. In this paper, we propose a novel Explicit Motion handling and Interactive Prompting framework for VCOD, dubbed EMIP, which handles motion cues explicitly using a frozen pre-trained optical flow fundamental model. EMIP is characterized by a two-stream architecture for simultaneously conducting camouflaged segmentation and optical flow estimation. Interactions across the dual streams are realized in an interactive prompting way that is inspired by emerging visual prompt learning. Two learnable modules, i.e., the camouflaged feeder and motion collector, are designed to incorporate segmentation-to-motion and motion-to-segmentation prompts, respectively, and enhance outputs of the both streams. The prompt fed to the motion stream is learned by supervising optical flow in a self-supervised manner. Furthermore, we show that long-term historical information can also be incorporated as a prompt into EMIP and achieve more robust results with temporal consistency. Experimental results demonstrate that our EMIP achieves new state-of-the-art records on popular VCOD benchmarks. Our code is made publicly available at https://github.com/zhangxin06/EMIP.

SEFeb 11, 2022Code
GitHub Sponsors: Exploring a New Way to Contribute to Open Source

Naomichi Shimada, Tao Xiao, Hideaki Hata et al.

GitHub Sponsors, launched in 2019, enables donations to individual open source software (OSS) developers. Financial support for OSS maintainers and developers is a major issue in terms of sustaining OSS projects, and the ability to donate to individuals is expected to support the sustainability of developers, projects, and community. In this work, we conducted a mixed-methods study of GitHub Sponsors, including quantitative and qualitative analyses, to understand the characteristics of developers who are likely to receive donations and what developers think about donations to individuals. We found that: (1) sponsored developers are more active than non-sponsored developers, (2) the possibility to receive donations is related to whether there is someone in their community who is donating, and (3) developers are sponsoring as a new way to contribute to OSS. Our findings are the first step towards data-informed guidance for using GitHub Sponsors, opening up avenues for future work on this new way of financially sustaining the OSS community.

SEAug 18, 2021Code
More Than React: Investigating The Role of EmojiReaction in GitHub Pull Requests

Teyon Son, Tao Xiao, Dong Wang et al.

Context: Open source software development has become more social and collaborative, especially with the rise of social coding platforms like GitHub. Since 2016, GitHub started to support more informal methods such as emoji reactions, with the goal to reduce commenting noise when reviewing any code changes to a repository. Interestingly, preliminary results indicate that emojis do not always reduce commenting noise (i.e., eight out of 20 emoji reactions), providing evidence that developers use emojis with ulterior intentions. From a reviewing context, the extent to which emoji reactions facilitate for a more efficient review process is unknown. Objective: In this registered report, we introduce the study protocols to investigate ulterior intentions and usages of emoji reactions, apart from reducing commenting noise during the discussions in GitHub pull requests (PRs). As part of the report, we first perform a preliminary analysis to whether emoji reactions can reduce commenting noise in PRs and then introduce the execution plan for the study. Method: We will use a mixed-methods approach in this study, i.e., quantitative and qualitative, with three hypotheses to test.

AISep 10, 2021
Boosting Graph Search with Attention Network for Solving the General Orienteering Problem

Zongtao Liu, Jing Xu, Jintao Su et al.

Recently, several studies have explored the use of neural network to solve different routing problems, which is an auspicious direction. These studies usually design an encoder-decoder based framework that uses encoder embeddings of nodes and the problem-specific context to produce node sequence(path), and further optimize the produced result on top by beam search. However, existing models can only support node coordinates as input, ignore the self-referential property of the studied routing problems, and lack the consideration about the low reliability in the initial stage of node selection, thus are hard to be applied in real-world. In this paper, we take the orienteering problem as an example to tackle these limitations. We propose a novel combination of a variant beam search algorithm and a learned heuristic for solving the general orienteering problem. We acquire the heuristic with an attention network that takes the distances among nodes as input, and learn it via a reinforcement learning framework. The empirical studies show that our method can surpass a wide range of baselines and achieve results close to the optimal or highly specialized approach. Also, our proposed framework can be easily applied to other routing problems. Our code is publicly available.

SEFeb 19, 2021
Characterizing and Mitigating Self-Admitted Technical Debt in Build Systems

Tao Xiao, Dong Wang, Shane McIntosh et al.

Technical Debt is a metaphor used to describe the situation in which long-term software artifact quality is traded for short-term goals in software projects. In recent years, the concept of self-admitted technical debt (SATD) was proposed, which focuses on debt that is intentionally introduced and described by developers. Although prior work has made important observations about admitted technical debt in source code, little is known about SATD in build systems. In this paper, we set out to better understand the characteristics of SATD in build systems. To do so, through a qualitative analysis of 500 SATD comments in the Maven build system of 291 projects, we characterize SATD by location and rationale (reason and purpose). Our results show that limitations in tools and libraries, and complexities of dependency management are the most frequent causes, accounting for 50% and 24% of the comments. We also find that developers often document SATD as issues to be fixed later. As a first step towards the automatic detection of SATD rationale, we train classifiers to detect the two most frequently occurring reasons and the four most frequently occurring purposes of SATD in the content of comments in Maven build systems. The classifier performance is promising, achieving an F1-score of 0.71-0.79. Finally, within 16 identified 'ready-to-be-addressed' SATD instances, the three SATD submitted by pull requests and the five SATD submitted by issue reports were resolved after developers were made aware. Our work presents the first step towards understanding technical debt in build systems and opens up avenues for future work, such as tool support to track and manage SATD backlogs.