JingZhu He

2 Papers

SE, Aug 9, 2023
A Comprehensive Empirical Study of Bugs in Open-Source Federated Learning Frameworks

Weijie Shao, Yuyang Gao, Fu Song et al.

Federated learning (FL) is a distributed machine learning (ML) paradigm that allows multiple clients to collaboratively train shared ML models without exposing their private data. It has gained substantial popularity in recent years, especially since the enforcement of data protection laws and regulations in many countries. To foster the application of FL, a variety of FL frameworks have been proposed, allowing non-experts to easily train ML models. As a result, understanding bugs in FL frameworks is critical for facilitating the development of better FL frameworks and potentially encouraging the development of bug detection, localization, and repair tools. Thus, we conduct the first empirical study to comprehensively collect, taxonomize, and characterize bugs in FL frameworks. Specifically, we manually collect and classify 1,119 bugs from all the 676 closed issues and 514 merged pull requests in 17 popular and representative open-source FL frameworks on GitHub. We propose a classification of those bugs into 12 bug symptoms, 12 root causes, and 18 fix patterns. We also study their correlations and their distributions across 23 functionalities. We identify nine major findings from our study and discuss their implications and the future research directions they suggest.

SE, Oct 8, 2021
TFix+: Self-configuring Hybrid Timeout Bug Fixing for Cloud Systems

Jingzhu He, Ting Dai, Xiaohui Gu

Timeout bugs can cause serious availability and performance issues, which are often difficult to fix due to the lack of diagnostic information. Previous work proposed solutions for fixing specific types of timeout-related performance bugs. In this paper, we present TFix+, a self-configuring timeout bug fixing framework for automatically correcting two major kinds of timeout bugs (i.e., misused timeout bugs and missing timeout bugs) with dynamic timeout value predictions. TFix+ provides two new hybrid schemes for fixing misused and missing timeout bugs, respectively. TFix+ further provides prediction-driven timeout variable configuration based on runtime function tracing. We have implemented a prototype of TFix+ and conducted experiments on 16 real-world timeout bugs. Our experimental results show that TFix+ can effectively fix 15 of the 16 tested timeout bugs.