Hengcheng Zhu

h-index6

6papers

99citations

Novelty51%

AI Score51

Ranked #17,391 of 194,257 authors (top 9%)#138 in SE (top 5%)

6 Papers

8.5SEApr 11Code

MR-Scout: Automated Synthesis of Metamorphic Relations from Existing Test Cases

Congying Xu, Valerio Terragni, Hengcheng Zhu et al.

Metamorphic Testing (MT) alleviates the oracle problem by defining oracles based on metamorphic relations (MRs), that govern multiple related inputs and their outputs. However, designing MRs is challenging, as it requires domain-specific knowledge. This hinders the widespread adoption of MT. We observe that developer-written test cases can embed domain knowledge that encodes MRs. Such encoded MRs could be synthesized for testing not only their original programs but also other programs that share similar functionalities. In this paper, we propose MR-Scout to automatically synthesize MRs from test cases in open-source software (OSS) projects. MR-Scout first discovers MR-encoded test cases (MTCs), and then synthesizes the encoded MRs into parameterized methods (called codified MRs), and filters out MRs that demonstrate poor quality for new test case generation. MR-Scout discovered over 11,000 MTCs from 701 OSS projects. Experimental results show that over 97% of codified MRs are of high quality for automated test case generation, demonstrating the practical applicability of MR-Scout. Furthermore, codified-MRs-based tests effectively enhance the test adequacy of programs with developer-written tests, leading to 13.52% and 9.42% increases in line coverage and mutation score, respectively. Our qualitative study shows that 55.76% to 76.92% of codified MRs are easily comprehensible for developers.

11.8SEApr 11

MR-Adopt: Automatic Deduction of Input Transformation Function for Metamorphic Testing

Congying Xu, Songqiang Chen, Jiarong Wu et al.

While a recent study reveals that many developer-written test cases can encode a reusable Metamorphic Relation (MR), over 70% of them directly hard-code the source input and follow-up input in the encoded relation. Such encoded MRs, which do not contain an explicit input transformation to transform the source inputs to corresponding follow-up inputs, cannot be reused with new source inputs to enhance test adequacy. In this paper, we propose MR-Adopt (Automatic Deduction Of inPut Transformation) to automatically deduce the input transformation from the hard-coded source and follow-up inputs, aiming to enable the encoded MRs to be reused with new source inputs. With typically only one pair of source and follow-up inputs available in an MR-encoded test case as the example, we leveraged LLMs to understand the intention of the test case and generate additional examples of source-followup input pairs. This helps to guide the generation of input transformations generalizable to multiple source inputs. Besides, to mitigate the issue that LLMs generate erroneous code, we refine LLM-generated transformations by removing MR- irrelevant code elements with data-flow analysis. Finally, we assess candidate transformations based on encoded output relations and select the best transformation as the result. Evaluation results show that MR-Adopt can generate input transformations applicable to all experimental source inputs for 72.00% of encoded MRs, which is 33.33% more than using vanilla GPT-3.5. By incorporating MR- Adopt-generated input transformations, encoded MR-based test cases can effectively enhance the test adequacy, increasing the line coverage and mutation score by 10.62% and 18.91%, respectively.

10.1GRJul 9

Programming-by-Example for Batch-Editing Collision Meshes in 3D Software

Gengyang Xu, Dongwei Xiao, Hengcheng Zhu et al.

As 3D software proliferates, software artifacts now extend beyond code and 2D user interfaces to include 3D assets. Among these assets, collision meshes are critical as they define the geometry used by physics engines for collision detection and physical interaction. Although existing tools can automatically generate collision meshes from visual meshes, they often fail to capture the intended interaction behavior. As a result, developers need to manually edit many heterogeneous collision meshes, a process that is time-consuming and challenging to scale. To address this problem, we present a neuro-symbolic program synthesis approach for batch-editing collision meshes. We formulate the task as a programming-by-example problem: given a family of collision meshes with the same editing intent and a small number of user demonstrations, our approach synthesizes a reusable program that captures the editing intent and applies it to non-demonstration meshes. We implement this in a tool named MeshForge, and evaluate it across 24 tasks on 600 collision meshes. MeshForge successfully synthesizes 23/24 tasks, requiring 2.2 demonstrations and 3.5 seconds of synthesis time on average.

5.3SEJun 10Code

Characterizing Tests in IoT Software: Practices, Challenges and Opportunities

Rufeng Chen, Hengcheng Zhu, Wuqi Zhang et al.

The Internet of Things (IoT) is experiencing rapid growth. Smart devices are emerging in smart homes and industrial applications, performing mission-critical tasks. Bugs in IoT software can lead to severe consequences. For example, a buggy smart lock can allow unauthorized access to a private property. Testing is a primary practice to expose software bugs and ensure software quality. However, little is known about how IoT software is tested. To bridge this gap, we conducted the first empirical study on test cases in open-source IoT software. Specifically, we evaluated the effectiveness of test cases in IoT software, explored the challenges inherent in testing IoT software, and analyzed the usage of mock objects. Our results indicate that while IoT software often contains a considerable number of tests, their effectiveness remains limited. We identified the primary challenges in testing IoT software as managing complex interactions with various external dependencies, such as other network-reliant IoT components, file systems, operating systems, and databases. We also observed that the use of mock objects in IoT software closely aligns with our identified testing challenges. This alignment demonstrates the potential of mocking as a solution to enhance test coverage and address the complexities of IoT software testing.

9.7SEApr 17

MR-Coupler: Automated Metamorphic Test Generation via Functional Coupling Analysis

Congying Xu, Hengcheng Zhu, Songqiang Chen et al.

Metamorphic testing (MT) is a widely recognized technique for alleviating the oracle problem in software testing. However, its adoption is hindered by the difficulty of constructing effective metamorphic relations (MRs), which often require domain-specific or hard-to-obtain knowledge. In this work, we propose a novel approach that leverages the functional coupling between methods, which is readily available in source code, to automatically construct MRs and generate metamorphic test cases (MTCs). Our technique, MR-Coupler, identifies functionally coupled method pairs, employs large language models to generate candidate MTCs, and validates them through test amplification and mutation analysis. In particular, we leverage three functional coupling features to avoid expensive enumeration of possible method pairs, and a novel validation mechanism to reduce false alarms. Our evaluation of MR-Coupler on 100 human-written MTCs and 50 real-world bugs shows that it generates valid MTCs for over 90% of tasks, improves valid MTC generation by 64.90%, and reduces false alarms by 36.56% compared to baselines. Furthermore, the MTCs generated by MR-Coupler detect 44% of the real bugs. Our results highlight the effectiveness of leveraging functional coupling for automated MR construction and the potential of MR-Coupler to facilitate the adoption of MT in practice. We also released the tool and experimental data to support future research.

7.1SEApr 21

Improving LLM-Driven Test Generation by Learning from Mocking Information

Jamie Lee, Flynn Teh, Hengcheng Zhu et al.

Large Language Models (LLMs) have recently shown strong potential for automated unit test generation. This has motivated us to investigate whether developer-defined test doubles (commonly referred to as mocks) available in existing test suites can be leveraged to improve LLM-driven test generation. To this end, we propose MOCKMILL, an LLM-based technique and tool that generates test cases by exploiting mocking information automatically extracted from developer-written tests. MOCKMILL targets components that are replaced by test doubles in existing tests and uses the encoded stubbings and interaction expectations to guide test generation, combined with an iterative generation-and-repair process to ensure executable tests. We evaluated MOCKMILL on 10 open-source classes from six Java projects using four LLMs, and compared the generated tests with existing project tests and tests produced by baseline approaches. The results show that MOCKMILL's tests cover lines of code and kill mutants that existing tests and baseline-generated tests miss. Overall, our findings provide preliminary evidence that leveraging mocking information is a complementary and effective way to enhance LLM-based test generation.