SEAI | May 23, 2024

exLong: Generating Exceptional Behavior Tests with Large Language Models

arXiv:2405.14619v3 | 4 citations | h-index: 29 | Has Code | ICSE
Originality: Incremental advance
AI Analysis

The paper addresses a gap in software testing by automating EBT generation for developers, though the advance is incremental because it builds on existing LLM and test generation methods.

The paper tackles the problem of automatically generating exceptional behavior tests (EBTs) for software, which developers often neglect in favor of 'happy path' testing, and presents exLong, a framework that outperforms state-of-the-art models and tools, with 23 generated EBTs accepted in open-source projects.

Many popular programming languages, including C#, Java, and Python, support exceptions. Exceptions are thrown during program execution if an unwanted event happens, e.g., a method is invoked with an illegal argument value. Software developers write exceptional behavior tests (EBTs) to check that their code detects unwanted events and throws appropriate exceptions. Prior research studies have shown the importance of EBTs, but those studies also highlighted that developers put most of their effort into "happy paths", i.e., paths without unwanted events. To help developers fill the gap, we present the first framework, dubbed exLong, that automatically generates EBTs. exLong is a large language model instruction fine-tuned from CodeLlama, and it embeds reasoning about traces that lead to throw statements, conditional expressions that guard throw statements, and non-exceptional behavior tests that execute similar traces. We compare exLong with a state-of-the-art model for test generation (CAT-LM) and one of the strongest foundation models (GPT-4o), as well as with analysis-based tools for test generation (Randoop and EvoSuite). Our results show that exLong outperforms existing models and tools. Furthermore, we contributed several pull requests to open-source projects, and 23 EBTs generated by exLong have already been accepted.
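To make the terminology concrete, here is a minimal sketch of a guarded throw statement, a non-exceptional ("happy path") test, and an EBT for the same method. The `Account` class, its `withdraw` method, and the test names are hypothetical illustrations, not taken from the paper or from exLong's output; Python is used only because the abstract lists it among languages with exceptions.

```python
import unittest


class Account:
    """Hypothetical class under test (an assumption, not from the paper)."""

    def __init__(self, balance: int = 0) -> None:
        self.balance = balance

    def withdraw(self, amount: int) -> None:
        # Conditional expression that guards the throw (raise) statement.
        if amount <= 0:
            raise ValueError("amount must be positive")
        if amount > self.balance:
            raise ValueError("insufficient funds")
        self.balance -= amount


class AccountTest(unittest.TestCase):
    def test_withdraw_happy_path(self) -> None:
        # Non-exceptional behavior test: no unwanted event occurs.
        account = Account(balance=100)
        account.withdraw(30)
        self.assertEqual(account.balance, 70)

    def test_withdraw_rejects_non_positive_amount(self) -> None:
        # Exceptional behavior test (EBT): an illegal argument must raise.
        account = Account(balance=100)
        with self.assertRaises(ValueError):
            account.withdraw(0)
        # The failed call should leave the state unchanged.
        self.assertEqual(account.balance, 100)
```

Saved to a file, both tests can be run with `python -m unittest`. The EBT reuses the setup of the happy-path test and changes only the argument so that the guard condition becomes true, which loosely mirrors the kind of context (guard conditions and similar non-exceptional tests) the abstract says exLong reasons about.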

Code Implementations: 1 repo