SE LG Dec 24, 2025

Cerberus: Multi-Agent Reasoning and Coverage-Guided Exploration for Static Detection of Runtime Errors

Hridya Dhulipala, Xiaokai Rong, Tien N. Nguyen

arXiv:2512.21431v1 3.4 h-index: 2

Originality: Incremental advance

AI Analysis

This work addresses the need for static detection of runtime errors in software development, for example in online code snippets, but it is an incremental advance because it builds on existing LLM-based and coverage-guided approaches.

The paper tackles the problem of detecting runtime errors in code snippets without execution. It proposes Cerberus, a framework that uses LLMs for input generation and coverage prediction. According to the authors, Cerberus generates high-coverage test cases more efficiently and discovers more runtime errors than conventional and learning-based methods.

In several software development scenarios, it is desirable to detect runtime errors and exceptions in code snippets without actual execution. A typical example is detecting runtime exceptions in online code snippets before integrating them into a codebase. In this paper, we propose Cerberus, a novel predictive, execution-free, coverage-guided testing framework. Cerberus uses LLMs to generate inputs that trigger runtime errors and to perform code coverage prediction and error detection without code execution. With a two-phase feedback loop, Cerberus first aims to both increase code coverage and detect runtime errors, then shifts to focus only on detecting runtime errors once coverage reaches 100% or its maximum, enabling it to perform better than prompting the LLMs for both purposes at once. Our empirical evaluation demonstrates that Cerberus performs better than conventional and learning-based testing frameworks for (in)complete code snippets by generating high-coverage test cases more efficiently, leading to the discovery of more runtime errors.
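The two-phase feedback loop described in the abstract can be illustrated with a minimal sketch. The abstract does not give the prompts, stopping rules, or data structures, so everything here is an assumption made for illustration: the names `two_phase_loop`, `propose`, `predict`, `patience`, and `error_budget` are hypothetical, "maximum coverage" is approximated as "no new lines for `patience` rounds", and the LLM calls are replaced by deterministic toy functions so the example runs without any model.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Prediction:
    covered: set[int]            # line numbers predicted to execute
    error: Optional[str] = None  # predicted runtime error, if any


@dataclass
class Report:
    covered: set[int] = field(default_factory=set)
    errors: dict[str, object] = field(default_factory=dict)  # error -> triggering input


def two_phase_loop(
    total_lines: int,
    propose: Callable[[str, set[int]], object],  # hypothetical stand-in for LLM input generation
    predict: Callable[[object], Prediction],     # hypothetical stand-in for LLM coverage/error prediction
    patience: int = 2,       # assumed stall limit that approximates "maximum coverage"
    error_budget: int = 3,   # assumed number of error-only rounds
) -> Report:
    report = Report()
    stalled = 0
    # Phase 1: pursue coverage and errors together until coverage is full or stops growing.
    while len(report.covered) < total_lines and stalled < patience:
        candidate = propose("coverage+errors", report.covered)
        pred = predict(candidate)
        gained = pred.covered - report.covered
        stalled = 0 if gained else stalled + 1
        report.covered |= pred.covered
        if pred.error:
            report.errors.setdefault(pred.error, candidate)
    # Phase 2: coverage is saturated, so the remaining budget targets errors only.
    for _ in range(error_budget):
        candidate = propose("errors-only", report.covered)
        pred = predict(candidate)
        if pred.error:
            report.errors.setdefault(pred.error, candidate)
    return report


def demo() -> Report:
    # Toy snippet under analysis (never executed by the loop itself):
    #   1: def f(x):
    #   2:     if x > 0:
    #   3:         return 10 / (x - 5)
    #   4:     return -x
    coverage_inputs = iter([1, -1, 2])
    error_inputs = iter([5, 0, 7])

    def propose(goal: str, covered: set[int]) -> int:
        # A real system would prompt an LLM with the goal and current coverage feedback.
        return next(coverage_inputs if goal == "coverage+errors" else error_inputs)

    def predict(x: int) -> Prediction:
        # A real system would ask an LLM to predict coverage and errors without running f.
        covered = {1, 2, 3} if x > 0 else {1, 2, 4}
        return Prediction(covered, "ZeroDivisionError" if x == 5 else None)

    return two_phase_loop(total_lines=4, propose=propose, predict=predict)
```

Calling `demo()` reaches full predicted coverage of the toy snippet after two phase-one inputs, and the error-only phase then records the input `5` as triggering a predicted `ZeroDivisionError`. Splitting the goals this way mirrors the abstract's claim that a dedicated error-hunting phase works better than asking for both objectives at once, though the sketch itself makes no performance claim.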
