SE AI Mar 23

LLM-Based Test Case Generation in DBMS through Monte Carlo Tree Search

arXiv:2603.21530 77.0 h-index: 7
AI Analysis

This paper addresses the need for automated, adaptable testing in DBMSs, particularly for proprietary SQL dialects, though the contribution is incremental because it builds on existing LLM and search methods.

The paper tackled the problem of generating high-quality SQL test cases for Database Management Systems (DBMSs) using lightweight Large Language Models (LLMs), which often produce invalid or semantically similar queries. It proposed MIST, a framework combining feature-guided error-driven synthesis and Monte Carlo Tree Search-based mutation, achieving average improvements of 43.3% in line coverage, 32.3% in function coverage, and 46.4% in branch coverage compared to baselines.

Database Management Systems (DBMSs) are fundamental infrastructure for modern data-driven applications, where thorough testing with high-quality SQL test cases is essential for ensuring system reliability. Traditional approaches such as fuzzing can be effective for specific DBMSs, but adapting them to different proprietary dialects requires substantial manual effort. Large Language Models (LLMs) present promising opportunities for automated SQL test generation, but face critical challenges in industrial environments. First, lightweight models are widely used in organizations due to security and privacy constraints, but they struggle to generate syntactically valid queries for proprietary SQL dialects. Second, LLM-generated queries are often semantically similar and exercise only shallow execution paths, thereby quickly reaching a coverage plateau. To address these challenges, we propose MIST, an LLM-based test case generatIon framework for DBMS through Monte Carlo Tree search. MIST consists of two stages: Feature-Guided Error-Driven Test Case Synthetization, which constructs a hierarchical feature tree and uses error feedback to guide LLM generation, aiming to produce syntactically valid and semantically diverse queries for different DBMS dialects, and Monte Carlo Tree Search-Based Test Case Mutation, which jointly optimizes seed query selection and mutation rule application guided by coverage feedback, aiming at boosting code coverage by exploring deeper execution paths. Experiments on three widely-used DBMSs with four lightweight LLMs show that MIST achieves average improvements of 43.3% in line coverage, 32.3% in function coverage, and 46.4% in branch coverage compared to the baseline approach with the highest line coverage of 69.3% in the Optimizer module.
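The second stage described above, in which seed selection and mutation rule choice are optimized together under coverage feedback, can be illustrated with a small UCB1-driven tree search. This is a minimal sketch under stated assumptions, not MIST's implementation: the mutation rules, the `toy_coverage` function (which stands in for real DBMS coverage instrumentation), the reward definition, and all names below are hypothetical. The mutants are not checked for SQL validity.

```python
import math

# Hypothetical mutation rules: each maps a SQL string to a mutated SQL string.
MUTATIONS = {
    "add_order_by": lambda q: q + " ORDER BY 1",
    "add_limit": lambda q: q + " LIMIT 5",
    "wrap_subquery": lambda q: f"SELECT * FROM ({q}) AS s",
}


def toy_coverage(query):
    """Stand-in for coverage feedback: the alphabetic tokens a query contains."""
    return {tok for tok in query.upper().split() if tok.isalpha()}


class Node:
    """Search-tree node holding visit and reward statistics."""

    def __init__(self):
        self.visits = 0
        self.total_reward = 0.0
        self.children = {}

    def ucb1(self, parent_visits, c=1.4):
        # Unvisited nodes are tried first; otherwise mean reward plus exploration bonus.
        if self.visits == 0:
            return math.inf
        explore = c * math.sqrt(math.log(parent_visits) / self.visits)
        return self.total_reward / self.visits + explore


def select(parent, keys):
    """Pick the child key with the highest UCB1 score, creating nodes lazily."""
    for key in keys:
        parent.children.setdefault(key, Node())
    parent_visits = max(parent.visits, 1)
    return max(keys, key=lambda k: parent.children[k].ucb1(parent_visits))


def search(seeds, iterations=50):
    """Jointly choose a seed and a mutation rule; reward is newly covered items."""
    pool = list(seeds)  # copy so the caller's list is not modified
    root = Node()
    covered = set()
    corpus = []
    for _ in range(iterations):
        seed = select(root, pool)
        seed_node = root.children[seed]
        rule = select(seed_node, list(MUTATIONS))
        rule_node = seed_node.children[rule]

        mutant = MUTATIONS[rule](seed)
        new_items = toy_coverage(mutant) - covered
        reward = float(len(new_items))
        if new_items:
            covered |= new_items
            corpus.append(mutant)
            pool.append(mutant)  # coverage-increasing mutants become new seeds

        # Backpropagate the reward along the selected path.
        for node in (root, seed_node, rule_node):
            node.visits += 1
            node.total_reward += reward
    return corpus, covered


if __name__ == "__main__":
    corpus, covered = search(["SELECT a FROM t"], iterations=20)
    for query in corpus:
        print(query)
```

In a real setting the reward would come from line, function, or branch coverage reported by the instrumented DBMS rather than from token sets, and the tree would likely track longer mutation sequences than the two levels used here.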
