AIMay 27

STAB: Specification-driven Testing for Algorithmic Bottlenecks

Soohan Lim, Joonghyuk Hahn, Hyundong Jin, Yo-Sub Han

arXiv:2605.2798178.4h-index: 3Has Code

Predicted impact top 38% in AI · last 90 daysOriginality Highly original

AI Analysis

For developers and researchers evaluating algorithmic code efficiency, STAB provides a method to generate worst-case test cases from specifications alone, improving bottleneck detection.

STAB generates test cases that expose algorithmic bottlenecks from natural-language problem specifications, raising the rate of bottleneck-exposing test cases from 50.43% to 73.45% on open-source LLMs and from 57.45% to 71.85% on closed-source LLMs across Python, Java, and C++.

Evaluating the efficiency of algorithmic code requires test cases that expose runtime bottlenecks. Previous methods generate efficiency test cases either by increasing input size or by generating code-specific inputs that make the given implementation run slowly. Consequently, they do not address the structural input conditions that drive the algorithmic worst case. We introduce STAB, a specification-driven pipeline that generates test cases that expose algorithmic bottlenecks from a natural-language problem specification alone. STAB separates the task into constraint-bound maximization and adversarial structure injection. (i) The constraint saturator extracts constraints and resolves large admissible size assignments using rule-based saturation and CP-SAT optimization over related variables. (ii) The adversarial scenario injector retrieves implementation-level adversarial construction principles from a curated scenario catalog using keyword matching and K-nearest neighbors (KNN). STAB encodes the problem specification, resolved boundary, and retrieved construction principles into a structured generation specification, from which the LLM synthesizes a Python test case generator. On CodeContests, STAB raises the rate of generated test cases that expose algorithmic bottlenecks from 50.43% to 73.45% on average across open-source LLMs and from 57.45% to 71.85% on average across closed-source LLMs, with consistent gains across Python, Java, and C++. Our code is available at https://github.com/suhanmen/STAB.

View on arXiv PDF Code

Similar