RECURSUM: Automated Code Generation for Recurrence Relations Exceeds Expert Optimization via LayeredCodegen
For scientists and engineers using recurrence relations in fields like quantum chemistry and numerical analysis, this work eliminates the need for dual expertise in domain knowledge and C++ metaprogramming, enabling automated generation that exceeds manual optimization.
RECURSUM is a Python DSL that generates optimized C++ for recurrence relations, achieving 9.8x speedup over expert hand-written code and 1.9x over template metaprogramming for McMurchie-Davidson Hermite coefficients, with speedups propagating to complete algorithms within 3.3% of expert baselines.
Automated code generation can systematically exceed expert hand-optimization for recurrence relations-computational primitives ubiquitous in orthogonal polynomials, special functions, numerical integration, and molecular integral evaluation. We present RECURSUM, a Python-based domain-specific language generating optimized C++ for arbitrary recurrence relations via three backends: template metaprogramming for compile-time evaluation, a novel LayeredCodegen backend with architectural optimizations, and runtime loop-based evaluation. The DSL uses einsum-inspired notation to specify recurrences, validity constraints, and base cases in 10-30 lines of Python, generating 650+ lines of production C++. LayeredCodegen achieves 9.8x speedup over expert hand-written implementations and 1.9x over template metaprogramming for McMurchie-Davidson Hermite coefficients. Architecture analysis reveals three quantifiable effects: (1) zero-copy output parameters eliminate return-by-value overhead (70-80% of speedup), (2) guaranteed function inlining eliminates compiler-refused overhead (15-20%), (3) exact-sized stack buffers achieve 100% cache efficiency vs 27% for MAX-sized arrays (5-10%). We validate on 24 recurrence types spanning pure mathematics (Legendre, Chebyshev, Hermite, Laguerre polynomials), numerical analysis (Clenshaw, Golub-Welsch), and quantum chemistry (McMurchie-Davidson, Rys quadrature, Boys function). Production benchmarks show speedups propagate to complete algorithms, with generated code matching expert baselines within 3.3%. RECURSUM demonstrates that systematic code generation serves as the performance ceiling for recurrence algorithms. By eliminating the dual expertise barrier (domain knowledge + C++ metaprogramming), the framework democratizes high-performance scientific computing-establishing a paradigm where automated generation systematically exceeds manual optimization.