PL SE, Oct 4, 2016

Mathematical Execution: A Unified Approach for Testing Numerical Code

arXiv:1610.01133v16.64 citations

Originality: Highly original

AI Analysis

This paper addresses the challenge of automated testing for numerical code. It is incremental in that it builds on existing testing methods, but it offers a novel optimization-based framework.

The paper tackles the problem of testing numerical code by introducing Mathematical Execution (ME), a unified approach that transforms testing into a minimization problem solved via mathematical optimization. Its implementation, CoverMe, raises branch coverage from 43% to 91% compared with Austin, while reducing average running time from 6058.4 to 6.9 seconds.

This paper presents Mathematical Execution (ME), a new, unified approach for testing numerical code. The key idea is to (1) capture the desired testing objective via a representing function and (2) transform the automated testing problem to the minimization problem of the representing function. The minimization problem is to be solved via mathematical optimization. The main feature of ME is that it directs input space exploration by only executing the representing function, thus avoiding static or symbolic reasoning about the program semantics, which is particularly challenging for numerical code. To illustrate this feature, we develop an ME-based algorithm for coverage-based testing of numerical code. We also show the potential of applying and adapting ME to other related problems, including path reachability testing, boundary value analysis, and satisfiability checking. To demonstrate ME's practical benefits, we have implemented CoverMe, a proof-of-concept realization for branch coverage based testing, and evaluated it on Sun's C math library (used in, for example, Android, Matlab, Java and JavaScript). We have compared CoverMe with random testing and Austin, a publicly available branch coverage based testing tool that supports numerical code (Austin combines symbolic execution and search-based heuristics). Our experimental results show that CoverMe achieves near-optimal and substantially higher coverage ratios than random testing on all tested programs, across all evaluated coverage metrics. Compared with Austin, CoverMe improves branch coverage from 43% to 91%, with significantly less time (6.9 vs. 6058.4 seconds on average).
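The two-step idea in the abstract (a representing function, then minimization) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper or from CoverMe: the toy `program_under_test`, the distance-style `representing_function`, and the minimizer, which is a plain derivative-free compass search run from several starting points rather than whatever optimizer the authors use. The only property the sketch relies on is the one the abstract describes: the search executes the representing function and never reasons symbolically about the program.

```python
def program_under_test(x: float) -> str:
    """Hypothetical numerical program; the 'target' branch is hard to hit by chance."""
    y = x * x - 3.0 * x
    if abs(y - 10.0) <= 1e-3 and x <= -1.0:
        return "target"
    return "other"


def representing_function(x: float) -> float:
    """Assumed encoding: non-negative, and zero exactly when x reaches 'target'."""
    y = x * x - 3.0 * x
    d1 = max(0.0, abs(y - 10.0) - 1e-3)  # violation of abs(y - 10) <= 1e-3
    d2 = max(0.0, x + 1.0)               # violation of x <= -1
    return d1 + d2


def local_search(f, x0, step=1.0, min_step=1e-9, max_iters=1000):
    """Compass search: try x - step, then x + step; if neither improves, halve the step."""
    x, val = x0, f(x0)
    for _ in range(max_iters):
        if val == 0.0 or step < min_step:
            break
        for cand in (x - step, x + step):
            cval = f(cand)
            if cval < val:
                x, val = cand, cval
                break
        else:
            step /= 2.0  # no neighbor improved
    return x, val


def multi_start_minimize(f, starts):
    """Run the local search from each start; stop at the first zero found."""
    best = None
    for x0 in starts:
        x, val = local_search(f, x0)
        if best is None or val < best[1]:
            best = (x, val)
        if val == 0.0:
            break
    return best


if __name__ == "__main__":
    x, val = multi_start_minimize(representing_function, [8.0, 0.3])
    print(x, val, program_under_test(x))
```

In this toy setup the search started at 8.0 stalls in a local minimum near x = 5, where the first condition holds but the second does not, and the search started at 0.3 reaches a zero of the representing function near x = -2. That is why the sketch restarts from more than one point: a zero of the representing function is a test input that covers the target branch, but a purely local descent is not guaranteed to find one.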
