Pavel Panchekha

PL
6papers
2citations
Novelty59%
AI Score47

6 Papers

73.7PLMar 25
Numerical Superoptimization for Library Learning

Jonas Regehr, Mitch Briles, Zachary Tatlock et al.

Numerical software depends on fast, accurate implementations of mathematical primitives like sin, exp, and log. Modern superoptimizers can optimize floating-point kernels against a given set of such primitives, but a more fundamental question remains open: which new primitives are worth implementing in the first place? We formulate this as numerical library learning: given a workload of floating-point kernels, identify the mathematical primitives whose expert implementations would most improve speed and accuracy. Our key insight is that numerical superoptimizers already have the machinery well-suited to this problem. Their search procedures happen to enumerate candidate primitives, their equivalence procedures can generalize and deduplicate candidates, and their cost models can estimate counterfactual utility: how much the workload would improve if a given primitive were available. We present GrowLibm, which repurposes the Herbie superoptimizer as a numerical library learner. GrowLibm mines candidate primitives from the superoptimizer's intermediate search results, ranks them by counterfactual utility, and prunes redundant candidates. Across three scientific applications (PROJ, CoolProp, and Basilisk), GrowLibm identifies compact, reusable primitives that can be implemented effectively using standard numerical techniques. When Herbie is extended with these expert implementations, kernel speed improves by up to 2.2x at fixed accuracy, and maximum achievable accuracy also improves, in one case from 56.0% to 93.5%. We also prototype an LLVM matcher that recognizes learned primitives in optimized IR, recovering 26 replacement sites across five PROJ projections and improving end-to-end application performance by up to 5%.

NAMar 14, 2025
Mixing Condition Numbers and Oracles for Accurate Floating-point Debugging

Bhargav Kulkarni, Pavel Panchekha

Recent advances have made numeric debugging tools much faster by using double-double oracles, and numeric analysis tools much more accurate by using condition numbers. But these techniques have downsides: double-double oracles have correlated error so miss floating-point errors while condition numbers cannot cleanly handle over- and under- flow. We combine both techniques to avoid these downsides. Our combination, EXPLANIFLOAT, computes condition numbers using double-double arithmetic, which avoids correlated errors. To handle over- and under- flow, it introduces a separate logarithmic oracle. As a result, EXPLANIFLOAT achieves a precision of 80.0% and a recall of 96.1% on a collection of 546 difficult numeric benchmarks: more accurate than double-double oracles yet dramatically faster than arbitrary-precision condition number computations.

50.2PLMar 24
Semantics for 2D Rasterization

Bhargav Kulkarni, Henry Whiting, Pavel Panchekha

Rasterization is the process of determining the color of every pixel drawn by an application. Powerful rasterization libraries like Skia, CoreGraphics, and Direct2D put exceptional effort into drawing, blending, and rendering efficiently. Yet applications are still hindered by the inefficient sequences of operations that they ask these libraries to perform. Even Google Chrome, a highly optimized program co-developed with the Skia rasterization library, still produces inefficient instruction sequences even on the top 100 most visited websites. The underlying reason for this inefficiency is that rasterization libraries have complex semantics and opaque and non-obvious execution models. To address this issue, we introduce $μ$Skia, a formal semantics for the Skia 2D graphics library, and mechanize this semantics in Lean. $μ$Skia covers language and graphics features like canvas state, the layer stack, blending, and color filters, and the semantics itself is split into three strata to separate concerns and enable extensibility. We then identify four patterns of sub-optimal Skia code produced by Google Chrome, and then write replacements for each pattern. $μ$Skia allows us to verify the replacements are correct, including identifying numerous tricky side conditions. We then develop a high-performance Skia optimizer that applies these patterns to speed up rasterization. On 99 Skia programs gathered from the top 100 websites, this optimizer yields a speedup of 18.7% over Skia's most modern GPU backend, while taking at most 32 $μ$s for optimization. The speedups persist across a variety of websites, Skia backends, and GPUs. To provide true, end-to-end verification, optimization traces produced by the optimizer are loaded back into the $μ$Skia semantics and translation validated in Lean.

84.7PLMar 20
Incremental Live Programming via Shortcut Memoization

Marisa Kirisame, Thomas J. Porter, Ruqing Yang et al.

Live programming systems aim to quickly show programmers the dynamic impacts of program edits. To do so, they re-execute the program whenever it is edited, which poses a computational challenge when programs become large or complex. This has led to the need for incrementality in the implementation of live program interpreters. This paper introduces Chordata, an incremental program interpreter based on shortcut memoization, which learns repeated patterns of computation, called shortcuts, by observing executions of previous versions of a program. It can then apply these shortcuts when the same or a structurally similar program fragment is re-executed. This paper contributes a formal semantics of shortcut memoization for any language with a rewrite-based semantics, with mechanized proofs of key correctness properties. We then express a variant of the Hazel live programming system, expressed as a CEK machine, in Chordata, and develop a number of practical heuristics to learn high-value shortcuts. We evaluate the resulting system on editing traces of students solving simple programming problems. Chordata achieves a speedup of 13.03\times compared to baseline with a 19.97\times memory overhead. For smaller changes and for more complex programs, Chordata achieves even greater speedups. Furthermore, we show that Chordata is capable of providing a speedup even within a single execution, with a faster speedup on a larger input.

27.1MSApr 6
Accurate Residues for Floating-Point Debugging

Yumeng He, Pavel Panchekha

Floating-point arithmetic is error-prone and unintuitive. Floating-point debuggers instrument programs to monitor floating-point arithmetic at run time and flag numerical issues. They estimate residues, i.e., the difference between actual floating-point and ideal real values, for every floating-point value in the program. Prior work explores various approaches for computing these residues accurately and efficiently. Unfortunately, the most efficient methods, based on "error-free transformations", have a high rate of false reports, while the most accurate methods, based on high-precision arithmetic, are very slow. This paper builds on error-free-transformations-based approaches and aims to improve their accuracy while preserving efficiency. To more accurately compute residues, this paper divides residue computation into two steps (rounding error computation and residue function evaluation) and shows how to perform each step accurately via careful improvements to the current state of the art. We evaluate on 44 large scientific computing workloads, focusing on the 14 benchmarks where prior tools produce false reports: our approach eliminates false reports on 10 benchmarks and substantially reduces them on the remaining 3 benchmarks. Moreover, complex numerical issues require additional care due to absorption, where two machine-precision residues cannot both be computed accurately in a single execution. This paper introduces residue override, which re-executes the program multiple times, computing different residues in different executions and assembling a final "patchwork" execution. We evaluate on 169 standard benchmarks drawn from numerical analysis papers and textbooks, requiring only 3.6 re-executions on average. Among 34 benchmarks with false reports in the initial run, residue override is triggered on 29 of them and reduces false reports on 25 of them, averaging 7.1 re-executions.

MSJul 12, 2021
Faster Math Functions, Soundly

Ian Briggs, Pavel Panchekha

Standard library implementations of functions like sin and exp optimize for accuracy, not speed, because they are intended for general-purpose use. But applications tolerate inaccuracy from cancellation, rounding error, and singularities-sometimes even very high error-and many application could tolerate error in function implementations as well. This raises an intriguing possibility: speeding up numerical code by tuning standard function implementations. This paper thus introduces OpTuner, an automatic method for selecting the best implementation of mathematical functions at each use site. OpTuner assembles dozens of implementations for the standard mathematical functions from across the speed-accuracy spectrum. OpTuner then uses error Taylor series and integer linear programming to compute optimal assignments of function implementation to use site and presents the user with a speed-accuracy Pareto curve they can use to speed up their code. In a case study on the POV-Ray ray tracer, OpTuner speeds up a critical computation, leading to a whole program speedup of 9% with no change in the program output (whereas human efforts result in slower code and lower-quality output). On a broader study of 37 standard benchmarks, OpTuner matches 216 implementations to 89 use sites and demonstrates speed-ups of 107% for negligible decreases in accuracy and of up to 438% for error-tolerant applications.