Hal Finkel

LG
6papers
59citations
Novelty37%
AI Score20

6 Papers

NASep 7, 2011
An Iterated, Multipoint Differential Transform Method for Numerically Evolving PDE IVPs

Hal Finkel

Traditional numerical techniques for solving time-dependent partial-differential-equation (PDE) initial-value problems (IVPs) store a truncated representation of the function values and some number of their time derivatives at each time step. Although redundant in the dx->0 limit, what if spatial derivatives were also stored? This paper presents an iterated, multipoint differential transform method (IMDTM) for numerically evolving PDE IVPs. Using this scheme, it is demonstrated that stored spatial derivatives can be propagated in an efficient and self-consistent manner; and can effectively contribute to the evolution procedure in a way which can confer several advantages, including aiding solution verification. Lastly, in order to efficiently implement the IMDTM scheme, a generalized finite-difference stencil formula is derived which can take advantage of multiple higher-order spatial derivatives when computing even-higher-order derivatives. As is demonstrated, the performance of these techniques compares favorably to other explicit evolution schemes in terms of speed, memory footprint and accuracy.

LGApr 27, 2021
Autotuning PolyBench Benchmarks with LLVM Clang/Polly Loop Optimization Pragmas Using Bayesian Optimization (extended version)

Xingfu Wu, Michael Kruse, Prasanna Balaprakash et al.

In this paper, we develop a ytopt autotuning framework that leverages Bayesian optimization to explore the parameter space search and compare four different supervised learning methods within Bayesian optimization and evaluate their effectiveness. We select six of the most complex PolyBench benchmarks and apply the newly developed LLVM Clang/Polly loop optimization pragmas to the benchmarks to optimize them. We then use the autotuning framework to optimize the pragma parameters to improve their performance. The experimental results show that our autotuning approach outperforms the other compiling methods to provide the smallest execution time for the benchmarks syr2k, 3mm, heat-3d, lu, and covariance with two large datasets in 200 code evaluations for effectively searching the parameter spaces with up to 170,368 different configurations. We find that the Floyd-Warshall benchmark did not benefit from autotuning because Polly uses heuristics to optimize the benchmark to make it run much slower. To cope with this issue, we provide some compiler option solutions to improve the performance. Then we present loop autotuning without a user's knowledge using a simple mctree autotuning framework to further improve the performance of the Floyd-Warshall benchmark. We also extend the ytopt autotuning framework to tune a deep learning application.

LGFeb 2, 2021
Report of the Workshop on Program Synthesis for Scientific Computing

Hal Finkel, Ignacio Laguna

Program synthesis is an active research field in academia, national labs, and industry. Yet, work directly applicable to scientific computing, while having some impressive successes, has been limited. This report reviews the relevant areas of program synthesis work for scientific computing, discusses successes to date, and outlines opportunities for future work. This report is the result of the Workshop on Program Synthesis for Scientific Computing was held virtually on August 4-5 2020 (https://prog-synth-science.github.io/2020/).

PFOct 15, 2020
Autotuning PolyBench Benchmarks with LLVM Clang/Polly Loop Optimization Pragmas Using Bayesian Optimization

Xingfu Wu, Michael Kruse, Prasanna Balaprakash et al.

An autotuning is an approach that explores a search space of possible implementations/configurations of a kernel or an application by selecting and evaluating a subset of implementations/configurations on a target platform and/or use models to identify a high performance implementation/configuration. In this paper, we develop an autotuning framework that leverages Bayesian optimization to explore the parameter space search. We select six of the most complex benchmarks from the application domains of the PolyBench benchmarks (syr2k, 3mm, heat-3d, lu, covariance, and Floyd-Warshall) and apply the newly developed LLVM Clang/Polly loop optimization pragmas to the benchmarks to optimize them. We then use the autotuning framework to optimize the pragma parameters to improve their performance. The experimental results show that our autotuning approach outperforms the other compiling methods to provide the smallest execution time for the benchmarks syr2k, 3mm, heat-3d, lu, and covariance with two large datasets in 200 code evaluations for effectively searching the parameter spaces with up to 170,368 different configurations. We compare four different supervised learning methods within Bayesian optimization and evaluate their effectiveness. We find that the Floyd-Warshall benchmark did not benefit from autotuning because Polly uses heuristics to optimize the benchmark to make it run much slower. To cope with this issue, we provide some compiler option solutions to improve the performance.

DCOct 13, 2020
Autotuning Search Space for Loop Transformations

Michael Kruse, Hal Finkel, Xingfu Wu

One of the challenges for optimizing compilers is to predict whether applying an optimization will improve its execution speed. Programmers may override the compiler's profitability heuristic using optimization directives such as pragmas in the source code. Machine learning in the form of autotuning can assist users in finding the best optimizations for each platform. In this paper we propose a loop transformation search space that takes the form of a tree, in contrast to previous approaches that usually use vector spaces to represent loop optimization configurations. We implemented a simple autotuner exploring the search space and applied it to a selected set of PolyBench kernels. While the autotuner is capable of representing every possible sequence of loop transformations and their relations, the results motivate the use of better search strategies such as Monte Carlo tree search to find sophisticated loop transformations such as multilevel tiling.

PLOct 6, 2019
Design and Use of Loop-Transformation Pragmas

Michael Kruse, Hal Finkel

Adding a pragma directive into the source code is arguably easier than rewriting it, for instance for loop unrolling. Moreover, if the application is maintained for multiple platforms, their difference in performance characteristics may require different code transformations. Code transformation directives allow replacing the directives depending on the platform, i.e. separation of code semantics and its performance optimization. In this paper, we explore the design space (syntax and semantics) of adding such directive into a future OpenMP specification. Using a prototype implementation in Clang, we demonstrate the usefulness of such directives on a few benchmarks.