3 Papers

MSDec 10, 2012
Performance Modeling for Dense Linear Algebra

Elmar Peise, Paolo Bientinesi

It is well known that the behavior of dense linear algebra algorithms is greatly influenced by factors like target architecture, underlying libraries and even problem size; because of this, the accurate prediction of their performance is a real challenge. In this article, we are not interested in creating accurate models for a given algorithm, but in correctly ranking a set of equivalent algorithms according to their performance. Aware of the hierarchical structure of dense linear algebra routines, we approach the problem by developing a framework for the automatic generation of statistical performance models for BLAS and LAPACK libraries. This allows us to obtain predictions through evaluating and combining such models. We demonstrate that our approach is successful in both single- and multi-core environments, not only in the ranking of algorithms but also in tuning their parameters.

MSSep 26, 2012
High-Performance Solvers for Dense Hermitian Eigenproblems

Matthias Petschow, Elmar Peise, Paolo Bientinesi

We introduce a new collection of solvers - subsequently called EleMRRR - for large-scale dense Hermitian eigenproblems. EleMRRR solves various types of problems: generalized, standard, and tridiagonal eigenproblems. Among these, the last is of particular importance as it is a solver on its own right, as well as the computational kernel for the first two; we present a fast and scalable tridiagonal solver based on the Algorithm of Multiple Relatively Robust Representations - referred to as PMRRR. Like the other EleMRRR solvers, PMRRR is part of the freely available Elemental library, and is designed to fully support both message-passing (MPI) and multithreading parallelism (SMP). As a result, the solvers can equally be used in pure MPI or in hybrid MPI-SMP fashion. We conducted a thorough performance study of EleMRRR and ScaLAPACK's solvers on two supercomputers. Such a study, performed with up to 8,192 cores, provides precise guidelines to assemble the fastest solver within the ScaLAPACK framework; it also indicates that EleMRRR outperforms even the fastest solvers built from ScaLAPACK's components.

PFApr 29, 2015Code
The ELAPS Framework: Experimental Linear Algebra Performance Studies

Elmar Peise, Paolo Bientinesi

Optimal use of computing resources requires extensive coding, tuning and benchmarking. To boost developer productivity in these time consuming tasks, we introduce the Experimental Linear Algebra Performance Studies framework (ELAPS), a multi-platform open source environment for fast yet powerful performance experimentation with dense linear algebra kernels, algorithms, and libraries. ELAPS allows users to construct experiments to investigate how performance and efficiency vary depending on factors such as caching, algorithmic parameters, problem size, and parallelism. Experiments are designed either through Python scripts or a specialized GUI, and run on the whole spectrum of architectures, ranging from laptops to clusters, accelerators, and supercomputers. The resulting experiment reports provide various metrics and statistics that can be analyzed both numerically and visually. We demonstrate the use of ELAPS in four concrete application scenarios and in as many computing environments, illustrating its practical value in supporting critical performance decisions.